In one of the posts last month, we started taking baby steps in learning Python for data analysis. This post takes you one step further in your journey to learn Python. By the end of this post, you will understand the role of several Python libraries and the main data structures used in Python.

We will take a simple example for each kind of data structure to illustrate its purpose.

**Important libraries in Python:**

Python provides a basic set of commands and functionality in its base version. If you need more functions, there are several libraries that can be imported into your environment. There are two common ways of importing a library in Python: `import pandas as pd` and `from pandas import *`.

In the first form, we define an alias `pd` for the pandas library. We can then use functions from pandas (e.g. `read_csv()`) by referencing them through the alias: `pd.read_csv()`.

In the second form, you import the entire namespace of pandas, i.e. you can call `read_csv()` directly without referring to pandas.
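The two import forms described above can be sketched as follows. This is only an illustration: it uses the standard-library `math` module as a stand-in (an assumption made here so the snippet runs without installing pandas), and the variable names are hypothetical. The same pattern applies to `import pandas as pd` and `from pandas import *`.

```python
# Form 1: import the module under an alias and call functions through it.
import math as m              # same pattern as `import pandas as pd`
root_via_alias = m.sqrt(16)   # analogous to pd.read_csv(...)

# Form 2: pull every public name of the module into the current namespace.
from math import *            # same pattern as `from pandas import *`
root_direct = sqrt(16)        # analogous to calling read_csv(...) directly

print(root_via_alias, root_direct)
```

The alias form keeps it obvious where each function comes from, while the second form saves typing at the cost of possibly overwriting names you already use.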

The following libraries are the ones you will need for scientific computation and data analysis:

**NumPy** stands for Numerical Python. Its most powerful feature is the n-dimensional array. The library also contains basic linear algebra functions, Fourier transforms, advanced random number capabilities and tools for integration with lower-level languages such as Fortran, C and C++.

**SciPy** stands for Scientific Python and is built on NumPy. It is one of the most useful libraries for a variety of high-level science and engineering modules such as the discrete Fourier transform, linear algebra, optimization and sparse matrices.

**Matplotlib** is for plotting a vast variety of graphs, from histograms to line plots to heat plots. You can use the Pylab feature in the IPython notebook (`ipython notebook --pylab=inline`) to use these plotting features inline. If you omit the inline option, pylab converts the IPython environment into one very similar to Matlab. You can also use LaTeX commands to add math to your plot.

**Pandas** is for structured data operations and manipulations. It is extensively used for data munging and preparation. Pandas was added relatively recently to the Python ecosystem and has been instrumental in boosting Python’s usage in the data science community.

**Scikit Learn** is for machine learning. Built on NumPy, SciPy and matplotlib, this library contains a lot of efficient tools for machine learning and statistical modeling, including classification, regression, clustering and dimensionality reduction.

**Statsmodels** is for statistical modeling. Statsmodels is a Python module that allows users to explore data, estimate statistical models, and perform statistical tests. An extensive list of descriptive statistics, statistical tests, plotting functions, and result statistics is available for different types of data and each estimator.

**Seaborn** is for statistical data visualization. Seaborn is a library for making attractive and informative statistical graphics in Python. It is based on matplotlib. Seaborn aims to make visualization a central part of exploring and understanding data.

**Additional libraries you might need:**

**urllib** for web-based operations such as opening URLs.

**os** for operating system and file operations.

**networkx** and **igraph** for graph-based data manipulations.

**regular expressions** (the `re` module) for finding patterns in text data.

**BeautifulSoup** for scraping the web.
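As a small taste of two of these libraries, here is a minimal sketch using `re` and `os` from the standard library. The sample text and the file path are made up for this illustration and do not refer to real data.

```python
import os
import re

# Hypothetical text, made up for this illustration.
text = "Orders 1042 and 977 shipped on 2014-08-01."

# re: find every run of consecutive digits in the text.
numbers = re.findall(r"\d+", text)
print(numbers)

# os: join path parts with the separator of the current operating system,
# then split the file name off again.
path = os.path.join("data", "train.csv")
file_name = os.path.basename(path)
print(file_name)
```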

**Data Structures:**

The following are some of the data structures used in Python. You should be familiar with them in order to use each one where appropriate.

**Lists** – Lists are one of the most versatile data structures in Python. A list is defined by writing comma-separated values in square brackets. Lists may contain items of different types, but usually the items all have the same type. Python lists are mutable, so individual elements of a list can be changed.

Here is a quick example to define a list and then access it:
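A minimal sketch of defining and accessing a list might look like this. The values (the squares of 0 through 5) and the variable names are hypothetical, chosen only for illustration.

```python
# Hypothetical values: the squares of 0 through 5.
squares = [0, 1, 4, 9, 16, 25]

first = squares[0]        # indexing starts at 0
last = squares[-1]        # negative indices count from the end
middle = squares[2:4]     # slicing returns a new list with the items at index 2 and 3

# Lists are mutable, so an element can be replaced in place...
squares[2] = 40
# ...and new elements can be appended at the end.
squares.append(36)

print(first, last, middle)
print(squares)
```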

**Strings** – Strings are defined with single ( `'` ), double ( `"` ) or triple ( `'''` ) quotes. Strings enclosed in triple quotes ( `'''` ) can span multiple lines and are used frequently in docstrings (Python’s way of documenting functions). The backslash ( `\` ) is used as an escape character. Please note that Python strings are immutable, so you cannot change part of a string.
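The points above can be sketched in a few lines. All values and variable names here are hypothetical, and the `try`/`except` block is just one way to show that item assignment on a string fails.

```python
greeting = 'Hello'
name = "World"
multiline = '''This string
spans two lines.'''

# The backslash is the escape character, so a literal backslash is written twice.
path = 'C:\\data'
print(path)

print(greeting[0])                   # strings can be indexed like lists
line_count = len(multiline.splitlines())
print(line_count)

# Strings are immutable: assigning to a position raises TypeError.
error_message = None
try:
    greeting[0] = 'J'
except TypeError as err:
    error_message = str(err)
    print(error_message)

# To "change" a string, build a new one instead.
new_greeting = 'J' + greeting[1:]
print(new_greeting)
```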

**Tuples** – A tuple consists of a number of values separated by commas. Tuples are immutable, and on output they are surrounded by parentheses so that nested tuples are interpreted correctly. Additionally, even though tuples are immutable, they can hold mutable data if needed.

Since tuples are immutable and cannot change, they carry less overhead than lists and are typically somewhat faster to create. Hence, if your sequence is unlikely to change, you should use a tuple instead of a list.
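Here is a minimal sketch of these tuple properties, including a tuple that holds mutable data. The values and names are hypothetical.

```python
point = 3, 4                 # parentheses are optional when creating a tuple
nested = (point, (5, 6))     # tuples can be nested
print(nested)

# Replacing an element of a tuple is not allowed.
assignment_failed = False
try:
    point[0] = 10
except TypeError:
    assignment_failed = True
    print("tuples cannot be modified in place")

# A tuple can still hold a mutable object, such as a list.
record = ("scores", [90, 85])
record[1].append(70)         # the list inside changes; the tuple still holds the same list
print(record)
```

The tuple itself never changes which objects it refers to; only the list stored inside it is modified.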

**Sets** – A set is an unordered collection with no duplicate elements. Basic uses include membership testing and eliminating duplicate entries. Set objects also support mathematical operations like union, intersection, difference, and symmetric difference. A set can be created with the `set()` function or by listing its elements inside braces.
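The set operations mentioned above can be sketched as follows. The values are hypothetical, and the results are compared rather than printed because the display order of a set is not guaranteed.

```python
a = set([1, 2, 2, 3, 3, 3])   # duplicates are dropped, leaving 1, 2 and 3
b = {3, 4, 5}                 # braces with elements also create a set

has_two = 2 in a              # membership test

union = a | b                 # elements in a or b
intersection = a & b          # elements in both a and b
difference = a - b            # elements in a but not in b
symmetric = a ^ b             # elements in exactly one of a and b

print(has_two, len(a))
print(sorted(union), sorted(intersection), sorted(difference), sorted(symmetric))
```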

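**Dictionary** – A dictionary is a set of *key: value* pairs, with the requirement that the keys are unique (within one dictionary). Dictionaries were unordered in older versions of Python; since Python 3.7 they preserve insertion order. A pair of braces creates an empty dictionary: `{}`.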
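A minimal sketch of creating and using a dictionary could look like this. The names and ages are made up for this illustration.

```python
empty = {}                          # a pair of braces creates an empty dictionary

# Hypothetical data: names mapped to ages.
ages = {"Ann": 31, "Bob": 27}

ages["Cara"] = 45                   # add a new key: value pair
ages["Bob"] = 28                    # keys are unique, so this overwrites the old value

print(ages["Ann"])                  # look up a value by its key
print("Dan" in ages)                # membership test on the keys
missing_age = ages.get("Dan", 0)    # get() returns a default instead of raising KeyError
print(missing_age)

# Iterate over the key: value pairs.
for person, age in ages.items():
    print(person, age)
```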

Now that you are familiar with the IPython environment, the important libraries and the key data structures in Python, we will discuss arrays, Pandas and dataframes, the most commonly used tools for handling structured data in Python.

In our next article in this series, we will read the dataset from the Kaggle Titanic competition, import it into a dataframe and then perform exploratory analysis on the data.

In the meantime, if you have any tips to share for handy usage of these data structures, please feel free to share them through the comments below.

Nice article kunal. waiting for the followup article for reading datasets from kaggale comp.

Great post, looking forward for next post and thanks.

Nice article thanks for sharing this

Great post, I am eager to review next !

Hi Kunal,

Please can you give some example of this scenario –

“Additionally, even though tuples are immutable, they can hold mutable data if needed.”

Thanks for all your efforts in helping the community with such excellent articles.

Rajesh