**Journey from a Python noob to a Kaggler on Python**

So, you want to become a data scientist, or maybe you are already one and want to expand your tool repository. You have landed at the right place. The aim of this page is to provide a comprehensive learning path for people new to Python for data analysis, covering the steps you need to learn along the way. If you already have some background, or don’t need all the components, feel free to adapt your own path and let us know what changes you made.

You can also check the mini version of this learning path –> Infographic: Quick Guide to learn Data Science in Python

**Step 0: Warming up**

Before starting your journey, the first question to answer is:

Why use Python?

or

How would Python be useful?

Watch the first 30 minutes of this talk by Jeremy, founder of DataRobot, at PyCon 2014 in Ukraine to get an idea of how useful Python can be.

**Step 1: Setting up your machine**

Now that you have made up your mind, it is time to set up your machine. The easiest way to proceed is to download Anaconda from Continuum.io. It comes packaged with most of the things you will ever need. The major downside of taking this route is that you will need to wait for Continuum to update their packages, even when an update is already available for the underlying libraries. If you are just starting out, that should hardly matter.

If you face any challenges in installing, you can find more detailed instructions for various OS here
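Once the installation finishes, a quick sanity check can confirm that the main libraries are importable. The snippet below is only a sketch: the package list is an assumption based on the libraries named in the later steps, and `check_packages` is a hypothetical helper, not part of Anaconda.

```python
import importlib.util

# Assumed list of packages, taken from the libraries mentioned later in this path.
# Note that scikit-learn is imported under the name "sklearn".
PACKAGES = ["numpy", "scipy", "matplotlib", "pandas", "sklearn"]


def check_packages(names):
    """Return a dict mapping each package name to True if Python can find it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


# Print a small report, one line per package.
for name, found in check_packages(PACKAGES).items():
    print(f"{name:12s} {'found' if found else 'MISSING'}")
```

If a package shows up as missing, the detailed installation instructions are the place to look next.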

**Step 2: Learn the basics of Python language**

You should start by understanding the basics of the language, its libraries and its data structures. The free interactive Python tutorial by DataCamp is one of the best places to start your journey. This 4-hour coding course focuses on how to get started with Python for data science, and by the end you should be comfortable with the basic concepts of the language.

**Specifically learn:** Lists, Tuples, Dictionaries, List comprehensions, Dictionary comprehensions
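To give a feel for these building blocks, here is a minimal sketch that touches each of them once. The variable names and values are made up purely for illustration.

```python
# A list is an ordered, mutable sequence.
scores = [72, 88, 95, 61]
scores.append(79)

# A tuple is an ordered, immutable sequence, handy for fixed records.
point = (3, 4)
x, y = point  # tuple unpacking

# A dictionary maps keys to values.
ages = {"alice": 31, "bob": 27}
ages["carol"] = 45

# List comprehension: build a new list by filtering an existing one.
passed = [s for s in scores if s >= 70]

# Dictionary comprehension: build a dictionary in a single expression.
name_lengths = {name: len(name) for name in ages}

print(passed)
print(name_lengths)
```

Comprehensions are worth the extra practice, because the same filter-and-transform pattern shows up constantly when cleaning data.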

**Assignment:** Take the interactive Python tutorial by DataCamp

**Alternate resources:** If interactive coding is not your style of learning, you can also look at The Google Class for Python. It is a 2-day class series and also covers some of the parts discussed later.

**Step 3: Learn Regular Expressions in Python**

You will need to use regular expressions a lot for data cleansing, especially if you are working on text data. The best way to learn regular expressions is to go through the Google class and keep this cheat sheet handy.
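As a small taste of what data cleansing with regular expressions looks like, the sketch below pulls e-mail addresses out of messy text and normalises whitespace using the standard `re` module. The sample strings and the helper `extract_email` are hypothetical, and the e-mail pattern is deliberately simplified rather than a full validator.

```python
import re

# Made-up messy input lines.
raw = ["  Alice <ALICE@Example.com> ", "bob: bob.smith@mail.org", "no email here"]

# Simplified e-mail pattern: local part, "@", domain, dot, suffix.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def extract_email(text):
    """Return the first e-mail address found in text (lower-cased), or None."""
    match = EMAIL_RE.search(text)
    return match.group(0).lower() if match else None


emails = [extract_email(line) for line in raw]

# re.sub replaces every run of whitespace with a single space.
cleaned = re.sub(r"\s+", " ", "  too    many   spaces ").strip()

print(emails)
print(cleaned)
```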

**Assignment:** Do the baby names exercise

If you still need more practice, follow this tutorial for text cleaning. It will challenge you on various steps involved in data wrangling.

**Step 4: Learn Scientific libraries in Python – NumPy, SciPy, Matplotlib and Pandas**

This is where the fun begins! Here is a brief introduction to various libraries. Let’s start practicing some common operations.

- Practice the NumPy tutorial thoroughly, especially NumPy arrays. This will form a good foundation for things to come.
- Next, look at the SciPy tutorials. Go through the introduction and the basics, and do the remaining ones based on your needs.
- If you guessed Matplotlib tutorials next, you are wrong! They are too comprehensive for our needs here. Instead, look at this IPython notebook up to line 68 (i.e. up to animations).
- Finally, let us look at Pandas. Pandas provides DataFrame functionality (like R) for Python. This is also where you should spend a good amount of time practicing. Pandas will become your most effective tool for all mid-size data analysis. Start with a short introduction, 10 minutes to pandas. Then move on to a more detailed tutorial on pandas.
- Check out DataCamp’s course on Pandas Foundations
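To make the NumPy part concrete, here is a minimal sketch of the array operations that tend to matter most early on: element-wise arithmetic, aggregation along an axis, broadcasting and boolean indexing. The array values are made up, and the snippet assumes NumPy is installed (it ships with Anaconda).

```python
import numpy as np

# A 2-D array with 2 rows and 3 columns.
a = np.array([[1, 2, 3], [4, 5, 6]])

# Arithmetic is applied element by element, with no explicit loop.
doubled = a * 2

# Aggregate along axis 0, i.e. one mean per column.
col_means = a.mean(axis=0)

# Broadcasting: the 1-D column means are subtracted from every row.
centered = a - col_means

# Boolean indexing: keep only the even entries (returned as a flat array).
evens = a[a % 2 == 0]

print(a.shape)
print(col_means)
print(evens)
```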

You can also look at Exploratory Data Analysis with Pandas and Data munging with Pandas
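For a first impression of data munging with Pandas, the sketch below builds a tiny DataFrame, fills a missing value, filters rows and summarises by group. The data and column names are invented for illustration, and the median fill is just one possible choice for handling missing values.

```python
import pandas as pd

# A small made-up dataset with one missing sales figure.
df = pd.DataFrame({
    "city": ["Pune", "Delhi", "Pune", "Delhi", "Mumbai"],
    "sales": [120.0, 200.0, None, 180.0, 90.0],
})

# Fill the missing value with the median of the known values.
df["sales"] = df["sales"].fillna(df["sales"].median())

# Filter rows with a boolean mask.
big = df[df["sales"] > 100]

# Group by city and compute the row count and mean sales per group.
summary = df.groupby("city")["sales"].agg(["count", "mean"])

print(summary)
```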

**Additional Resources:**

- If you need a book on Pandas and NumPy, read “Python for Data Analysis” by Wes McKinney.
- There are a lot of tutorials as part of Pandas documentation. You can have a look at them here

**Assignment:** Solve this assignment from the CS109 course from Harvard.

**Step 5: Effective Data Visualization**

Go through this lecture from CS109. You can ignore the initial 2 minutes, but what follows after that is awesome! Follow this lecture up with this assignment.

Check out Bokeh Data Visualization Tutorial from DataCamp

**Step 6: Learn Scikit-learn and Machine Learning**

Now, we come to the meat of this entire process. Scikit-learn is the most useful library in Python for machine learning. Here is a brief overview of the library. Go through lecture 10 to lecture 18 from the CS109 course from Harvard. You will go through an overview of machine learning, supervised learning algorithms like regressions, decision trees and ensemble modeling, and unsupervised learning algorithms like clustering. Follow individual lectures with the assignments from those lectures.
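Most supervised learning in Scikit-learn follows the same split, fit, predict, score rhythm. The sketch below shows that rhythm with a decision tree on the iris dataset bundled with the library. The dataset, the 25% test split and the `random_state` values are illustrative choices, not recommendations from the lectures.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Load a small dataset that ships with scikit-learn (no download needed).
X, y = load_iris(return_X_y=True)

# Hold out 25% of the rows to estimate performance on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Fit a decision tree on the training rows only.
model = DecisionTreeClassifier(random_state=0)
model.fit(X_train, y_train)

# Predict on the held-out rows and measure the fraction classified correctly.
predictions = model.predict(X_test)
accuracy = accuracy_score(y_test, predictions)

print(f"Test accuracy: {accuracy:.2f}")
```

Swapping `DecisionTreeClassifier` for another estimator usually leaves the rest of the code unchanged, which is a large part of why the library is so convenient for experimenting.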

**Additional Resources:**

- If there is one book you must read, it is Programming Collective Intelligence – a classic, but still one of the best books on the subject.
- Additionally, you can also follow one of the best courses on Machine Learning, from Yaser Abu-Mostafa. If you need a more lucid explanation of the techniques, you can opt for the Machine Learning course from Andrew Ng and follow the exercises in Python.
- Tutorials on Scikit-learn

**Assignment:** Try out this challenge on Kaggle

**Step 7: Practice, practice and practice**

Congratulations, you made it!

You now have all the technical skills you need. It is now a matter of practice, and what better place to practice than competing with fellow Data Scientists on Kaggle? Go, dive into one of the live competitions currently running on Kaggle and give everything you have learnt a try!

**Step 8: Deep Learning**

Now that you have learnt most of the machine learning techniques, it is time to give Deep Learning a shot. There is a good chance that you already know what Deep Learning is, but if you still need a brief intro, here it is.

I am new to deep learning myself, so please take these suggestions with a pinch of salt. The most comprehensive resource is deeplearning.net. You will find everything here – lectures, datasets, challenges, tutorials. You can also give the course from Geoff Hinton a try to understand the basics of Neural Networks.

**Get Started with Python:** A Complete Tutorial To Learn Data Science with Python From Scratch

**P.S. In case you need to use Big Data libraries, give Pydoop and PyMongo a try. They are not included here as Big Data learning path is an entire topic in itself.**

Thanks for the post !!

Great Post! Thanks a lot.

Good to get people up and running with Python. I am sure it will help a lot of people. Good luck.

mark ,good

Good tips. You might consider more of an emphasis on IPython Notebooks. They are fast becoming the go to for reproducible research and are a great learning resource.

Excellent point – I’ve been using IPython Notebooks and I really love how flexible they are and how quickly I’m able to try out new ideas.

Hi,

Your website is a great source of learning, and many of its articles create curiosity and increase my urge to learn.

I am following the Data Science specialization track from Coursera and working on learning R for data science.

Could you please add a similar comprehensive learning path for Data Science in R?

Many thanks in advance for your help.

Regards,

Divya Tanwani

Kunal,

Thanks for the path. I was looking for something like this.

What time frame do you think will be needed to complete this path from scratch (beginner to python)?

Hi DK,

The learning path is meant to be done in a self-paced manner, and the learning rate varies from person to person. A good coder learning full time can complete this learning path in a couple of months. On the other hand, if you are doing this part time, it can take up to 12 months.

Regards,

Kunal

Great! It’s very good for me!

thanks for compiling and sharing this.. knowledge is wealth and learning is a life long exercise.

i will post again when i get on the other side of curriculum.

It would be even nicer if you had shared info on learning the foundations, like Statistics, Numerical methods etc.

This might be a great starter point http://cs231n.github.io/python-numpy-tutorial/

Hi kunal,

Thanks for sharing such a nice article. Could you suggest any websites that offer challenges on parsing techniques for big data analysis?

Additionally, you can also follow one of the best courses on Machine Learning course from Yaser Abu-Mostafa.

This URL is broken

Wow! the post is an amazing resource, I have been programming extensively in R but wanted to learn python for the DS, this post just gave me a superb path to follow!

Thanks a lot buddy!

Thanks a lot for such good guidance. For the past few months I was stuck on which programming language I should choose to start my journey to becoming a Data Scientist. I was interested in going with Python, but due to R’s popularity I was unable to make a decision. At last I started with Python, and I am still in the process of learning it, but now I have much more clarity about how to proceed with my journey to becoming a Data Scientist using Python.

Again, thank you so much for this guidance.