The most comprehensive Data Science learning plan for 2017

NSS 05 Jul, 2020 • 20 min read

Overview

  • A learning plan for Data Science is necessary to become a successful data scientist
  • For beginners and transitioners, the plan covers R, Python, basic statistics, and basic and advanced machine learning algorithms.
  • Intermediate students need to understand advanced machine learning algorithms, big data, deep learning and reinforcement learning
  • Practicing with datasets and maintaining an online GitHub profile help showcase your skills

Note  – Here is “The Ultimate Learning Path to Becoming a Data Scientist in 2019”

I joined Analytics Vidhya as an intern last summer. I had no clue what was in store for me. I had been following the blog for some time and liked the community, but did not know what to expect as an intern.

The initial few days were good – all the interns were smart, motivated and fun to be around. We played cricket in the office, did internal hackathons over weekends and learnt a lot of data science. But if there was one defining moment for me in the internship, it was when I realized the impact Analytics Vidhya was having on the data science community.

I saw thousands of people following Analytics Vidhya religiously. I saw people looking for guidance at our meetups and hackathons. I saw people transitioning their careers because of the resources we provide. That is when this good internship transformed into a mind-blowing experience.

That is the day I decided that this is my calling. It just felt that this is what I would want to do daily.

 

Why create this learning path?

Among various resources on Analytics Vidhya, learning paths are special. The amount of effort and thinking they need is tremendous. The number of drafts they undergo is mind-boggling. But, the kind of impact they create for our audience is HUGE. That is why I decided that I will create a learning plan for 2017 for all our followers.

We created a similar plan for 2016 and saw people make career transitions by following it. This time we have created a much more granular and detailed learning plan. The sole aim behind creating this comprehensive plan is to create a much bigger impact for our followers this year.

 

Who should use this learning path?

This learning path should be extremely useful for anyone who wants to learn machine learning, deep learning or data science this year. If you plan to wait for a year, we will publish something similar in 2018 as well 🙂

But, for the people looking for action this year, this framework and plan of action should be extremely useful. Whether you are a complete fresher or a transitioner or you are looking to up-skill yourself, this plan should give you the necessary direction.

We published a similar plan in 2016 and saw followers make the transition simply by following it. This year’s plan is more nuanced than last year’s, so if you plan to pick up or improve data science skills, this plan will guide you through the journey.

 

How can you use this learning path?

In creating this plan, we have removed the confusion from the process of learning. The biggest challenge people face while learning is not a dearth of learning material, but too much of it. You are not sure where to start, what to practice, how much time to spend on a concept, or where to find useful resources. For most beginners, this becomes overwhelming and they simply drop out before learning even a single skill.

This plan takes this confusion out. This path contains both theoretical resources and practical examples. We have also provided you with resources / tests to apply your learning and benchmark yourself. As part of this plan, you will apply the concepts you learn to real-world problems and gain hands-on experience.

 

Table of Contents

  1. A few definitions before we start
  2. Setting a target and timelines for yourself
  3. Beginner’s Path for 2017
  4. Transitioner’s Path for 2017
  5. Intermediate’s Path for 2017
  6. End Notes

1. A few definitions before we start

The first thing you need to do is identify which kind of learner you are. Have a look at the definitions / descriptions below and identify which category you belong to.

  • Who is a beginner data scientist?
    • A beginner has no prior experience in data science or machine learning
    • Does not know any analytical tools or languages like R, SAS or Python
    • Has no prior knowledge of subjects like mathematics and statistics
    • A person who has prior exposure to some of the topics in this article, like probability or linear algebra, can feel free to skip the initial sections of the learning path to speed up their learning.
  • Who is a transitioner data scientist?
    • A transitioner has no prior experience in any of the analytics tools like R/Python
    • Does not know Machine Learning concepts
    • Has more than 3 years of work experience in an industry other than analytics
    • A person who has prior exposure to some of the topics in this article, like probability or linear algebra, can feel free to skip the appropriate sections of the learning path and speed up their learning.
  • Who is an Intermediate data scientist?
    • People who already know Data Science and are comfortable building predictive Machine Learning models
    • They participate in Data Science competitions and hackathons on a regular basis.
    • Prior knowledge of Basic and Advanced Machine Learning algorithms is necessary.


 

2. Setting a target and timelines for yourself

We have created these guides with the following target in mind:

  • Beginner Data Scientist
    • Learn basic mathematics and statistics required for data science
    • Develop a basic understanding of machine learning algorithms and of solving real-life problems with them
    • Skills required to land your first data science internship / job.
    • Time spent ~ 3 hours / day
  • Transitioner Data Scientist
    • Learn basic mathematics and statistics required for data science
    • Develop a basic understanding of machine learning algorithms
    • Work on projects and create a portfolio of projects
    • Skills required to land your first data science internship / job.
    • Time spent ~ 5 hours / day

  • Intermediate Data Scientist
    • Understand Deep Learning techniques and algorithms to the extent of applying them on real world problems.
    • Learn to create awesome Interactive Visualizations and improve your storytelling capabilities.
    • Understand recent developments in the field of Data Science (such as Reinforcement Learning) and incorporate them into existing Machine Learning frameworks.
    • Learn web frameworks and cloud computing to create independent data / machine learning products.
    • Time spent ~ 3 hours / day

 

3. Ultimate Beginner’s path for 2017

Structure for your 2017 journey:

 

3.1: Getting Started and testing the waters

Time suggested: 4 weeks (January 2017)

At this stage, it is important to understand why you want to become a data scientist. What are your strengths and weaknesses? Do you know what it takes to be a Data Scientist? You must answer these questions before setting out on your Data Science journey.

Watch this excellent video where Tetiana Ivanova describes how she became a Data Scientist without going through a Master's or doctorate program in data science, with the help of meetups.

Here are some additional resources you can use to answer these questions:

  1. What is Data Science? – This article by Data Jobs will give you a broad perspective of how data science is being used in Netflix and Amazon. Also, it will highlight the skill set required for Data Science.
  2. Should I become a Data Scientist? – This article poses some questions to help you decide whether you are fit for a Data Scientist role. I suggest you go through this article before proceeding further.
  3. Next, you should attend local meetups in your area. Go out and find out what people are saying about Data Science / Machine Learning. Meetups not only help you learn the tools and techniques, they also provide you with a network of people in the same industry, which helps in finding the right jobs and internships later on.

Go ahead and think through these aspects of choosing a career in data science. This decision is going to decide the next 11 months of your life.

 

3.2: Basics of Mathematics and Statistics

Time suggested: 8 weeks (February 2017 – March 2017)

Topics to be covered:

  • Descriptive Statistics – 1 week
  • Probability – 2 weeks
  • Inferential Statistics – 2 weeks
  • Linear Algebra – 1 week
  • Structured Thinking – 2 weeks

Descriptive Statistics – 1 week
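
As a small taste of this topic, descriptive statistics summarize a data set with a few numbers such as the mean, median, mode and standard deviation. The sketch below uses Python's built-in statistics module on a made-up list of values (the data and variable names are illustrative assumptions, not taken from any course in this plan).

```python
import statistics

# Hypothetical sample: hours that eight learners studied in one week.
hours = [2, 4, 4, 4, 5, 5, 7, 9]

summary = {
    "mean": statistics.mean(hours),      # arithmetic average
    "median": statistics.median(hours),  # middle value of the sorted data
    "mode": statistics.mode(hours),      # most frequent value
    "pstdev": statistics.pstdev(hours),  # population standard deviation
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

Comparing the mean with the median is a quick way to spot skew: here a couple of large values pull the mean above the median.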

 

Probability – 2 weeks

Inferential Statistics – 2 weeks

  •  Course (mandatory) – Intro to Inferential Statistics from Udacity – Once you have gone through the descriptive statistics course, this course will take you through statistical modeling techniques and advanced statistics.
  •  Books (optional) – Online Stats Book – This online book can be used for a quick reference for inference tasks.

Linear Algebra – 1 week

  • Course (mandatory)
    • Linear Algebra – Khan Academy : This concise and excellent course on Khan Academy will equip you with the skills necessary for Data Science and Machine Learning.
  • Books (optional)

Structured Thinking – 2 weeks

  •  Competitions (mandatory): No amount of theory can beat practice. This is a strategic thinking problem which will test you on your thinking process. Also, keep an eye on business case studies as they help in structuring your thoughts tremendously.

 

3.3: Introducing the tool – R / Python

Time suggested: 8 weeks (April 2017 – May 2017)

Topics to be covered:

  • Tools (R/Python) – 4 weeks
  • Exploration and Visualization (R/Python) – 4 weeks
  • Feature Selection/ Engineering

Tools

1. R

  • Books – R for Data Science – This is your one stop solution for referencing basic materials on R.
  • Blogs/Articles
    • This article is a great starting point that walks through the entire process of model building, starting from the installation of RStudio/R.
    • R-bloggers – This is one of the most recommended blogs for R users. It has some of the most effective and practical R tutorials, and every R practitioner should keep it bookmarked.

2. Python

  • Books (mandatory) – Python for Data Analysis – This book covers various aspects of Data Science, from loading data to manipulating, processing, cleaning and visualizing it. A must-keep reference guide for pandas users.

 

Exploration and Visualization

1. R

  • Course
    • Exploratory Data Analysis – This is an awesome course by Johns Hopkins University on Coursera. You will need no other course to perform visualization and exploratory work in R.
  • Blogs/Articles
    • Comprehensive guide to Data Exploration in R – This is a one-stop article that I suggest you go through carefully, following every step, because the steps in the article are the same ones you will use while solving any data problem or hackathon problem.
    • Cheat sheet – Data Exploration in R – This cheat sheet contains all the steps in data exploration, with code. I suggest you print it and stick it on your wall for quick reference.

2. Python

  • Course (optional)
    • Intro to Data Analysis – This is an excellent course by Udacity on data exploration using NumPy and pandas.
  • Books (optional) – Python for Data Analysis – A one-stop solution for your Data Exploration and Visualization in Python.

 

Feature Selection/ Engineering

  • Books (optional) – Mastering Feature Engineering: This book is a masterpiece for learning feature engineering. Not only will you learn how to implement feature engineering in a systematic way, you will also learn the different methods involved.

 

3.4: Basic & Advanced machine learning tools

Time suggested: 12 weeks (June 2017 – August 2017)

Topics to be covered (June 2017 – July 2017):

  • Basic Machine Learning Algorithms.
    • Linear Regression
    • Logistic Regression
    • Decision Trees
    • KNN (K- Nearest Neighbours)
    • K-Means
    • Naïve Bayes
    • Dimensionality Reduction
  • Advanced algorithms (August 2017)
    • Random Forests
    • Dimensionality Reduction Techniques
    • Support Vector Machines
    • Gradient Boosting Machines
    • XGBOOST

Linear Regression

  • Course
    • Machine Learning by Andrew Ng – There is no better resource to learn Linear Regression than this course. It will give you a thorough understanding of linear regression and there is a reason why Andrew Ng is considered the rockstar of Machine Learning.
  • Blogs/Articles
    • This lesson from the Penn State STAT 501 course outlines the main features of Linear Regression, ranging from a simple definition of Linear Regression to determining the goodness of fit of a regression line.
    • This is an excellent article with practical examples to explain Linear Regression with code.
  • Books
    • The Elements of Statistical Learning – This book is sometimes considered the holy grail of Machine Learning and Data Science. It explains Machine Learning concepts mathematically from a Statistics perspective.
    • Machine Learning with R – This is a book I personally use to have a brief understanding of Machine Learning algorithms along with their implementation code.
  • Practice
    • Black Friday – Like I already said – No amount of theory can beat practice. Here is a regression problem that you can try your hands on for a deeper understanding.
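
To make the idea concrete before you start the course, here is a minimal sketch of simple (one-variable) linear regression fitted by ordinary least squares, with R² as a goodness-of-fit measure. It uses only the Python standard library; the function names and the toy data are illustrative assumptions and do not come from any of the resources above.

```python
from statistics import mean


def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def r_squared(xs, ys, slope, intercept):
    """Share of the variance in ys that the fitted line explains."""
    y_bar = mean(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Hypothetical toy data: y is roughly 2 * x + 1 with a little noise.
xs = [1, 2, 3, 4, 5]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]

slope, intercept = fit_line(xs, ys)
r2 = r_squared(xs, ys, slope, intercept)
print(f"slope={slope:.2f} intercept={intercept:.2f} R^2={r2:.3f}")
```

In practice you would use a library such as scikit-learn or R's lm, but writing the two formulas out once makes the course material easier to follow.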

 

Logistic Regression

  • Course (mandatory)
    • Machine Learning by Andrew Ng – Week 3 of this course will give you a deeper understanding of one of the most widely used classification algorithms.
    • Machine Learning: Classification – Weeks 1 and 2 of this practice-oriented Specialization course using Python will satisfy your thirst for knowledge about Logistic Regression.
  • Books (optional)
    • Introduction to Statistical Learning – This is an excellent book with quality content on Logistic Regression’s underlying assumptions, statistical nature and mathematical linkage.
  • Practice (mandatory)
    • Loan Prediction – This is an excellent competition to practice and test your new Logistic Regression skills by predicting whether a person's loan was approved or not.
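
For intuition, here is a minimal sketch of logistic regression trained with batch gradient descent, written in plain Python. The loan-style features, the data, the learning rate and the number of epochs are all made-up assumptions for illustration; they are not the Loan Prediction data set.

```python
import math


def sigmoid(z):
    """Map any real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(x, weights, bias):
    """Probability that example x belongs to class 1."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train_logistic(rows, labels, lr=0.1, epochs=2000):
    """Fit weights and a bias by batch gradient descent on the log loss."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            error = predict_proba(x, weights, bias) - y
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(rows)
    return weights, bias


# Hypothetical applicants: [income (arbitrary units), has_credit_history].
rows = [[1.0, 0], [2.0, 0], [0.5, 0], [1.5, 1], [3.0, 1], [3.5, 1]]
labels = [0, 0, 0, 1, 1, 1]  # 1 = loan approved

weights, bias = train_logistic(rows, labels)
print(predict_proba([3.5, 1], weights, bias))  # should be above 0.5
print(predict_proba([0.5, 0], weights, bias))  # should be below 0.5
```

The model outputs a probability, and a threshold (commonly 0.5) turns it into an approved / not approved decision.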

 

Decision Trees

  • Course (mandatory)
  • Books (mandatory)
    • Introduction to Statistical Learning – Sections 8.1 and 8.3 explain the basics of decision trees through theory and practical examples.
    • Machine Learning with R – Chapter 5 of this book provides one of the best explanations of Machine Learning algorithms available. Here, decision trees are explained in an extremely non-intimidating and easy style.
  • Practice (mandatory)
    • Loan Prediction – This is an excellent competition to practice and test your new Decision Tree skills by predicting whether a person's loan was approved or not.

 

KNN (K- Nearest Neighbors)

  • Course (mandatory) 
    • Machine Learning – Clustering and Retrieval: Week 2 of this course progresses from 1-nearest neighbor to k-nearest neighbors and also describes the best ways to approximate the nearest neighbors. It explains all the concepts of KNN using Python.
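
As a rough illustration of the step from 1-nearest neighbor to k-nearest neighbors, here is a brute-force sketch in plain Python. It computes exact distances to every training point (no approximate search), and the points, labels and function name are made up for this example.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical 2-D points forming two well-separated groups.
points = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
labels = ["A", "A", "A", "B", "B", "B"]

print(knn_predict(points, labels, (2, 2), k=1))  # 1-nearest neighbor
print(knn_predict(points, labels, (2, 2), k=3))  # 3-nearest neighbors
```

With k=1 the prediction copies the label of the single closest point; larger values of k smooth out the effect of noisy neighbors.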

 

K-Means

 

Naive Bayes

  • Course
    • Intro to Machine Learning – Take this course to see Naive Bayes in action. In this course, Sebastian Thrun explains Naive Bayes in simple English.
  • Blog / Article
    • 6 Easy Steps to Learn Naive Bayes Algorithm (with code in Python): This article will take you through the Naive Bayes algorithm in detail. In this guide, you will learn how the algorithm works, where it is applied and much more. It will also give you hands-on knowledge of building a model using Naive Bayes.
    • Naive Bayes for Machine Learning: This is one of the most comprehensive articles I have come across. Go through this article for a complete understanding of why the Naive Bayes algorithm is important for machine learning.

 

Dimensionality Reduction

 

Random Forests

 

Gradient Boosting Machines

  • Presentation (mandatory): Here is an excellent presentation on GBM. It covers the prominent features of GBM and the advantages and disadvantages of using it to solve real-world problems. It is a must-see for anybody trying to understand GBM.

 

XGBOOST

  • Blogs /Articles (mandatory)
    • Official Introduction to XGBOOST – Read the documentation of this hackathon-winning algorithm. It is an improvement over GBM and is right now the most widely used algorithm for winning competitions.
    • Using XGBOOST in R – An excellent article on applying XGBOOST in R to a practical problem.
    • XGBOOST for applied Machine Learning – An article by Machine Learning Mastery to evaluate the performance of XGBOOST over other algorithms.

 

Support Vector Machines

 

3.5: Building your profile

Time suggested: 8 weeks (September 2017 – October 2017)

Topics to be covered:

  1. GitHub Profile Building
  2. Practice via competitions
  3. Discussion Portals

 

GitHub Profile Building (mandatory)

It is very important for a Data Scientist to have a GitHub profile to host the code of all the projects he/she has undertaken. Potential employers see not only what you have done, but also how you have coded and how frequently / how long you have been practicing data science.

Also, code on GitHub opens up avenues to open source projects, which can greatly boost your learning. If you don’t know how to use Git, you can learn from the Git and GitHub course on Udacity. It is one of the best and easiest courses for learning to manage repositories through the terminal.

 

Practice via competitions (mandatory)

Time and again, I have stressed that practice beats theory. Moreover, coding in hackathons brings you closer to developing data products that solve real-world problems. Below are the most popular platforms for participating in Data Science / Machine Learning competitions.

  1. Analytics Vidhya Datahack
  2. Kaggle competitions
  3. Crowd Analytix human layer

 

Discussion Forums (optional)

Discussions are a great way to learn in a peer-to-peer setup, from finding an answer to a question you are stuck on to answering someone else’s questions. Below are some discussion-rich platforms you should keep tabs on to clear your doubts.

  1. Analytics Vidhya Discussion Portal
  2. Kaggle Discussion
  3. StackExchange

 

3.6: Apply for Jobs & Internships

Time suggested: 8 weeks (November 2017 – December 2017)

Topics to be covered: Jobs / Internships

If you are here after diligently following the above steps, then you can be sure that you are ready for a job / internship position at any Data Science / Analytics or Machine Learning firm. But it can be quite difficult to identify the right jobs. So, to save you the trouble, I have created a list of portals that list Data Science / Machine Learning jobs and internships.

  1. Analytics Vidhya Job Portal
  2. Datajobs
  3. Kaggle Job portal
  4. Internshala

In order to prepare for these interviews, you should go through this Damn Good Hiring Guide.

 

4. Transitioner’s path for 2017

Let me start by giving you the bad news – it is not going to be easy to transition into data science. Also, the more work experience you have, the more difficult your transition will typically be. You will need strong resolve – there will be times when you question whether this is the right domain for you.

The good news is that once you get your first break in the industry, there is no looking back. Also, because of the salary differential from other industries, you may not need to compromise on your earnings during the transition.

To achieve your goal, all you have to do is follow this learning path diligently. We have covered all the skills and techniques you need to take your first steps in data science.

The Ultimate Path for transitioners

Simply put, if you are looking to transition in under a year, you will need to learn everything we laid out for the beginner above. Additionally, you will need to carve out extra time to showcase your skills. You will need to overcome the doubts of your potential employers through your projects and work.

I am sure you are beginning to understand why a transition is not an easy thing.

Structure for your 2017 journey:

The structure of the path is similar, but you will need to accelerate your learning in the first half of the plan. Start by going through this article and go through a few success stories to understand what a transition would entail. Once you are set for the journey, follow the plan by sticking to these timelines.

  • Step 1: Getting started and testing the waters (1 week in January ’17)
  • Step 2: Mathematics & Statistics (Jan ’17 – March ’17)
  • Step 3: Introducing the tool – R / Python (March ’17 – April ’17)
  • Step 4: Basic & Advanced machine learning tools (May ’17 – July ’17)
  • Step 5: Building your profile (Aug ’17 – Oct ’17)
  • Step 6: Applying for Jobs  (Nov ’17 – Dec ’17)

 

5. Intermediate’s path for 2017

If you can build predictive models, but don’t necessarily know deep learning and some recent developments in the domain, this learning path can help you out. Depending on your skills and learning plan for the year, you can pick and choose the areas you want to learn.

Structure of intermediate path for 2017:

 

5.1: Assess your technical & structured thinking skills – Jan 2017

The first step in creating your learning plan is to benchmark yourself on various skills – both technical and structured thinking. You can go through the skill tests on Analytics Vidhya to judge whether you need to review the old material. If you do well, go ahead with acquiring new skills. Else, go back to practice for some more time.

If you feel the need to go through the old material once again, refer to the beginner’s path, which contains various useful resources.

Skill tests:

  1. Statistics 1 & Statistics 2
  2. R for Data Science 
  3. Python for Data Science
  4. Machine Learning
  5. Regression
  6. Tree-based algorithms
  7. SQL 

Structured Thinking

  • Competitions (mandatory): Check out this strategic thinking problem to test your structured thinking. Also, keep an eye on business case studies as they help in structuring your thought process.

 

5.2: A few more ML algorithms – Feb 2017

There are a few specific machine learning algorithms which come in handy while solving specific problems. For example, try solving online click prediction on large data sets without applying online learning algorithms and you will know what I am talking about. Here are a few advanced ML algorithms you should learn this month:

Online Machine Learning

 

Vowpal Wabbit

 

FTRL- Algorithms


Exercise: Practice on one of the old Kaggle competitions or on the open click-through rate data sets provided by Criteo.
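
To see why online learning suits click prediction on large data sets, here is a minimal sketch of a model that updates its weights one example at a time, so the full data set never has to fit in memory. Note the assumptions: this is plain stochastic gradient descent on logistic regression with the hashing trick, not FTRL and not Vowpal Wabbit itself, and the class name, feature strings and click stream are all made up for illustration.

```python
import math
import zlib


class OnlineLogisticRegression:
    """Logistic regression updated one example at a time (plain SGD)."""

    def __init__(self, n_buckets=2 ** 18, lr=0.1):
        self.n_buckets = n_buckets
        self.lr = lr
        self.weights = [0.0] * n_buckets

    def _indices(self, features):
        # Hashing trick: map each "name=value" string to a weight slot.
        # zlib.crc32 is stable across runs, unlike the built-in hash().
        return [zlib.crc32(f.encode("utf-8")) % self.n_buckets for f in features]

    def predict(self, features):
        """Predicted click probability for one example."""
        z = sum(self.weights[i] for i in self._indices(features))
        z = max(-35.0, min(35.0, z))  # keep exp() in a safe range
        return 1.0 / (1.0 + math.exp(-z))

    def learn(self, features, clicked):
        """Update the weights from a single (features, label) pair."""
        error = self.predict(features) - clicked
        for i in self._indices(features):
            self.weights[i] -= self.lr * error


# Hypothetical click log; in practice this would be read line by line from disk.
stream = [
    (["ad=shoes", "device=mobile"], 1),
    (["ad=shoes", "device=desktop"], 1),
    (["ad=insurance", "device=mobile"], 0),
    (["ad=insurance", "device=desktop"], 0),
] * 50

model = OnlineLogisticRegression()
for features, clicked in stream:
    model.learn(features, clicked)

print(model.predict(["ad=shoes", "device=mobile"]))
print(model.predict(["ad=insurance", "device=desktop"]))
```

Because each update touches only the weights of the features present in that example, the cost per example stays small even when the feature space is huge.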

 

5.3: Pick up a data visualization tool (March 2017)

Ideally, you should pick up D3.js and one of QlikView or Tableau. While D3.js provides the most flexibility, QlikView and Tableau are both handy for creating dashboards and less complex stories.

Topics to be covered:

  • Interactive Visualization using d3.js (3 weeks)
  • Creating Visualizations in QlikView (1 week)
  • Creating Visualizations in Tableau (1 week)

Interactive Visualization using d3.js

The reason d3.js is not so popular among Data Scientists is that it requires an entirely different skill set (HTML, CSS, JavaScript), which is not typical of a Data Scientist.

But knowing d3.js can take your storytelling capabilities to a different level. You can create interactive, non-static graphs embedded right in a browser for a much richer experience. Below is a list of resources to master d3.js:

  • Course – Data Visualization and d3.js: This is an excellent course provided by Zipfian experts on Udacity and a part of Facebook’s Data Analyst Nanodegree program.
  • Code-Oriented Resource – Dashing d3.js: This is a code-oriented tutorial which will help you create your own interactive visualizations. It is also the tutorial I am currently following to learn d3.js.

  

Creating Visualizations using QlikView

 

Creating Visualizations in Tableau

  • Blogs/Articles (mandatory) – Your guide to become a Tableau expert – This is a comprehensive learning path to become an expert at Tableau. The article is very well structured and detailed. Keep it bookmarked to reference often.

 

5.4: Big Data tools and techniques (April 2017)

Big Data

Other useful tools:

 

 

5.5: Deep Learning Basics & Advanced (May 2017 – August 2017)

Deep Learning Basics (May 2017 – June 2017)

  • Course (mandatory)
    • Machine Learning by Andrew Ng – There is no better introductory material to Deep Learning and Neural Networks than Week 4 and Week 5 material of this course.
    • Deep learning by Google | Udacity – This is an excellent basic course on transition from Machine Learning to Deep Learning, deep neural networks, Convolutional Neural Networks and Deep Learning for texts.
  • Reading Material/Books
    • Deep learning Textbook – Written by Ian Goodfellow, Yoshua Bengio and Aaron Courville, this book is bound to become the de facto reference for people trying to learn Deep Learning.
    • Stanford Deep Learning tutorial – This is an all text and images resource provided by Stanford which starts from Linear Regression and goes to Convolutional Neural Networks with ease.
  • Practice – Identify the digits –  An awesome contest to check the basics you have learned to identify handwritten digits.

 

Deep Learning advanced (June 2017 – August 2017)

 

5.6: Reinforcement Learning (September 2017 – October 2017)

Topics to be covered: Reinforcement Learning (Theory)

 

5.7: Web frameworks & Cloud Computing (November 2017 – December 2017)

Web Frameworks

Now that you know machine learning well, you might want to apply it to web products. What you need is a working knowledge of web frameworks. Web frameworks allow you to quickly build and prototype web-based products without getting into the complications of coding everything from scratch.

Given that you would already have working knowledge of Python, you can choose any of the Python based web frameworks. I would recommend Flask for its simplicity. Flask is a simple and light web framework, which should serve your needs well. If you are looking to build a complex web product, you might want to consider Django as well.

Resources for learning Flask:

Exercises:

Additionally, you should do a side project to marry your machine learning and web development skills. You can build a simple web application where users upload a picture of a car and find out its make and model, or maybe one that tells people their age.

 

Cloud computing

Now that you know how to build web applications, you should also get your hands dirty on cloud computing. A few popular platforms are Amazon Web Services (AWS), Google Cloud platform and Microsoft Azure.

Each of these platforms provides extensive documentation for its offerings. If you have to pick only one, AWS is the way to go because of its popularity, widespread use and comprehensive offerings.

 

End Notes

I hope you found this learning path helpful. I have made it as specific and comprehensive as possible. If you think I have missed out on any specific areas or resources, do let me know.

If you want to progress in your data science journey, all you have to do is choose your category and follow the learning path diligently.

If you have any questions, doubts or suggestions, drop a comment below and I will be happy to answer them.

If you want to make your own learning path, share with me how you are planning to follow your journey of becoming a data scientist.

Learn, compete, hack and get hired!

NSS 05 Jul 2020

I am a perpetual, quick learner and keen to explore the realm of Data analytics and science. I am deeply excited about the times we live in and the rate at which data is being generated and being transformed as an asset. I am well versed with a few tools for dealing with data and also in the process of learning some other tools and knowledge required to exploit data.

Responses From Readers

Vishwachandra 16 Jan, 2017

Hi NSS/Kunal, The effort you guys are putting to enable the data community onto the data science track is commendable.My best wishes to you. Vishwa

aSAs asasas 16 Jan, 2017

Hi, I am a 30 yrs with professional experience of 6 yrs in retail Banking , I have joined Jigsaw with Life time Membership.. I give 2 hrs daily to the Data Science and big data. I want to ask what extra i will do to change my Industry from Banking TO Analytics field and what would be my Starting Package if Now i am getting around 7 LPA

Jayant Singh 16 Jan, 2017

This article, I can safely say is the best I have came across till now. I have been tirelessly searching to come up with a practical scheme (I'm an undergrad at Kharagpur) to prepare myself for the data jobs. NSS I can only imagine the hard work you guys have put in to come up with such an elaborate and all-embracing article. I'm an avid follower of AV but was always a bit indolent to write comments, but not praising this would be lethal, I guess :P . Keep up the wonderful work. Thanks a ton for this brilliant write up! Jayant PS : Your Connect with: FB is not working I believe

Swati Kashyap 16 Jan, 2017

Hi Jayant, You can now share this on FB, It's working.

Pablo Palau Fuster 16 Jan, 2017

Courage, you have to train and have skills !. Companies value what you know how to do. Here, you will find a great plan to be very employable. Lots of energy and luck!

sachin 16 Jan, 2017

Stands the reason why AV is the google for Data Science.

Syaamantak Das 16 Jan, 2017

Please, Please provide a printable pdf format of this article. What an article! Kudos to AV and NSS. Thanks and Regards.

sachin 16 Jan, 2017

Thanks AV, for such a comprehensive guide for guys like me who want venture into the field of Data Science but has no idea as to where to actually begin, considering the vastness and exponential growth of the field.

Adarsh Kumar 16 Jan, 2017

Excellent Article. Thanks you very much for sharing.

Radovan 16 Jan, 2017

Great article and very promising for data science newbies and experienced professionals. Thank you. However, when I look at the courses you suggest and the timelines you propose, there seems to be a disconnect. For example, in 3.2: Basics of Mathematics and Statistics, you have Descriptive Statistics – 1 week, but the course you propose has a timeline of 2 months. The second course. Introduction to probability – The science of uncertainty, has last been offered in 2015 and is archived since then and not available. Intro to Inferential Statistics from Udacity is a 2 months course, again you have it down for 2 weeks. What am I missing? Thank you.

Shilpa 17 Jan, 2017

Thank you NSS for publishing this learning path. I just started exploring data science and faced the challenge of chalking out a learning strategy. Your article has come just in time !

Rajesh Sinha 17 Jan, 2017

Since last a month I'm regularly following Analystvidya @Facebook,, I found most of the posts quite resourcesfull,, Data Science for non-mathmetics is really a wonderful concept,, being a IT professional I'm also looking forward to learn more about it.

Shankar 17 Jan, 2017

Nice Article. Very helpful for data scientists. Appreciate your effort

Saurabh 17 Jan, 2017

Hi, I am an undergrad (final year) from BITS Pilani. After years of competitive programming, I have decided to dive into the interesting field of Machine Learning. I must say this is by far the most helpful article for a beginner like me. Can you also share with us some preparatory guides which would help me land a job in companies like Goldman Sachs?

badri 17 Jan, 2017

Solid piece of work, thanks.

Renato 17 Jan, 2017

Thanks for your great insight and generosity! My best wishes.

ilan cooke 18 Jan, 2017

Question about the math required for this course of study: most graduate degrees in statistics have calculus prerequisites (including multi-variate) that must be completed before acceptance into their programs. I'm told a lot of the math behind certain areas of statistics is based on integrating functions, and a knowledge of calculus is necessary for their understanding. Do you feel Calculus would be a useful and necessary addition to your learning plan?

Naoufal Touailat 18 Jan, 2017

Is there a way we can have an account on the website in order to bookmark our favorite articles? Keep up the good work.

Bala 19 Jan, 2017

Well, you don't need a better guide than this to proceed with a systematic approach to becoming a data scientist. Kudos to the team and the author. Thanks.

Roshan S 19 Jan, 2017

So beautiful! I looked all around Quora and a multitude of websites and what not, you name it! Don't get me started on the thousand-dollar bootcamps. This is amazing work, guys!

Arjun S 21 Jan, 2017

Amazing learning path! Thank you so much for this. However, I came to a standstill at one point. For the probability course (beginner's path), the content will be released over the coming weeks. Unfortunately I won't be able to do that course in the prescribed 2 weeks because the full course content will be out only by May of this year. How should I proceed?

Mayank 23 Jan, 2017

This is a brilliant article. Thank you for the hard work.

shivanesh kumar 23 Jan, 2017

Hi NSS, I am a fresher from Mechanical Engineering. So far I have been learning data science on my own, with no schedule from others, but this plan gave me a clear picture of everything that is needed and in the proper order. I am doing Coursera's Data Scientist's Toolbox course along with practical exposure on DataCamp. Coursera introduces R at the beginning; is it worth learning that way? I personally feel that Coursera's Exploratory Data Analysis course didn't offer much in a practical way, as the instructor just flips through concepts, while a Udacity course called Data Analysis in R gave very good practical exposure to the ggplot system. Once again, thank you so much for your plan. It will help thousands of people like me.

Jatin Kinra 24 Jan, 2017

Hey NSS, first of all, I want to thank you and AV for such helpful material for transitioning to Data Analytics as a career. I am now going through the Probability course. They have released just two weeks of material so far, which I have completed, but I want to complete the whole course before going further. Can you please provide the link to the archived course? It would be really helpful. Thanks.

Debarati 25 Jan, 2017

An amazing article related to data science learning. Python is indeed an excellent programming asset that helps in building up the network of data science as it helps to program all latest devices. The code format is simple and tricky at the same time, so a lot of patience is required to learn it. But once the knowledge of Python is acquired, things like IoT and Cognitive Computing become easy and thus helps the progress of data science.

Sakshi Gupta 26 Jan, 2017

Hi NSS, I actually started this course last year, so I was following last year's plan, but unfortunately, due to work pressure, I couldn't complete it last year. Some things seem to be the same but some are different. Is it wise to still follow the 2016 plan, as I have already completed 50+% of probability and statistics from Khan Academy?

Rakesh 27 Jan, 2017

NSS, you've done a great job in compiling the best for us to achieve our goals. This will save us loads of time. But I'm only familiar with R, and the intermediate and advanced topics do not use R as the base language for understanding the concepts, e.g. most Udacity courses are based on Python.

Sakshi Gupta 27 Jan, 2017

Hi NSS, I was following the data science learning path for 2016, but couldn't complete it last year. I have completed Descriptive Stats, Inferential Stats and Algebra, but couldn't complete Probability (from Khan Academy), although I have completed almost 60% of it. Can you please advise if I should carry on with the Khan Academy course or start the edX one, which you have pointed to for the 2017 learning path? Regards, Sakshi

Deepashu 31 Jan, 2017

Great Job NSS in compiling this learning path. Just one small suggestion from my side that if possible, please include the meetups information related to data science related fields, that would offer great learning and networking opportunities to one and all on this path. Thanks and Regards, Deepashu

Arun 01 Feb, 2017

Hello NSS, I have 4 projects done (created one year ago), but I got to know about GitHub from my friends. How do I post them so that it shows they were done in the previous year?

Ali 02 Feb, 2017

Brilliant piece of work. Appreciate your efforts. Bless you :)

Ali 02 Feb, 2017

Hi NSS: One quick question. We don't have to learn both R and Python in detail; learning only one in detail should suffice, I guess? I myself am an advanced SAS programmer, currently trying to grow my skills in DS. As I have some exposure to Python, my plan is to learn Python in detail and do nothing, or perhaps just some basics, in R. What would be your advice on that? Thanks.

Ayesha Siddiqa 03 Feb, 2017

Hi Team, this article is exactly what I was looking for. You have made so many lives easier!! Great work!!

Nikhil Kumar M 03 Feb, 2017

Very useful information, NSS. Keep it up!

Munish Gupta 06 Feb, 2017

Hi, is the path for beginners and transitioners the same or different? I don't see any separate links for them.

Munish Gupta 07 Feb, 2017

Hi, I am starting with a data science course. I have two choices to start with: 1. Data Science Specialization from Johns Hopkins University on Coursera 2. The Analytics Edge on edX. I am a little confused about which one to do first.

minxi 11 Feb, 2017

Hi, NSS: Your article is really helpful. It seems that there is a wrong link in section 3.3 Feature Selection/Engineering: the link for "Blog – A Comprehensive Guide to Data Exploration" points to another PDF file.

Robert 17 Feb, 2017

Where would you suggest a transitioner who is also a mathematician should start?

Syeda 19 Feb, 2017

Hello NSS, seriously commendable efforts. What keeps you guys ticking? Whatever it is, it will help! I am from a pure health science background and have had my hands in lots of things. The information out there is overwhelming. Your streamlining efforts are very helpful. Thanks tons! A toast to sharing!!

Ross 24 Feb, 2017

Hi NSS, thank you for providing such a clear guide. It has been very helpful in improving my analytic skills. Just one small point of confusion in the article: under Feature Selection/Engineering, the first link to a blog article actually takes me to the book 'Python for Data Analysis'. Wondering if this is intentional?

Girish Pathria 26 Feb, 2017

Great work pulling this together! Really appreciate it!!

Saurabh 02 Mar, 2017

Is it possible to do edX courses at our own pace? I have noticed that not all the units are uploaded at once.

Sid Saha 03 Mar, 2017

Looks like the book "Mastering Feature Engineering" is not available anymore?

Rahulkumar Mishra 08 Mar, 2017

One of the most insightful and comprehensive Data Science blogs, covering all the nitty-gritty of the Data Science universe. In addition to this, the recently conducted Datafest AV 2017 in the Mumbai region was one of the best opportunities for aspiring Data Scientists like us to explore more of the industry. Looking forward to more such meetups on Data Analytics and wishing you all great luck ahead.

Rohit 13 Mar, 2017

@NSS You are a godsend. I would like to thank you with all my heart. This is by far one of the most descriptive articles I could find on the internet for career explorers like me. Thank you so much my friend, you are a blessing for freshers.

Luke Smith 16 Mar, 2017

Thanks for pointing out some of the differences between beginners in data science (like those who have no prior experience) and more intermediate members (like those who are already comfortable with building predictive Machine Learning models). I feel like knowing where you stand would definitely help you know what programs and learning paths would be the most useful for you to follow. A breakdown like this could be a useful tool for a business to help them organize their training program.

Ahmed Ayman 20 Mar, 2017

Is there an existing study group?

Quoc Anh Trinh 20 Mar, 2017

Thank you very much for this article. However, I think there is a mistake in the link in the "Feature Extraction/Engineering" part of the beginners' path. It points to the same book as the previous link and not to a blog as you said.

Veena 28 Mar, 2017

Hello NSS, thanks for this wonderful plan. However, I have a query. You have mentioned the machine learning course by Andrew Ng, and he recommends using Octave. But we are learning Python/R, so how should we follow his classes?

Pravin 13 Apr, 2017

NSS, your plan helped me a lot in getting a direction for building my Analytics quotient. I find the Probability course from edX very theoretical, and of course I am struggling with it :(. If you have any alternative, equally effective Probability course in mind, do share your recommendation. Thanks in advance.

Jingmiao 19 Apr, 2017

What great work for us beginners! Thanks so much for your article!

Veena 22 Apr, 2017

Hi NSS, you have mentioned finishing the probability class from edX in 1 week. It would be a great help if you could guide me on how to finish it in 1 week in the most efficient way. I have started that class and was only able to cover half in 1 week. Regards, Aradhana

Shridevi 25 Apr, 2017

Thanks a lot for the information. It's highly beneficial.

aseem 05 May, 2017

Why isn't Hadoop listed here, sir? Shouldn't it be mentioned? Your views, please?

Dania Khan 05 May, 2017

This is great and helpful information. Thank you for laying out these learning plans. I'm hoping to follow these plans and transition into data science after having taken time off from working in database development and having a baby. Could you please advise on what the recommended computer/laptop specifications would be for learning data science? I would like to make sure I have adequate computing power. Thank you!

Hitesh 26 Jun, 2017

Hello sir, I have a query. After the introductory R course which you have mentioned, do we need to do the intermediate R course on DataCamp?

Sambit Pattanaik 02 Jul, 2017

Very good article. I am interested

Parth 23 Jul, 2017

Full-fledged plan. Great plan. Nice paths with complete details of actual resources for each concept. Huge thanks. Still, a basic doubt is floating in my mind and made me write this comment. Please help. For a beginner, to get a job/internship, is it not required to have big data skills, deep learning skills and cloud computing? Should I go without all this? Will it work, or do I need to follow the intermediate path as well after the beginner one?

Prof. S Kotrappa 23 Jul, 2017

The Data Science ecosystem is holistic and complex. It needs a lot of focus, discipline and commitment to pursue alongside our routine professional and life chores. I have been doing it in bits and pieces for 2 years (not in a disciplined way). It is overwhelming, and the growth and career path is remarkable for those who do it the hard way. Thanks to Analytics Vidhya for helping 100-1000 students who dream of making it BIG in their lives and in Data Science.

Muskan 27 Jul, 2017

This article is by far the best I have come across. I was really looking for a direction to take in data science but could not find one until I visited this website. I am really thankful to you for this. :) :)

Raj 29 Jul, 2017

Wow, this is easily the most comprehensive learning plan available. The biggest challenge is now to prioritize this list with other tasks and start learning. Any tips from others who have successfully covered this kind of learning plan on their own would also be helpful.

amar 31 Jul, 2017

Thank you! This is what I have been looking for. As I am a beginner, I will definitely refer to this.

Anil Sharma 05 Aug, 2017

This is the best article I have read till now on Data Science. I am going to start learning Data Science according to this schedule. Thank you!

amin alakhras 14 Aug, 2017

Guys, the link for "Exploratory Data Analysis" leads to "A Complete Tutorial to Learn Data Science with Python from Scratch", so here is the Coursera link for Exploratory Data Analysis: https://www.coursera.org/learn/exploratory-data-analysis#

SumTuck 28 Aug, 2017

I just want to send my sincerest gratitude to you for taking the time to put this together. Incredibly helpful and provides a clear path to learning data science independently.

Geetha 28 Aug, 2017

The learning path seems to be really good. But I am confused about learning the initial topics of Mathematics and Statistics. There are several online courses on Udacity, edX etc. that are specifically titled Statistics for R or Statistics for Data Science. Which course should I follow? The topics in the given learning path, or one of those? :( I wish to start my learning according to the path mentioned as early as possible. Help me get rid of this confusion.

Paramjit Singh 09 Sep, 2017

For linear algebra, I would recommend going through the following YouTube playlist as well. It's a very small series explaining linear algebra in a graphical way. https://www.youtube.com/watch?v=kjBOesZCoqc&list=PLZHQObOWTQDPD3MizzM2xVFitgF8hE_ab

smitha.sss 11 Sep, 2017

Hi, I have been working in software QA for the past 4 years and I am interested in learning data science. Is a transition from QA to data science possible? I did my master's in Electronics. Please suggest.

Madan 14 Sep, 2017

Hi Kunal & Team, this is Madan here. I am starting with the same plan now (September 14), so some of my questions are: 1. If there is some other plan next year, would it be too much of a difference? 2. Would it be even more updated? 3. Would I be able to transition smoothly to the 2018 plan? 4. If so, when will the plan be released?

MustafaDigno 02 Oct, 2017

What if I start this comprehensive plan late? Will next year's plan vary greatly from this one?

BECHIN 19 Oct, 2017

Such a great study plan! Have you completed it by now? I also want to study machine learning from now on. Thanks a lot for your plan.

Rakesh Aggarwal 05 Nov, 2017

Hi sir, I was reading this amazing article posted on your website. I have knowledge of Python programming and regular expressions but not of data science, and I know it would be pretty easy for me to learn the libraries, which I was doing by following your systematic approach. But when I came across your career page (Get Hired), there were only jobs for experienced people. How would I get this industry experience, and who would give me a job as a fresher in data science?

Subham 07 Nov, 2017

Hello, I am a BCA graduate and currently doing an MCA. I am interested in Big Data and Cloud. Please tell me which course will be good for me. Thanks.

Divyajyoti 07 Nov, 2017

Hi NSS, let me begin by commending your efforts on the article. Being a transitioner, though, there was a little bad news for me, as you also mentioned. I have 10 years of C++ industry experience, but I am really eager to upskill my resume and I am greatly attracted by the kind of work profile a data scientist has. Do you think this transition at this point in my career would be the right move? I am a little confused, which I think is okay at times. Looking forward to a motivational answer :-) Thanks

Ananya 16 Nov, 2017

Is it necessary to add SQL to the above path?

Salma Magdy 19 Nov, 2017

I loved the article so so so so much. It is very helpful, thank you.

Pat Ballard 27 Nov, 2017

Was thinking of starting this, but it's so close to 2018. Is there another plan coming? Thanks!

Pankaj 01 Dec, 2017

Hi, very useful post indeed. Can we have something on AI in a similar format? Thanks a lot for preparing and posting such useful content.

Yulia 01 Dec, 2017

Amazing resource, shame I have only just found it! Since 2018 is just around the corner, when may we expect an updated version?

jaswa 06 Dec, 2017

Hi, when will the new learning path come out?

S. F. 11 Dec, 2017

As a self-identified transitioner/intermediate learner, I have been trying to find a learning path for more than a year :( frustrated and overwhelmed by the sea of tutorials and blog posts online. None of them solved my fundamental problems, especially as I am working full time as a Product Manager in the enterprise software domain. Now I have found the map and directions, and I know I just need to follow the path to get to my destination :) Thank you very much! And please post the 2018 version soon! Best, Siyun

Ali 11 Dec, 2017

Hi, NSS! Thanks a lot for this truly comprehensive plan. I got stuck in the Probability course from edX because it seems too rigorous to me, with so many topics. Is it really worth getting through all of the topics, or is it better to focus only on the fundamental ones? I'm an absolute newbie in data science. Could you please explain how best to move at a faster pace?

Giorgi 12 Dec, 2017

Hi, this plan is really incredible and elaborate! Thanks a lot!!! I plan to follow this from 2018. Do you have any updates for 2018? Cheers from Georgia!

Lidiya 12 Dec, 2017

Hi! Are you planning to post a learning path for 2018? Thanks a lot!

Kevin Burke 18 Dec, 2017

This is exactly what I've been looking for, thank you for that. I have modified the plan for 2018 and will use that. Wondering if/when the 2018 plan will surface (in case there is any essential new learning to be had)? Thanks.

Owusu 19 Dec, 2017

I am patiently waiting for 2018's version so I can start from day one! This is indeed comprehensive. Great work.

Ashish 20 Dec, 2017

Are you planning to post a similar path for 2018?

Anurag Agrawal 24 Dec, 2017

When can I expect a similar plan for 2018? I am eagerly waiting for it. Or should I follow the same plan as mentioned in the article for 2017?

GIDEON TORDZRO 26 Dec, 2017

This is the most comprehensive learning plan. Is there any exam and certificate?

Satya 27 Dec, 2017

Great. Very comprehensive learning plan. Trying to follow it. Any suggestions from seniors who have already implemented this plan on how to make the timeline most usable? Thanks, NSS.

Nico 03 Jan, 2018

Hello! This route is very good. Will there be an update for 2018?

Amanda 04 Jan, 2018

Hi NSS, this is the most awesome learning plan I have found so far. I am thinking of following the plan. Just wondering if you'll publish a version for 2018?

Samuel 06 Jan, 2018

Found this to be very useful. Thanks, NSS. But it is already 2018. Can you post the 2018 version for those of us who want to start this year?

Annie 09 Jan, 2018

Can you publish the learning plan for 2018 pls?

Annie 09 Jan, 2018

Can you please publish the learning plan for 2018?

Gopinath 10 Jan, 2018

Hello NSS/Kunal, I just got the chance to view this wonderful, detailed article. Can I follow the same steps from now onwards, or do you have a learning plan for Data Science in 2018?

Andrew 13 Jan, 2018

This is great, thank you. Are you still planning to publish a study plan for 2018? Eager to get started but may wait for the 2018 plan if it's on the way.

ahmed 14 Jan, 2018

Waiting for the 2018 plan. When is it going to be published?

Josh 15 Jan, 2018

Hi, NSS. This is a valuable guide, thank you! Sadly, I've just now found it - in 2018. When can we expect the new 2018 guide to come out?

Vivek Desai 15 Jan, 2018

Will there be an update in 2018?

Saurabh Srivastav 18 Jan, 2018

Hi NSS, you are awesome, and thank you so much for this article. I am pursuing a Data Scientist course from Jigsaw Academy starting this January 2018. Can I follow the same learning path this year as well? I am in the transitioners group and have 5+ years of experience. Please let me know. Thanks and Regards, Saurabh

Data Science Training In Hyderabad 19 Jan, 2018

Hi, Thanks for sharing such a great article with us on Data Science we are glad to read this Data Science information

Data Science Training In Hyderabad 19 Jan, 2018

Hi, Thanks for sharing such a great article with us on Data Science we are glad to read this information on Data Science

ahmed 22 Jan, 2018

It's awesome. When will the 2018 plan be published?

Vivek Desai 23 Jan, 2018

Great article, when can we expect an update for 2018?

Tini 23 Jan, 2018

Hello, I want to ask about probability in the beginner's path. It says we need to learn this topic from edX. When I go to the edX course on probability, it has 10 units and takes almost three months to complete. But in the learning path for beginners, the time for learning probability is only 2 weeks. Do we have to cover the whole 10-unit probability course on edX, or do we just need to learn unit 1?

Dipesh Parwani 02 Feb, 2018

This article is the most germane article I've ever come across, and it is meticulously integrated with all the elements. Thank you very, very much.

Joyce Borba 05 Feb, 2018

Hi, thank you for this, that's exactly what I was looking for! I am a beginner, and I have learned a little Python. My doubt is how much programming is required in Data Science. How deeply should I dive into this? Thank you.

Dipesh Parwani 12 Feb, 2018

Hi, the course on probability is 18 weeks long and asks for 12 hours a week. Can you please suggest a substitute? Thanks.

Amit 10 Mar, 2018

How about dynamic dates in the article!!!

ohhanibaek 16 Mar, 2018

Nice post. Thanks for sharing so much information on Data Science.

Emilia 30 Mar, 2018

Wow! That's some great information you've brought together there, and in my new position I'm going to need it. I just wanted to thank you for the time and effort you've put into this and to tell you that I'm going to put it to good use in our company :)

Bharath 15 May, 2018

Hi NSS/AV team, a very profound article for all sorts of data science venturers. It contains all the necessary things to be considered for growth in the data science domain. Thanks for sharing such information, and keep up the good work.

Pavan Joshi 16 May, 2018

Hi, thank you tons for this road map. Having been in the industry for the past 10 years, I have always had a vision of working in big data but haven't achieved it so far. This post has shown me where I can improve. It would be a great pleasure if someone could help me with references and guidance.

Adam 29 Jun, 2023

Overall, I thoroughly enjoyed your article and found it highly informative. Thanks for sharing.
