Machine learning is a complex topic to master!
Not only is there a plethora of resources available, but they also age very fast. Couple this with a lot of technical jargon and you can see why people get lost while pursuing machine learning. However, this is only part of the story. You cannot master machine learning without going through the grind yourself. You have to spend hours understanding the nuances of feature engineering, its importance and the impact it can have on your models.
Through this learning path, we hope to provide you with an answer to this problem. We have deliberately loaded this learning path with a lot of practical projects. You cannot master machine learning without the hard work! But once you do, you will be one of the most sought-after people around.
Since this is a complex topic, we recommend that you follow the steps strictly in sequential order. Consider this path your mentor for machine learning. Only skip a step if you already know the subject matter covered in it.
Warming up – how is machine learning useful?
If you are a complete beginner in machine learning, here is a good talk from Jeremy Howard on how machine learning is changing the world. Jeremy discusses various applications of machine learning and deep learning, as well as a few ways in which machine learning can impact the world.
Still not sure? Check out this shorter video on training a machine to play Super Mario.
Excited about what machine learning can achieve? Let’s look at a learning path to make you a machine learning expert.
Optional read: Basics of Machine learning for a newbie
Step 0: Basics of R / Python
Multiple languages provide machine learning capabilities, and development work is happening at a rapid pace across several of them. Currently, “R” and “Python” are the most commonly used languages, and there is enough support and community available for both. Before entering the world of ML, I would recommend that you choose one of these two languages (R or Python) so that you can focus on machine learning (Which is better – R or Python?).
Keep your focus on understanding the basics of the language, its libraries and its data structures. Here are the step-by-step guides to learn R and Python:
a) Learning Path on R: Step 0 to Step 2
b) Learning Path on Python: Step 0 to Step 2
Other languages you can consider in the future: Scala, Go and Julia
Step 1: Learn basic Descriptive and Inferential Statistics
Let’s start or refresh our statistical learning. It is good to have an understanding of descriptive and inferential statistics before you start serious machine learning development. Udacity offers courses on descriptive statistics and inferential statistics. Both courses use Excel to teach you the basics of statistics. If you already know them, you can refresh or skip this step.
Assignment: You can complete the assignments of both courses using your choice of language (R / Python). You can refer to the respective statistical libraries and methods for both languages below.
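To make the two terms concrete, here is a minimal sketch in Python that uses only the standard library. The sample values and the hypothesised mean of 165 are made up for illustration, and the helper names `describe` and `one_sample_t` are our own, not part of any course material. The first function summarises the sample (descriptive statistics); the second computes a one-sample t-statistic, a basic building block of inferential statistics.

```python
import math
import statistics


def describe(sample):
    """Descriptive statistics: summarise the data we actually have."""
    return {
        "n": len(sample),
        "mean": statistics.mean(sample),
        "median": statistics.median(sample),
        "stdev": statistics.stdev(sample),  # sample standard deviation (n - 1)
    }


def one_sample_t(sample, mu0):
    """Inferential statistics: how far is the sample mean from a
    hypothesised population mean mu0, measured in standard errors?"""
    standard_error = statistics.stdev(sample) / math.sqrt(len(sample))
    return (statistics.mean(sample) - mu0) / standard_error


# Made-up sample: heights in centimetres.
heights_cm = [160, 165, 170, 175, 180]

summary = describe(heights_cm)
t_stat = one_sample_t(heights_cm, mu0=165)

print(summary)
print(round(t_stat, 3))
```

To turn the t-statistic into a p-value you would compare it against a t-distribution with n - 1 degrees of freedom, which is where a dedicated statistics library becomes handy.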
Step 2: Data Exploration / Cleaning / Preparation
What differentiates a good machine learning professional from an average one is the quality of the feature engineering and data cleaning performed on the original data. The more quality time you spend here, the better. This step also takes the bulk of your time, so it helps to put a structure around it. You can refer to the series of articles below to learn the different stages of data exploration.
- Variable Identification, Univariate and Multivariate analysis
- Missing values treatment
- Outlier treatment
- Feature Engineering
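As a taste of what outlier treatment can look like, here is a small sketch that caps extreme values using the interquartile range (IQR). This is one common rule of thumb among several, not the only valid treatment. The data, the function name `cap_outliers_iqr` and the multiplier of 1.5 are assumptions chosen for illustration.

```python
import statistics


def cap_outliers_iqr(values, k=1.5):
    """Cap values lying more than k * IQR outside the quartiles.

    Returns a new list; the input is left untouched.
    """
    q1, _, q3 = statistics.quantiles(values, n=4)  # three quartile cut points
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, lower), upper) for v in values]


# Made-up measurements with one obvious outlier (95).
readings = [10, 12, 11, 13, 12, 11, 95]
capped = cap_outliers_iqr(readings)

print(capped)
```

Whether to cap, remove or keep an outlier depends on why it is there, so always look at the offending rows before deciding.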
You can also refer to data exploration methods in R and Python:
Exercise / assignment:
- Take up the Titanic survival problem from Kaggle, build a set of hypotheses, then clean the data and add new features to the existing dataset. Think about the best way to impute the missing ages.
- Similarly, take up the Bike sharing demand forecasting problem and repeat the cycle mentioned above.
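As a starting point for the Titanic exercise, here is one possible approach to the missing-age question, sketched with plain Python: fill each missing age with the median age of passengers in the same class, and add a simple family-size feature. The six records are invented for illustration (they are not rows from the real dataset), the function names are our own, and group-wise median imputation is only one hypothesis worth testing against alternatives.

```python
import statistics
from collections import defaultdict

# Invented toy records; None marks a missing age.
passengers = [
    {"pclass": 1, "age": 38.0, "sibsp": 1, "parch": 0},
    {"pclass": 1, "age": None, "sibsp": 0, "parch": 0},
    {"pclass": 1, "age": 54.0, "sibsp": 0, "parch": 0},
    {"pclass": 3, "age": 22.0, "sibsp": 1, "parch": 0},
    {"pclass": 3, "age": None, "sibsp": 3, "parch": 1},
    {"pclass": 3, "age": 26.0, "sibsp": 0, "parch": 0},
]


def impute_age_by_class(rows):
    """Fill missing ages with the median age of the same passenger class."""
    known = [r["age"] for r in rows if r["age"] is not None]
    overall_median = statistics.median(known)

    ages_by_class = defaultdict(list)
    for r in rows:
        if r["age"] is not None:
            ages_by_class[r["pclass"]].append(r["age"])

    filled = []
    for r in rows:
        row = dict(r)  # copy so the original records stay unchanged
        if row["age"] is None:
            group = ages_by_class.get(row["pclass"])
            # Fall back to the overall median if the class has no known ages.
            row["age"] = statistics.median(group) if group else overall_median
        filled.append(row)
    return filled


def add_family_size(rows):
    """Feature engineering: siblings/spouses + parents/children + the passenger."""
    return [dict(r, family_size=r["sibsp"] + r["parch"] + 1) for r in rows]


cleaned = add_family_size(impute_age_by_class(passengers))
for row in cleaned:
    print(row)
```

Once this works, try a richer grouping (for example by title extracted from the name) and check whether your model actually improves.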
Step 3: Introduction to Machine Learning
It is now time to open the doors to machine learning. There are various resources available to get started with machine learning techniques. I would suggest that you pick one of the following two options, depending on your style of learning:
- Option 1: If you are someone who likes to learn in small steps and needs more hand-holding, you should start with the Machine Learning course from Andrew Ng. It is a good course for beginners and easy to understand. Professor Ng is amazing at making difficult concepts easy to grasp. The course covers all the basic algorithms and also introduces a few advanced topics like neural networks, recommendation systems and the application of machine learning to large databases using MapReduce. He chooses to use Octave / MATLAB instead of the more popular R or Python for teaching machine learning. Once you have completed it, you should proceed to the exercises and homework provided in Option 2.
- Option 2: If you are more independent, like challenges and can battle through tough assignments, you should take the Learning from Data course by Prof. Yaser Abu-Mostafa. This course gives an amazing treatment of the concepts behind machine learning, but beware: it is quite heavy on math and the theory behind ML (such as the VC dimension). It also requires more programming knowledge and is thus more advanced in that sense. This course is loaded with homework (which is not necessarily a bad thing).
Now you have a good understanding of the algorithms and techniques. Let’s look at the libraries or packages available in R and Python. You can refer to the learning paths (step 6) for R (additionally, ML Algorithms in R) and Python to explore these packages and related options.
Step 4: Participate in Kaggle knowledge competitions
By now, you have all the tools you need to compete in Kaggle knowledge competitions. These knowledge competitions are less difficult than the prize-winning challenges. You can also find various related resources to kick-start your data science journey. Below is the list of currently active knowledge competitions:
Must Read: How do I start my journey on Kaggle?
Step 5: Advanced Machine Learning
Now that you have learnt most machine learning techniques, it’s time to explore advanced topics that deal with different kinds of data, such as deep learning and machine learning with big data.
Are you aware of deep learning? If not, here is a brief introduction to it; for more detail on deep learning, watch the video here. Below is a list of deep learning resources that will help you get started:
- The most comprehensive resource is deeplearning.net. You will find everything here – lectures, datasets, challenges, tutorials.
- Give another course, from Geoff Hinton, a try to understand the basics of neural networks
- Pattern recognition using Python (Resource 1, Resource 2, Resource 3) and R (Resource 1)
- Text Mining using Python (Resource) and R (Resource 1, Resource 2)
Ensemble modeling is where an expert differs from an average professional. Ensembling can add a lot of power to your models and has been a very successful technique in various Kaggle competitions. Here is one of the best guides on ensemble modeling we have come across.
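To see why ensembling helps, here is a tiny sketch of the simplest ensemble: a majority vote over the predictions of three classifiers. The labels and the three prediction lists are made up so that each model errs on a different example, and the function names are our own. Real models rarely have errors this neatly uncorrelated, so treat the perfect score as an illustration of the idea rather than a typical result.

```python
from collections import Counter


def majority_vote(predictions_per_model):
    """Combine class predictions from several models by majority vote."""
    combined = []
    for votes in zip(*predictions_per_model):  # one tuple of votes per example
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined


def accuracy(predicted, actual):
    """Fraction of examples where the prediction matches the true label."""
    hits = sum(p == a for p, a in zip(predicted, actual))
    return hits / len(actual)


# Made-up true labels and predictions from three hypothetical models.
truth = [1, 0, 1, 1, 0, 1]
model_a = [1, 0, 0, 1, 0, 1]  # wrong on example 3
model_b = [1, 1, 1, 1, 0, 0]  # wrong on examples 2 and 6
model_c = [0, 0, 1, 1, 0, 1]  # wrong on example 1

ensemble = majority_vote([model_a, model_b, model_c])

for name, preds in [("a", model_a), ("b", model_b), ("c", model_c), ("ensemble", ensemble)]:
    print(name, round(accuracy(preds, truth), 2))
```

The vote beats every individual model here because no two models make the same mistake, which is exactly why diverse models are worth combining.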
Machine Learning with Big Data
As you know, the size of data is increasing at an exponential rate, but raw data is not useful until you start getting insights from it. Machine learning is nothing but learning from data: generating insights or identifying patterns in the available data set. There are various applications of machine learning algorithms, such as “spam detection”, “web document classification”, “fraud detection”, “recommendation systems” and many others. Below is a list of tutorials on dealing with big data using machine learning.
- Scalable Machine Learning
- Packages for Big Data in Python (Pydoop, PyMongo) and R (Resource1, Resource2)
Step 6: Participate in mainstream Kaggle competitions
Now you have most of the technical and statistical skills. It’s time to start learning from fellow data scientists while competing with them. Kaggle is just the place for this: an active, engaged and competitive platform. Data scientists are passionate about their rank and model performance. Go, dive into one of the live competitions currently running on Kaggle and put everything you have learnt to the test! Good luck!
Optional step: Text mining and databases
If you need to apply machine learning to text mining, you can look at the following guide to clean text data and build models on it. You can also look at the following Kaggle competition:
The Fun part
Now that you know what and where to learn to become a machine learning professional, here is a small simulation of how a genetic algorithm based robot would learn to walk.
And some serious stuff
Now that you know the potential of machine learning, imagine the impact it could have on today’s world. The talk from Jeremy touches on this briefly. The following article looks at this evolution from a different perspective: part 1 & part 2
Hope you enjoyed this learning path on machine learning and the impact machine learning can have on our future. If you have any suggestions to improve this learning path, please feel free to share them in the comments below.