MyStory: The Step-by-Step Process of How I Became a Machine Learning Expert in 10 Months

Introduction

Not so long ago, using the pivot tables option in Excel was the upper limit of my skills with numbers and the word python was more likely to make me think about a dense jungle or a nature program on TV than a tool to generate business insights and create complex solutions.

It took me ten months to leave that life behind and start feeling like I belonged to the exclusive world of people who can tell their medians from their means, their x-bars from the neighborhood pub, and who know how to teach machines what they need to learn.

The transformation was not easy. It demanded hard work, a lot of time and dedication, and plenty of help along the way. It also involved hundreds of hours of “studying” in different forms, and an equal amount of time practicing and applying everything I was learning. In short, it wasn’t easy to go from data dumb to data nerd, but I managed to do so while handling a terribly busy work schedule and being a dad to a one-year-old.

The point of this article is to help you if you are looking to make a similar transformation but do not know where to start and how to proceed from one step to the next. If you are interested in finding out, read on to get an idea about the topics you need to cover and also develop an understanding of the level of expertise you need to build at each stage of the learning process.

There are plenty of great online and offline resources to help you master each of these steps, but very often, the trouble for the uninitiated can be in figuring out where to start and where to finish. I hope spending the next ten to fifteen minutes going through this article will help solve that problem for you.

And finally, before proceeding any further, I would like to point out that I had a lot of help in making this transformation. Right at the end of the article, I will reveal how I managed to squeeze in so much learning and work in a matter of ten months. But that’s for later.

For now, I want to give you more details about the nine steps that I had to go through in my transformation process.

Step 1: Understand the basics

Spend a couple of weeks enhancing your “general knowledge” about the field of data science and machine learning. You may already have ideas and some sort of understanding about what the field is, but if you want to become an expert, you need to understand the finer details to a point where you can explain it in simple terms to just about anyone.

Suggested topics:   

  • What is Analytics? 
  • What is Data Science? 
  • What is Big Data? 
  • What is Machine Learning? 
  • What is Artificial Intelligence? 
  • How are the above domains different from each other and related to each other?
  • How are all of the above domains being applied in the real world? 

Exercise to show that you know:

  • Write a blog post telling readers how to answer these questions if asked in an interview

 

Step 2: Learn some Statistics

I have a confession to make. Even though I feel like a machine learning expert, I do not feel that I have any level of expertise in statistics. That should be good news for people who struggle with concepts in statistics as much as I do, as it suggests that you can be a data scientist without being a statistician. Having said that, you cannot ignore statistical concepts – not in machine learning and data science!

So what you need to do is to understand certain concepts and know when they may be applied or used. If you can also completely understand the theory behind these concepts, give yourself a few good pats on your back.

Suggested topics:   

  • Data structures, variables and summaries 
  • Sampling
  • The basic principles of probability 
  • Distributions of random variables 
  • Inference for numerical and categorical data 
  • Linear, multiple and logistic regression

Suggested exercise to mark completion of this step:

  • Create a list of references with the easiest to understand explanation that you found for each topic and publish them in a blog. Add a list of statistics related questions that one may be expected to answer in a data science interview
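
As a small illustration of the "summaries" topic above, here is a minimal sketch of why telling a median from a mean matters. The income figures are made up for the example, and it uses only Python's built-in `statistics` module.

```python
import statistics

# Hypothetical monthly incomes (in thousands); the last value is an outlier.
incomes = [30, 32, 35, 38, 40, 41, 250]

mean_income = statistics.mean(incomes)      # pulled upwards by the outlier
median_income = statistics.median(incomes)  # the middle value, robust to it
spread = statistics.stdev(incomes)          # sample standard deviation

print(f"mean   = {mean_income:.1f}")
print(f"median = {median_income}")
print(f"stdev  = {spread:.1f}")
```

A single extreme value drags the mean well above the median here, which is the kind of thing you should be able to spot and explain at this stage.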

 

Step 3: Learn Python or R (or both) for data analysis

Programming turned out to be easier to learn, more fun and more rewarding in terms of the things it made possible, than I had ever imagined. While mastering a programming language could be an eternal quest, at this stage, you need to get familiar with the process of learning a language and that is not too difficult.

Both Python and R are very popular and mastering one can make it quite easy to learn the other. I started with R and have slowly started using Python for doing similar tasks as well.

Suggested topics:

  • Supported data structures 
  • Read, import or export data 
  • Data quality analysis 
  • Data cleaning and preparation 
  • Data manipulation – e.g. sorting, filtering, aggregating and other functions 
  • Data visualization

Know that you are set for the next step:

  • Extract a table from a website, modify it to compute new variables, and create graphs summarizing the data
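
To give a feel for the manipulation topics listed above, here is a rough sketch using only Python's standard library. The table, its column names and the 100,000 cut-off are all invented for illustration; in practice you would more likely read a real file or web page and reach for a data analysis library.

```python
import csv
import io
from collections import defaultdict

# Hypothetical data standing in for a table read from a file or website.
raw = """city,region,population,area_km2
Alpha,North,500000,250
Beta,North,120000,100
Gamma,South,900000,600
Delta,South,60000,20
"""

rows = list(csv.DictReader(io.StringIO(raw)))

# Cleaning and preparation: convert text fields to numbers.
for row in rows:
    row["population"] = int(row["population"])
    row["area_km2"] = float(row["area_km2"])
    # New variable computed from existing columns.
    row["density"] = row["population"] / row["area_km2"]

# Filtering: keep cities with at least 100,000 people.
large = [r for r in rows if r["population"] >= 100_000]

# Sorting: densest first.
by_density = sorted(large, key=lambda r: r["density"], reverse=True)

# Aggregating: total population per region.
population_by_region = defaultdict(int)
for r in rows:
    population_by_region[r["region"]] += r["population"]

for r in by_density:
    print(f'{r["city"]:<6} {r["density"]:8.1f} people per km2')
print(dict(population_by_region))
```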

 

Step 4: Complete an Exploratory Data Analysis Project

In the first cricket Test match ever played (see scorecard), Australian Charles Bannerman scored 67.35% (165 out of 245) of his team’s total score, in the very first innings of Test cricket’s history. This remains a record at the time of writing for the highest share of a team’s total score by a batsman in an innings of a Test match.

What makes the innings even more remarkable is that the other 43 innings in that test match had an average of only 10.8 runs an innings, with only about 40% of all batsmen registering a score of ten or more runs. In fact, the second highest score by an Australian in the match was 20 runs. Given that Australia won the match by 45 runs, we can say with conviction that Bannerman’s innings was the most important contributor to Australia’s win.

Just like we were able to build this story from the scorecard of the test match, exploratory data analysis is about studying data to understand the story that is hidden beneath it, and then sharing the story with everyone.
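
The headline number in that story is simple arithmetic on two figures from the scorecard, which can be checked in a couple of lines. This sketch uses only the two values quoted above.

```python
# Figures quoted above for Australia's first innings.
bannerman_runs = 165
team_total = 245

share = bannerman_runs / team_total
rest_of_innings = team_total - bannerman_runs  # everything not scored by Bannerman

print(f"Share of team total: {share:.2%}")
print(f"Runs from the rest of the innings: {rest_of_innings}")
```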

Personally, I find this phase of a data project the most interesting, which is a good thing, as exploratory data analysis can be expected to take up quite a lot of the time in a typical project.

Topics to cover:

  • Single variable explorations 
  • Pair-wise and multi-variable explorations 
  • Visualization, dashboard and storytelling in Tableau

Project output: 

  • Create a blog post summarizing the exercise and sharing the dashboard or story. Use a dataset with at least ten columns and a few thousand records

 

Step 5: Create unsupervised learning models

Let’s say we had data for all the countries in the world across many parameters ranging from population, to income, to health, to major industries and more. Now suppose we wanted to find out which countries are similar to each other across all these parameters. How do we go about doing this, when we have to compare each country with all the others, across over 50 different parameters?

That is where unsupervised machine learning algorithms come in. This is not the time to bore you with details about what these are all about, but the good news is that once you reach this stage, you have moved on into the world of machine learning and are already in elite company.

Topics to cover:

  • K-means clustering
  • Association rules

Milestone exercise:

  • Practice K-means clustering on 3 different datasets from different industries or interest areas
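
To make the idea of K-means a little more concrete, here is a bare-bones sketch of the algorithm in plain Python. The six two-dimensional points and the choice of starting centroids are assumptions made for the example; a real project would use a library implementation, scale the variables first, and try several starting points.

```python
import math

def kmeans(points, centroids, max_iterations=10):
    """Plain k-means on a list of numeric tuples, from given starting centroids."""
    centroids = list(centroids)
    clusters = []
    for _ in range(max_iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for point in points:
            distances = [math.dist(point, c) for c in centroids]
            clusters[distances.index(min(distances))].append(point)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its old centroid).
        updated = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]
        if updated == centroids:
            break  # nothing moved, so the clustering has converged
        centroids = updated
    return centroids, clusters

# Hypothetical observations with two parameters each.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
final_centroids, clusters = kmeans(points, centroids=[points[0], points[3]])

for centroid, members in zip(final_centroids, clusters):
    print(centroid, members)
```

With 50 parameters per country instead of two, the code would be the same; only the length of each tuple changes.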

 

Step 6: Create supervised learning models

If you had data about millions of loan applicants and their repayment history from the past, could you identify an applicant who is likely to default on payments, even before the loan is approved?

Given enough prior data, could you predict which users are more likely to respond to a digital advertising campaign? Could you identify if someone is more likely to develop a certain disease later in their life based on their current lifestyle and habits?

Supervised learning algorithms help solve all these problems and a lot more. While there are a plethora of algorithms to understand and master, just getting started with some of the most popular ones will open up a world of new possibilities for you and the ways in which you can make data useful for an organization.

Topics to cover:

  • Logistic regression 
  • Classification trees 
  • Ensemble models like Bagging and Random Forest 
  • Support Vector Machines

You have not really started with creating models till you have done this:

  • Take a dataset, create models using all the algorithms you have learnt. Train, test and tune each model to improve performance. Compare them to identify which is the best model and document why you think it is so
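
The train, test and compare loop described above can be sketched on a toy scale. Everything here is an assumption for illustration: the loan data is invented, the "tree" is a single split (a decision stump), and it is compared against a baseline that always predicts the most common label. The listed algorithms would slot into the same loop.

```python
# Hypothetical data: (debt-to-income ratio, 1 if the applicant defaulted else 0).
train = [(0.10, 0), (0.15, 0), (0.20, 1), (0.25, 0), (0.30, 0), (0.35, 0),
         (0.40, 0), (0.50, 1), (0.55, 1), (0.65, 1), (0.80, 1)]
test = [(0.12, 0), (0.28, 0), (0.42, 0), (0.47, 0),
        (0.58, 1), (0.70, 1), (0.90, 1), (0.33, 1)]

def accuracy(predict, data):
    """Share of rows where the prediction matches the label."""
    return sum(predict(x) == y for x, y in data) / len(data)

def fit_majority(data):
    """Baseline: always predict the most common training label."""
    labels = [y for _, y in data]
    majority = max(set(labels), key=labels.count)
    return lambda x: majority

def fit_stump(data):
    """One-split classification tree: predict 1 when x exceeds a threshold."""
    xs = sorted(x for x, _ in data)
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])]
    # Tuning: keep the threshold with the best accuracy on the training data.
    best = max(candidates, key=lambda t: accuracy(lambda x: int(x > t), data))
    return best, (lambda x: int(x > best))

threshold, stump = fit_stump(train)
models = {"majority baseline": fit_majority(train), "decision stump": stump}

# Compare every model on data it has not seen during training.
for name, model in models.items():
    print(f"{name:<18} train={accuracy(model, train):.2f} test={accuracy(model, test):.2f}")
```

The gap between training and test accuracy is the part worth documenting: a model that only looks good on the data it was fitted to has not really been tested.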

 

Step 7: Understand Big Data Technologies

Many of the machine learning models in use today have been around for decades. The reason these algorithms are only finding applications now is that we finally have access to sufficiently large amounts of data to supply to them, so that they can come up with useful outputs.

Data engineering and architecture is a field of specialization in itself, but every machine learning expert must know how to deal with big data systems, irrespective of their specialization within the industry.

Understanding how large amounts of data can be stored, accessed and processed efficiently is important to being able to create solutions that can be implemented in practice and are not just theoretical exercises.

I had approached this step with a real lack of conviction, but as I soon found out, it was driven more by the fear of the unknown in the form of Linux interfaces than any real complexity in finding my way around a Hadoop system.    

Topics to cover:

  • Big data overview and eco-system
  • Hadoop – HDFS, MapReduce, Pig and Hive 
  • Spark

Do this to know that you have understood the basics:

  • Upload data, run processes and extract results after installing a local version of Hadoop or Spark on your system
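
Before installing anything, it can help to see the shape of the MapReduce idea in miniature. The sketch below runs in a single Python process and does not use Hadoop's or Spark's actual APIs; the function names and the two input lines are made up. It only mirrors the three phases (map, shuffle, reduce) that a real cluster would spread across many machines.

```python
from collections import defaultdict

def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one line of input."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reducer: combine all values for one key into a single result."""
    return key, sum(values)

# Hypothetical input; on a cluster each line could live on a different machine.
lines = ["big data is big", "data about data"]

mapped = [pair for line in lines for pair in map_phase(line)]
word_counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(word_counts)
```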

 

Step 8: Explore Deep Learning Models

Deep learning models are helping companies like Apple and Google create solutions like Siri or the Google Assistant. They are helping global giants test driverless cars and suggesting best courses of treatment to doctors.

Machines are able to see, listen, read, write and speak thanks to deep learning models that are going to transform the world in many ways, including significantly changing the skills required for people to be useful to organizations.

Getting started with creating a model that can tell the image of a flower from a fruit may not immediately help you start building your own driverless car, but it will certainly help you start seeing the path to getting there.

Topics to cover:

  • Artificial Neural Networks 
  • Natural Language Processing 
  • Convolutional Neural Networks 
  • TensorFlow
  • OpenCV

Milestone exercise:

  • Create a model that can correctly identify pictures of two of your friends or family members
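
As a first look at what an artificial neural network computes, here is a deliberately tiny sketch: two inputs, one hidden layer of two neurons, and one output neuron. The weights are hand-picked for the example so that the network computes XOR; real deep learning models learn their weights from data, typically with a library such as TensorFlow, and use smoother activation functions than the simple step used here.

```python
def step(z):
    """Step activation: the neuron fires (1) only when its input is positive."""
    return 1 if z > 0 else 0

def layer(inputs, weights, biases):
    """One fully connected layer: weighted sum plus bias, then activation."""
    return [
        step(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]

# Hand-picked (not learned) weights, chosen purely for illustration.
HIDDEN_W = [[1, 1], [1, 1]]   # both hidden neurons look at both inputs
HIDDEN_B = [-0.5, -1.5]       # first behaves like OR, second like AND
OUTPUT_W = [[1, -1]]          # "OR but not AND" is XOR
OUTPUT_B = [-0.5]

def predict(x1, x2):
    hidden = layer([x1, x2], HIDDEN_W, HIDDEN_B)
    return layer(hidden, OUTPUT_W, OUTPUT_B)[0]

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", predict(a, b))
```

No single neuron of this kind can compute XOR on its own, which is a small hint at why stacking layers matters.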

 

Step 9: Undertake and Complete a Data Project

By now you are almost ready to unleash yourself on the world as a machine learning pro, but you need to showcase all that you have learnt before anyone else will be willing to agree with you.

The internet presents glorious opportunities to find such projects. If you have been diligent about the previous eight steps, chances are that you would already know how to find a project that will excite you, be useful to someone, as well as help demonstrate your knowledge and skills.

Topics to cover:

  • Data collection, quality check, cleaning and preparation 
  • Exploratory data analysis 
  • Model creation and selection 
  • Project report

Milestone exercise:

  • Get in touch with a stakeholder who will be interested in your report and share your findings with them and get feedback

 

End Notes

Machine learning and artificial intelligence are skills for the present and the future. This is also a field where learning never ceases, and very often you may have to keep running just to stay in the same place as far as the most in-demand skills are concerned.

However, if you start the journey well, you will be able to understand how to go about taking the next step in your learning path. As you must have gathered by now, starting the journey well is a pretty challenging exercise in itself. If you choose to start upon it, I hope this article will have been of some help to you and I wish you the very best.

Finally, I will confess that I got a lot of help with my ten-month transition. The reason I was able to cover so much ground in this amount of time, along with a busy schedule at work and home, was that I enrolled in the Post Graduate Program in Data Science and Machine Learning offered by Jigsaw Academy and Graham School, University of Chicago.

Investing in the course helped in keeping my learning hours focused, created external pressure that ensured that I was finding time for it irrespective of whatever else was going on in life, and gave me access to experts in the form of faculty and a great peer group through other students.

Transforming from being non-technical to someone who is comfortable with the machine learning world has already opened up many new doors for me. Whatever path you choose to make this transformation, you can do so with the assurance that going through the rigor will reap rewards for a long time and will banish any fears of becoming irrelevant in tomorrow’s economy.

 

About the Author

Madhukar Jha, Founder – Blue Footed Ideas

Madhukar Jha believes that great digital experiences are created by concocting a perfect mix of data driven insights, understanding of behavioural drivers, a design thinking approach, and cutting edge technology. He applies this philosophy to help businesses make world class products, run campaigns that rock and tell compelling stories.

8 Comments

  • Valmik says:

    These r future ready courses , having strength to face the challenges from ever growing technological world

    I want to prepare myself for the same.

  • Sam Nathan says:

    Awe inspiring & excellent article Madhukar Jha! My 2 cents:

    1. Interchanging steps 5 & 6 ( I found it easier to begin with Supervised & migrate to unsupervised)
    2. To supplement topics to cover you may have included some tips around “Learning Materials”

    You have done an awesome job by posting this which could motivate many to take up this profession seriously. Kudos & keep inspiring!

  • Paari Elangovan says:

    The fact that you went through this journey and the way you articulated makes this article awesome!
    Thanks Madhukar for this wonderful blueprint.

  • Ravi says:

    Very helpful Post! Thank you! Made it possible for me to visualise the path and take the first step.

  • Kumar Teja says:

    This was a motivational read for a person who’s just getting into data science such as myself. But I personally feel this article would have even more of a impact if you could provide a timeline for the 10 months in which you went from a novice to Data science expert

  • Sudhir Kumar says:

    Thank you very much.it is such a very good article .

  • Steven William says:

    Really informative, when I browsed Data Sciences and Machine learning python course. I found Simple Analytics Inc. pursuing a data science course online will be an ideal deal for your career.

  • Sandy says:

    Really nice article for them who dont know from where to start.