Not so long ago, using the pivot tables option in Excel was the upper limit of my skills with numbers and the word python was more likely to make me think about a dense jungle or a nature program on TV than a tool to generate business insights and create complex solutions.
It took me ten months to leave that life behind and start feeling like I belonged to the exclusive world of people who can tell their medians from their means, their x-bars from the neighborhood pub, and who know how to teach machines what they need to learn.
The transformation process was not easy: it demanded hard work, lots of time, dedication, and plenty of help along the way. It also involved several hundred hours of “studying” in different forms and an equal amount of time practising and applying all that was being learnt. In short, going from data dumb to data nerd wasn’t easy, but I managed it while juggling a terribly busy work schedule as well as being a dad to a one-year-old.
The point of this article is to help you if you are looking to make a similar transformation but do not know where to start and how to proceed from one step to the next. If you are interested in finding out, read on to get an idea about the topics you need to cover and also develop an understanding of the level of expertise you need to build at each stage of the learning process.
There are plenty of great online and offline resources to help you master each of these steps, but very often, the trouble for the uninitiated can be in figuring out where to start and where to finish. I hope spending the next ten to fifteen minutes going through this article will help solve that problem for you.
And finally, before proceeding any further, I would like to point out that I had a lot of help in making this transformation. Right at the end of the article, I will reveal how I managed to squeeze in so much learning and work in a matter of ten months. But that’s for later.
For now, I want to give you more details about the nine steps that I had to go through in my transformation process.
Step 1: Understand the basics
Spend a couple of weeks enhancing your “general knowledge” about the field of data science and machine learning. You may already have ideas and some sort of understanding about what the field is, but if you want to become an expert, you need to understand the finer details to a point where you can explain it in simple terms to just about anyone.
- What is Analytics?
- What is Data Science?
- What is Big Data?
- What is Machine Learning?
- What is Artificial Intelligence?
- How are the above domains different from each other and related to each other?
- How are all of the above domains being applied in the real world?
Exercise to show that you know:
- Write a blog post telling readers how to answer these questions if asked in an interview
Step 2: Learn some Statistics
I have a confession to make. Even though I feel like a machine learning expert, I do not feel that I have any real expertise in statistics. That should be good news for people who struggle with statistical concepts as much as I do: it proves that you can be a data scientist without being a statistician. Having said that, you cannot ignore statistical concepts – not in machine learning and data science!
So what you need to do is understand certain concepts and know when they may be applied or used. If you can also fully understand the theory behind these concepts, give yourself a few good pats on the back.
- Data structures, variables and summaries
- The basic principles of probability
- Distributions of random variables
- Inference for numerical and categorical data
- Linear, multiple and logistic regression
Suggested exercise to mark completion of this step:
- Create a list of references with the easiest to understand explanation that you found for each topic and publish them in a blog. Add a list of statistics related questions that one may be expected to answer in a data science interview
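To make the "medians from their means" idea concrete, here is a minimal sketch in plain Python (standard library only; the numbers are made up for illustration) showing why you need both summary statistics, plus a quick simulation of a random variable's distribution:

```python
import random
import statistics as st

# Summary statistics for a small sample with one deliberate outlier (300)
data = [12, 15, 11, 14, 300]
mean = st.mean(data)      # pulled sharply upward by the outlier
median = st.median(data)  # robust to the outlier

# A mean far above the median is a classic signal of skew or outliers
print(f"mean={mean}, median={median}")

# Distribution of a random variable: 10,000 draws from a standard normal.
# By the law of large numbers, the sample mean lands close to 0.
random.seed(42)
draws = [random.gauss(0, 1) for _ in range(10_000)]
print(round(sum(draws) / len(draws), 2))
```

Playing with small simulations like this is a low-pressure way to internalise distributions and inference before the formal theory fully clicks.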
Step 3: Learn Python or R (or both) for data analysis
Programming turned out to be easier to learn, more fun, and more rewarding than I had ever imagined. While mastering a programming language could be an eternal quest, at this stage you only need to get familiar with the process of learning a language, and that is not too difficult.
Both Python and R are very popular and mastering one can make it quite easy to learn the other. I started with R and have slowly started using Python for doing similar tasks as well.
- Supported data structures
- Data import and export
- Data quality analysis
- Data cleaning and preparation
- Data manipulation – e.g. sorting, filtering, aggregating and other functions
- Data visualization
Know that you are set for the next step:
- Extract a table from a website, modify it to compute new variables, and create graphs summarizing the data
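The exercise above can be sketched in a few lines of pandas (assuming pandas is installed; in a real run you would get the DataFrame from `pd.read_html(url)`, but here a hand-made table of invented cricket figures stands in for the scraped one):

```python
import pandas as pd

# Toy stand-in for a table scraped from a website with pd.read_html(url);
# the players are real names, the numbers are invented for illustration
df = pd.DataFrame({
    "player": ["Bannerman", "Hill", "Trumper", "Hobbs"],
    "runs":   [165, 35, 104, 62],
    "balls":  [285, 80, 160, 120],
})

# New variable: strike rate (runs per 100 balls)
df["strike_rate"] = (df["runs"] / df["balls"] * 100).round(1)

# Filtering and sorting: scores of fifty or more, highest first
fifties = df[df["runs"] >= 50].sort_values("runs", ascending=False)

# Aggregating: average of the filtered scores
print(fifties["runs"].mean())
```

From here, a single `df.plot(...)` call (via matplotlib) would cover the graphing half of the exercise.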
Step 4: Complete an Exploratory Data Analysis Project
In the first cricket test match ever played (see scorecard), Australian Charles Bannerman scored 67.35% (165 out of 245) of his team’s total, in the very first innings of cricket’s history. At the time of writing, this remains the record for the highest share of a team’s total scored by one batsman in a test-match innings.
What makes the innings even more remarkable is that the other 43 innings in that test match had an average of only 10.8 runs an innings, with only about 40% of all batsmen registering a score of ten or more runs. In fact, the second highest score by an Australian in the match was 20 runs. Given that Australia won the match by 45 runs, we can say with conviction that Bannerman’s innings was the most important contributor to Australia’s win.
Just like we were able to build this story from the scorecard of the test match, exploratory data analysis is about studying data to understand the story that is hidden beneath it, and then sharing the story with everyone.
Personally, I find this phase of a data project the most interesting, which is a good thing, as exploratory data analysis can be expected to take up a large share of the time in a typical project.
Topics to cover:
- Single variable explorations
- Pair-wise and multi-variable explorations
- Visualization, dashboards and storytelling in Tableau
Suggested exercise:
- Create a blog post summarizing the exercise and sharing the dashboard or story. Use a dataset with at least ten columns and a few thousand records
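The scorecard story above is exactly what the basic EDA moves surface mechanically. A minimal sketch in pandas (assuming pandas is installed; the tiny dataset of runs per innings is invented for illustration):

```python
import pandas as pd

# Toy stand-in for a real dataset: runs scored per innings, by team
df = pd.DataFrame({
    "team": ["AUS", "AUS", "AUS", "ENG", "ENG", "ENG"],
    "runs": [165, 20, 10, 6, 15, 12],
})

# Single-variable exploration: a mean (38.0) far above the median (13.5)
# immediately flags the 165 outlier -- the Bannerman innings of this data
print(df["runs"].mean(), df["runs"].median())

# Pair-wise exploration: compare one variable across levels of another
print(df.groupby("team")["runs"].mean())
```

The same `describe()`/`groupby()` pattern scales directly to the ten-column, few-thousand-row dataset suggested in the exercise.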
Step 5: Create unsupervised learning models
Let’s say we had data for all the countries in the world across many parameters ranging from population, to income, to health, to major industries and more. Now suppose we wanted to find out which countries are similar to each other across all these parameters. How do we go about doing this, when we have to compare each country with all the others, across over 50 different parameters?
That is where unsupervised machine learning algorithms come in. This is not the time to bore you with details about what these are all about, but the good news is that once you reach this stage, you have moved on into the world of machine learning and are already in elite company.
Topics to cover:
- K-means clustering
- Association rules
Suggested exercise:
- Practice K-means clustering on three different datasets from different industries or interest areas
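In practice you would reach for a library implementation such as scikit-learn's `KMeans`, but the algorithm itself is simple enough to sketch in plain Python. This minimal version (with invented 2-D points standing in for countries measured on two parameters) alternates the two K-means steps: assign each point to its nearest centroid, then move each centroid to the mean of its cluster:

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Minimal K-means for 2-D points: assign each point to its nearest
    centroid, then move each centroid to the mean of its cluster."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise from the data itself
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # index of the nearest centroid (squared Euclidean distance)
            i = min(range(k), key=lambda c: (p[0] - centroids[c][0]) ** 2
                                          + (p[1] - centroids[c][1]) ** 2)
            clusters[i].append(p)
        for c, members in enumerate(clusters):
            if members:  # empty clusters keep their old centroid
                centroids[c] = (sum(m[0] for m in members) / len(members),
                                sum(m[1] for m in members) / len(members))
    return centroids, clusters

# Two obvious groups of "countries" measured on two made-up parameters
points = [(1, 1), (1.5, 2), (2, 1.5), (8, 8), (8.5, 9), (9, 8.5)]
centroids, clusters = kmeans(points, k=2)
print(sorted(len(c) for c in clusters))  # → [3, 3]
```

With 50+ parameters per country the loop is identical, only the distance sums over more coordinates; that is precisely why the algorithm scales to the country-comparison problem described above.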
Step 6: Create supervised learning models
If you had data about millions of loan applicants and their repayment history from the past, could you identify an applicant who is likely to default on payments, even before the loan is approved?
Given enough prior data, could you predict which users are more likely to respond to a digital advertising campaign? Could you identify if someone is more likely to develop a certain disease later in their life based on their current lifestyle and habits?
Supervised learning algorithms help solve all these problems and a lot more. While there are a plethora of algorithms to understand and master, just getting started with some of the most popular ones will open up a world of new possibilities for you and the ways in which you can make data useful for an organization.
Topics to cover:
- Logistic regression
- Classification trees
- Ensemble models like Bagging and Random Forest
- Support Vector Machines
You have not really started with creating models till you have done this:
- Take a dataset, create models using all the algorithms you have learnt. Train, test and tune each model to improve performance. Compare them to identify which is the best model and document why you think it is so
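The train-test-compare workflow in that exercise can be sketched end to end without any libraries. This toy version (invented loan data; a one-feature "decision stump" standing in for a real classification tree, compared against the majority-class baseline every model must beat) shows the shape of the loop:

```python
import random

def train_test_split(rows, test_frac=0.25, seed=0):
    """Shuffle the rows and split them into train and test sets."""
    rng = random.Random(seed)
    rows = rows[:]
    rng.shuffle(rows)
    cut = int(len(rows) * (1 - test_frac))
    return rows[:cut], rows[cut:]

def accuracy(model, rows):
    return sum(model(x) == y for x, y in rows) / len(rows)

# Toy loan data: (income in $k, defaulted?) -- low incomes default here
data = [(x, x < 30) for x in range(10, 90, 2)]
train, test = train_test_split(data)

# Model A: a "decision stump" -- a threshold learned from the training set
threshold = max((x for x, y in train if y), default=0)
def stump(x):
    return x <= threshold

# Model B: majority-class baseline, the sanity check to compare against
labels = [y for _, y in train]
majority = max(set(labels), key=labels.count)
def baseline(x):
    return majority

print(round(accuracy(stump, test), 2), round(accuracy(baseline, test), 2))
```

With scikit-learn the same loop becomes `fit`/`predict` calls over logistic regression, trees, random forests and SVMs, plus hyperparameter tuning; the comparison-and-documentation habit is what carries over.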
Step 7: Understand Big Data Technologies
Many of the machine learning models in use today have been around for decades. The reason these algorithms are only finding widespread application now is that we finally have access to sufficiently large amounts of data to feed them, so that they can produce useful outputs.
Data engineering and architecture is a field of specialization in itself, but every machine learning expert must know how to deal with big data systems, irrespective of their specialization within the industry.
Understanding how large amounts of data can be stored, accessed and processed efficiently is important to being able to create solutions that can be implemented in practice and are not just theoretical exercises.
I approached this step with a real lack of conviction, but as I soon found out, my hesitation was driven more by fear of the unknown, in the form of Linux interfaces, than by any real complexity in finding my way around a Hadoop system.
Topics to cover:
- Big data overview and eco-system
- Hadoop – HDFS, MapReduce, Pig and Hive
Do this to know that you have understood the basics:
- Upload data, run processes and extract results after installing a local version of Hadoop or Spark on your system
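Before installing anything, it helps to see that the MapReduce idea itself is small. Here is the classic word-count example sketched in plain Python; on a real Hadoop cluster the map and reduce functions run distributed across many nodes over files in HDFS, but the logic is the same:

```python
from collections import defaultdict

def map_phase(line):
    # Emit (word, 1) pairs, like a Hadoop mapper
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    # Group values by key, like the framework's shuffle-and-sort step
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Sum the counts for each word, like a Hadoop reducer
    return {word: sum(counts) for word, counts in groups.items()}

lines = ["big data big insights", "data beats opinion"]
pairs = [pair for line in lines for pair in map_phase(line)]
counts = reduce_phase(shuffle(pairs))
print(counts["big"], counts["data"])  # → 2 2
```

The point of the big data stack is that the framework handles distribution, fault tolerance, and the shuffle for you; you supply only the map and reduce logic (or, in Hive and Pig, a query that compiles down to it).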
Step 8: Explore Deep Learning Models
Deep learning models are helping companies like Apple and Google create solutions like Siri or the Google Assistant. They are helping global giants test driverless cars and suggesting best courses of treatment to doctors.
Machines are able to see, listen, read, write and speak thanks to deep learning models that are going to transform the world in many ways, including significantly changing the skills required for people to be useful to organizations.
Getting started with creating a model that can tell the image of a flower from a fruit may not immediately help you start building your own driverless car, but it will certainly help you start seeing the path to getting there.
Topics to cover:
- Artificial Neural Networks
- Natural Language Processing
- Convolutional Neural Networks
- OpenCV
Suggested exercise:
- Create a model that can correctly identify pictures of two of your friends or family members
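For the face-recognition exercise you would use a framework like Keras or PyTorch, but the building block underneath every deep network is a single artificial neuron. This sketch (standard library only; the two features and the "flower vs fruit" labels are invented) trains one neuron by gradient descent on log-loss:

```python
import math
import random

def sigmoid(z):
    # The neuron's activation: squashes any input into (0, 1)
    return 1 / (1 + math.exp(-z))

# (feature1, feature2, label): label 1 = flower, 0 = fruit (toy data)
data = [(0.1, 0.9, 1), (0.2, 0.8, 1), (0.3, 0.9, 1),
        (0.9, 0.2, 0), (0.8, 0.1, 0), (0.7, 0.3, 0)]

random.seed(0)
w1, w2, b = random.random(), random.random(), 0.0

for epoch in range(1000):             # stochastic gradient descent
    for x1, x2, y in data:
        p = sigmoid(w1 * x1 + w2 * x2 + b)
        grad = p - y                  # derivative of log-loss w.r.t. z
        w1 -= 0.5 * grad * x1
        w2 -= 0.5 * grad * x2
        b  -= 0.5 * grad

def predict(x1, x2):
    return int(sigmoid(w1 * x1 + w2 * x2 + b) > 0.5)

print(predict(0.15, 0.85), predict(0.85, 0.15))  # → 1 0
```

A convolutional network is, conceptually, many layers of such neurons with weights shared across image patches; frameworks automate the gradient arithmetic that is written out by hand here.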
Step 9: Undertake and Complete a Data Project
By now you are almost ready to unleash yourself to the world as a machine learning pro, but you need to showcase all that you have learnt before anyone else will be willing to agree with you.
The internet presents glorious opportunities to find such projects. If you have been diligent about the previous eight steps, chances are that you would already know how to find a project that will excite you, be useful to someone, as well as help demonstrate your knowledge and skills.
Topics to cover:
- Data collection, quality check, cleaning and preparation
- Exploratory data analysis
- Model creation and selection
- Project report
Suggested exercise:
- Get in touch with a stakeholder who would be interested in your report, share your findings with them and gather feedback
Machine learning and artificial intelligence are skills for the present and the future. They also make up a field where learning never ceases, and very often you may have to keep running just to stay in the same place, as far as the most in-demand skills are concerned.
However, if you start the journey well, you will be able to understand how to go about taking the next step in your learning path. As you must have gathered by now, starting the journey well is a pretty challenging exercise in itself. If you choose to start upon it, I hope this article will have been of some help to you and I wish you the very best.
Finally, I will confess that I got a lot of help with my ten-month transition. The reason I was able to cover so much ground in this amount of time, along with a busy schedule at work and home, was that I enrolled for the Post Graduate Program in Data Science and Machine Learning offered by Jigsaw Academy and Graham School, University of Chicago.
Investing in the course helped in keeping my learning hours focused, created external pressure that ensured that I was finding time for it irrespective of whatever else was going on in life, and gave me access to experts in the form of faculty and a great peer group through other students.
Transforming from being non-technical to someone who is comfortable with the machine learning world has already opened up many new doors for me. Whatever path you choose to make this transformation, you can do so with the assurance that going through the rigor will reap rewards for a long time and will banish any fears of becoming irrelevant in tomorrow’s economy.
About the Author
Madhukar Jha, Founder – Blue Footed Ideas
Madhukar Jha believes that great digital experiences are created by concocting a perfect mix of data driven insights, understanding of behavioural drivers, a design thinking approach, and cutting edge technology. He applies this philosophy to help businesses make world class products, run campaigns that rock and tell compelling stories.