Why do most data science trainings fail to deliver, and how can you overcome these failures?
At first glance, there is no dearth of data science / analytics trainings available today. Our training listing page probably has more than 300 trainings listed and they come in all shapes and sizes: short term / long term, tool specific courses, self paced / instructor led trainings, online / offline, and the list goes on!
But how many of them prepare you for a real life analytics career? Probably only a handful! I think there are some major flaws in the way data science trainings are currently designed and delivered. Until we correct them, it is very difficult to create a meaningful impact on the careers of people undergoing these trainings.
A good analogy to drive the point home is to look at the offerings of Apple vs. others. If you compare the products by writing down specs, you would not understand what Apple delivers that its competitors do not. Similarly, when comparing trainings, whether a training is self paced or instructor led, delivered online or offline, and which tool it teaches are just specifications. You need to look at how well it prepares you for an analytics career!
This is where a lot of trainings fall through the cracks. In this article, I will bring out some of the common shortcomings of various analytics programs and leave you with thoughts on how to overcome them.
P.S. My aim is not to point out the flaws in the ecosystem. I too am part of it. The idea is to make sure people undergoing these trainings make the right choices and decisions.
Limitations of data science trainings & how to overcome them:
1. Limited / no attention to structured thinking:
Most of the trainings I know of don’t emphasize the need for structured thinking enough. They assume that people from various backgrounds will be able to take amorphous business problems and put a data science framework around them.
On the other hand, most good analytics companies train people on structured thinking as part of their induction.
Why does this gap exist? I think part of it comes down to the fact that the need for, and expertise in, structured thinking is currently difficult to quantify. It is something that cannot be communicated in the form of certifications. But the people undergoing these trainings feel the heat the minute they face an interview for a data science position.
How to emphasize / learn structured thinking:
You can start by reading these articles:
- The importance of structured thinking
- Tools for structured thinking
- How to train your mind on analytical thinking?
The best way to cover for this shortcoming is to practice structured thinking, and to practice it in your day to day activities. The more structured your thoughts, the easier it is to solve business problems.
2. Focus on tools instead of learning fundamentals of the subject:
Stakeholders in the training ecosystem need to understand that learning data science is different from learning its tools. A certification in SAS or R does not prepare you for solving real life problems. On the other hand, if you understand the fundamentals of the subject and can put frameworks in place, you can always learn and apply the tools very easily.
What do I mean by fundamentals of data science? I mean that the person undergoing training understands, for example, what we are trying to do with regressions, what their underlying assumptions are and where they fall short.
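To make the point about regression assumptions concrete, here is a minimal sketch in plain Python. The data is made up for illustration (y grows with the square of x), and the helper names `fit_line` and `residuals` are my own, not part of any library. It fits a straight line and then looks at the residuals, which is one simple way to notice that the linearity assumption does not hold.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (one feature)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def residuals(xs, ys, intercept, slope):
    """Actual value minus fitted value for every observation."""
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


# Hypothetical data: the true relationship is quadratic, not linear.
xs = list(range(1, 11))
ys = [x ** 2 for x in xs]

intercept, slope = fit_line(xs, ys)
res = residuals(xs, ys, intercept, slope)

# The residuals are positive at both ends and negative in the middle.
# A pattern like this suggests a straight line is the wrong model here.
for x, r in zip(xs, res):
    print(f"x={x:2d}  residual={r:6.1f}")
```

A tool will happily fit the line and report coefficients either way. Knowing to check the residuals, and knowing what a U-shaped pattern implies, is the kind of fundamental a tool certification does not give you.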
The best way to solve for this shortcoming is to be curious. If you don’t understand something, just ASK! You can ask your mentors / instructors or even use our discussion platform.
3. Not enough emphasis on feature engineering / data cleaning:
Most trainings provide you with small datasets to play around with and apply your programming techniques. Learning clustering? Take the standard IRIS dataset. Learning time series? Take the airline passenger data.
While these toy datasets are good to get the hang of concepts, they fail to provide understanding of real life challenges. They fail to make the trainee understand the importance of hypothesis building and spending time cleaning your data.
How to solve for this?
Kaggle is probably the best source to learn the importance of feature engineering. Start from the Titanic competition and move upwards on the level of complexity! Participate in as many competitions as possible (but one at a time) and see what other data scientists are doing by following the forums.
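As a rough illustration of what cleaning and feature engineering can look like, here is a small sketch in plain Python. The records are invented (they only imitate the style of Titanic passenger data), and the field names, the `engineer` helper and the choice of median imputation are my assumptions for the example, not a recommended recipe.

```python
import re
from statistics import median

# Made-up passenger records in the style of the Titanic data.
rows = [
    {"name": "Smith, Mr. John", "age": 34.0, "sibsp": 1, "parch": 0},
    {"name": "Doe, Mrs. Jane", "age": None, "sibsp": 1, "parch": 2},
    {"name": "Roe, Miss. Ann", "age": 8.0, "sibsp": 0, "parch": 2},
    {"name": "Poe, Mr. Alan", "age": 50.0, "sibsp": 0, "parch": 0},
]


def extract_title(name):
    """Pull the honorific (Mr, Mrs, Miss, ...) out of 'Surname, Title. First'."""
    match = re.search(r",\s*([^.]+)\.", name)
    return match.group(1).strip() if match else "Unknown"


def engineer(rows):
    """Clean missing ages and derive a few hypothesis-driven features."""
    known_ages = [r["age"] for r in rows if r["age"] is not None]
    fill_age = median(known_ages)  # simple imputation, an assumption
    out = []
    for r in rows:
        family_size = r["sibsp"] + r["parch"] + 1  # include the passenger
        out.append({
            "title": extract_title(r["name"]),
            "age": r["age"] if r["age"] is not None else fill_age,
            "age_was_missing": r["age"] is None,
            "family_size": family_size,
            "is_alone": family_size == 1,
        })
    return out


features = engineer(rows)
for f in features:
    print(f)
```

Each derived column encodes a hypothesis: that social status (title), travelling alone, or even the fact that an age was never recorded might relate to the outcome. Toy datasets rarely force you to think this way, while competition data usually does.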
4. Trainings don’t prepare you for real life implementation problems:
How many people completing these trainings / courses would appreciate the difference in approach required to implement a data science solution in a manufacturing setup vs. an e-Commerce setup vs. a BFSI company? Not many. While a lot of this comes from experience, including it in the curriculum definitely makes for a better data science professional.
Again, the easiest solution is to be curious. Ask your mentors and instructors, read case studies, and network with people in the industry; talk to them about their challenges and learnings. The least you should do is make the most of your instructor’s experience.
Internships can also turn out to be a useful way to get a real hang of these problems. If you can get such an internship, nothing like it!
Choosing a training by comparing just a few features can lead you to the wrong outcome. Hopefully, you see the point I am making. Ask these questions while selecting a training. Chances are that you won’t find a perfect training out there. But you will be aware of the shortcomings and can then make up for them on your own.
I hope you find these questions useful. If you are facing challenges in figuring out the right data science training, feel free to reach out to us. We would love to help you in any manner we can.