The rapid rise of data science as a professional field has lured in people from all backgrounds. Engineers, computer scientists, marketing and finance graduates, analysts, human resource personnel – everyone wants a piece of the data science pie.
Analytics Vidhya has already published a comprehensive learning path for beginners to get into data science. So why am I focusing on professionals working in business intelligence / MIS / reporting specifically? Let me explain.
I regularly encounter talented business intelligence (BI) professionals looking to land their first data science role. They are often frustrated by the perceived lack of opportunities. Many of them feel that their role is repetitive, or that they simply have to do whatever is asked of them.
What they miss is that they are closer to data science opportunities than any other professional out there.
Business Intelligence (BI) professionals hold a massive advantage over almost anyone else trying to transition into data science, for the following reasons:
In other words, these folks work in the “first half” of a data science project. That’s already more industry experience than most aspiring data scientists!
If you are one such transitioner looking to jump from a BI / MIS / reporting role to data science, this article is for you. You can consider these 11 steps as a learning path you can follow. In fact, I would strongly encourage you to implement these steps in your current BI role. Start where you are and practice till you break into data science!
P.S. For the rest of the article, the terms Business Intelligence, MIS, reporting, and dashboarding are used interchangeably. There is very little difference and a lot of overlap between these roles and designations.
So are you ready to take this journey with me? Let’s take this ride step-by-step.
Let’s start off by looking at three examples of reporting that a BI (MIS / Reporting / Business Intelligence) professional does on a day-to-day basis.
This BI professional has generated a report containing details about business sourcing at the City and Region levels along with the quality of the business.
Here, the BI professional has generated the same report with an addition of RAG (Red-Amber-Green) analysis for the “Rejection Score” column. A lower rejection score means a higher quality of business.
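The RAG banding described above is easy to reproduce in code. Here is a minimal Python sketch, assuming made-up cities, made-up rejection scores, and hypothetical thresholds (the report itself does not state the cut-offs used):

```python
# Hypothetical thresholds: the actual cut-offs depend on your business.
GREEN_MAX = 10.0   # rejection score up to 10 -> Green
AMBER_MAX = 20.0   # above 10 and up to 20 -> Amber; anything higher -> Red


def rag_status(rejection_score: float) -> str:
    """Map a rejection score to a Red/Amber/Green band (lower is better)."""
    if rejection_score <= GREEN_MAX:
        return "Green"
    if rejection_score <= AMBER_MAX:
        return "Amber"
    return "Red"


# Made-up city-level report rows, for illustration only.
report = [
    {"city": "City A", "rejection_score": 6.5},
    {"city": "City B", "rejection_score": 14.0},
    {"city": "City C", "rejection_score": 27.3},
]

for row in report:
    row["rag"] = rag_status(row["rejection_score"])
    print(f'{row["city"]}: {row["rejection_score"]} -> {row["rag"]}')
```

The same logic maps directly onto conditional formatting in Excel or a calculated field in Tableau / QlikView, so you can start with whichever tool you already use.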
In this example, the BI professional has taken things to another level by adding insights about the report. You can see that (s)he has written the top 2 findings taken from the report. I’ve taken a simple example here to add interpretability to your report. You can add more visuals/charts depending on the type of information you are sharing.
Looking at the above three examples, I would say that “Example 3” adds the most value to the business because:
These actually help business users a lot. When you work with senior-level executives, you’ll find that most of them need actionable items to work on. They would not want to spend time focusing on interpreting the report and doing in-depth analysis.
To generate a similar report, a BI professional should have curiosity, attention to detail, command over any one tool (Excel / SQL / QlikView / Tableau), and knowledge of the business.
This skillset is not only limited to folks working in BI! It’s also very critical to becoming a good data science professional. In most cases, 60-70% of a data scientist’s job is about business understanding, data exploration, and generating insights about the problem at hand.
A BI professional has a huge advantage here compared to other professionals who are transitioning into data science. You can start practicing TODAY and this skill set will help you do well in your current role as well. It’s a win-win!
It’s now time to support your insights with some statistical metrics. Don’t just limit yourself to generating insights based on visual interpretations. Take a look at the image below – what’s your first reaction?
I can say that the average business sourced after the contest is higher than before. The question now is whether the contest is the factor behind the boost in average business sourced, or whether it is just a random increase. Here, we need to rely on statistical concepts to support our insights, such as a z-test, a t-test, or other statistical tests. A good knowledge of statistics will help you in these situations.
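To make the “random increase or not?” question concrete, here is a minimal sketch of a one-sided two-sample z-test using only Python's standard library. The daily figures are made up for illustration, and `two_sample_z_test` is a hypothetical helper name. Treat it as a large-sample approximation; with small samples a t-test is usually the safer choice.

```python
from math import sqrt
from statistics import NormalDist, mean, variance

# Made-up daily "business sourced" figures, for illustration only.
before_contest = [98, 102, 95, 101, 99, 97, 103, 100, 96, 104,
                  99, 101, 98, 100, 102, 97, 99, 103, 100, 98,
                  101, 96, 102, 99, 100, 97, 103, 98, 101, 100]
after_contest = [105, 109, 103, 108, 107, 104, 110, 106, 102, 111,
                 106, 108, 105, 107, 109, 104, 106, 110, 107, 105,
                 108, 103, 109, 106, 107, 104, 110, 105, 108, 107]


def two_sample_z_test(sample_a, sample_b):
    """One-sided z-test for the alternative mean(sample_b) > mean(sample_a)."""
    # Standard error of the difference in means, using sample variances.
    standard_error = sqrt(variance(sample_a) / len(sample_a)
                          + variance(sample_b) / len(sample_b))
    z = (mean(sample_b) - mean(sample_a)) / standard_error
    # Probability of seeing a z this large if there were no real difference.
    p_value = 1.0 - NormalDist().cdf(z)
    return z, p_value


z_score, p_value = two_sample_z_test(before_contest, after_contest)
print(f"z = {z_score:.2f}, one-sided p-value = {p_value:.4f}")
if p_value < 0.05:
    print("The increase is unlikely to be random at the 5% level.")
else:
    print("The data does not rule out a random increase.")
```

Keep in mind that a significant result only tells you the increase is unlikely to be chance. It does not prove that the contest caused it, which is where your business knowledge comes in.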
You should have a solid understanding of the below statistics topics if you want to land a data science role:
And here is a list of useful resources to help you get started with these topics:
Performing detective and statistical analysis will not help you land a data science role if you don’t share your findings with the right group.
Presenting stories is one of the key skills a data science professional must possess.
Here, I strongly recommend practicing this storytelling skill in your current role as well. You can start with the following:
Here’s an essential recommendation that has personally helped me in my career – add visualization(s) to your slide(s). The words you write in the presentation (or speak during a meeting) should add context to your visualizations. Confused? Let me explain using an example.
Look at the visualization below. It showcases details about Sachin Tendulkar’s test match career. You can talk about various metrics here using the graphs and numbers. This also shows why business understanding is so important – you can’t talk about metrics you have no experience with!
You should check out this excellent article – “The Art of Story Telling in Data Science and how to create data stories“.
Your concerns are understandable but you have to start somewhere to gather experience! My suggestion would be to start by sharing insights with your manager, experienced teammates, or your customers (if that’s possible). This will give your confidence a much-needed boost so start practicing!
So far, I’ve not talked about any tool for generating reports and insights. I have deliberately avoided going into questions like – which tool should you pick? Or which one is right or better? That is because my objective was to get you comfortable with detective analysis, statistical concepts and honing your communication skills so you can present your findings using your current working tool.
Now, it’s time to learn a tool that has:
You can pick any tool among SAS / R / Python, as all of them have the above-listed capabilities. Your initial task while learning a new tool is very specific: get comfortable with performing data exploration, visualization, detective analysis, and statistical tests. You don’t need complete expertise in any of these tools (not initially, anyway).
If you’re not sure which tool to pick, I would suggest going through this awesome article written by Kunal Jain that compares the various pros and cons of these three tools.
You can look at the below tutorials for learning data exploration using SAS/R or Python:
It’s finally time to move to the most attractive part of data science – model-building! Before you dive into specific models, I recommend first understanding the type of problems that exist. Here is an article explaining the basics of predictive modeling/machine learning – Machine Learning basics for a newbie.
Broadly, we can divide the model building process into 5 steps:
I’m parking the first two steps (Problem Definition and Hypothesis Generation) to cover later in the article. We’ll talk about data exploration in this section.
The data exploration step is similar to detective analysis where our primary objective is to understand the behavior of variable(s) individually and with each other as well. Here, a good knowledge of statistics will help you a lot. This step focuses on both insight generation as well as data cleaning. You could be required to impute missing values, detect and deal with outliers, and perform multiple types of transformation.
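As a small illustration of the cleaning side of this step, here is a minimal Python sketch that imputes missing values with the median and flags outliers with the common 1.5 × IQR rule. The sales figures are made up, and both the median imputation and the IQR rule are just one reasonable choice among several, not the only way to do it.

```python
from statistics import median, quantiles

# Made-up monthly sales figures; None marks a missing value.
raw_sales = [120, 135, None, 128, 131, 990, 125, None, 133, 127]

# 1. Impute missing values with the median of the observed values.
observed = [value for value in raw_sales if value is not None]
fill_value = median(observed)
imputed = [fill_value if value is None else value for value in raw_sales]

# 2. Flag outliers with the 1.5 * IQR rule.
q1, _, q3 = quantiles(imputed, n=4)   # first, second and third quartiles
iqr = q3 - q1
lower_bound = q1 - 1.5 * iqr
upper_bound = q3 + 1.5 * iqr
outliers = [value for value in imputed
            if value < lower_bound or value > upper_bound]

print("Imputed series:", imputed)
print("Flagged outliers:", outliers)
```

Whether a flagged value is a data-entry error or a genuine spike is a business question, so check with the data owner before dropping or capping it.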
I’ve written a comprehensive guide on the steps involved in data exploration. You can practice all these methods on a dataset from your industry or using any open dataset.
During our model building process, we train our model on a dataset where the target is known beforehand and then apply it on the test dataset to predict the target variable. We obviously want to be accurate while estimating the target variable.
How can we check whether we are accurate or not? We need a metric that will help us to evaluate our model result against the actual observations. Let’s understand this using an example.
We have a customer base – C1, C2 and C3. We’ve estimated that only C3 will buy product “A” from this customer base. As it turns out, both C2 and C3 bought the product. This means we’re 66.7% accurate: the predictions for C1 and C3 were correct and the prediction for C2 was wrong, so 2 out of 3 predictions are correct. This accuracy is our “Evaluation Metric”.
The evaluation metric will change depending on the type of problem you are solving. Here is a list of common evaluation metrics you should know.
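The three-customer example can be checked in a few lines of Python. This is only a sketch of the accuracy calculation for the customers C1, C2 and C3; the dictionary names are my own.

```python
# True means "buys product A", False means "does not buy".
predicted = {"C1": False, "C2": False, "C3": True}   # we expected only C3 to buy
actual = {"C1": False, "C2": True, "C3": True}       # C2 and C3 actually bought

# A prediction is correct when it matches what actually happened.
correct = sum(predicted[customer] == actual[customer] for customer in actual)
accuracy = correct / len(actual)

print(f"{correct} of {len(actual)} predictions correct, accuracy = {accuracy:.1%}")
```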
You’ve decided on the evaluation metric but do you have the actual results to evaluate your model? You can’t jump into the future to prepare a test dataset! In this scenario, we reserve a particular sample of the dataset on which we do not train the model. Later, we evaluate the model on this sample before finalizing it. This method is known as model validation. You can refer to this article on various validation techniques which includes practical examples in R and Python.
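Here is a minimal sketch of the simplest validation approach, a single hold-out split, in plain Python. The function name `train_validation_split`, the 30% validation share, and the customer IDs are assumptions for illustration. Libraries in R and Python ship ready-made versions of this, along with more robust techniques such as k-fold cross-validation.

```python
import random


def train_validation_split(rows, validation_fraction=0.3, seed=42):
    """Shuffle the rows and reserve a fraction that the model never trains on."""
    shuffled = list(rows)                      # copy, so the input is untouched
    random.Random(seed).shuffle(shuffled)      # fixed seed keeps the split repeatable
    n_validation = round(len(shuffled) * validation_fraction)
    validation = shuffled[:n_validation]
    train = shuffled[n_validation:]
    return train, validation


# Made-up customer IDs standing in for the rows of a real dataset.
customers = [f"C{i}" for i in range(1, 11)]
train, validation = train_validation_split(customers)

print("Train on:   ", train)
print("Validate on:", validation)
```

You would fit the model on `train` only, then compute your evaluation metric on `validation` to estimate how the model will behave on data it has not seen.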
You have understood the dataset and looked at the metrics to evaluate your model’s performance. What’s next?
Applying modeling techniques! Do not start learning multiple techniques simultaneously. Focus on only two for now – Linear and Logistic Regression. These two techniques will help you predict continuous and categorical variables.
For example:
Below are two good articles to learn linear and logistic regression and practice using a tool of your choice:
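Before you work through full tutorials, it can help to see how small the core of each technique is. The sketch below fits a one-variable linear regression with the least-squares formulas and a one-variable logistic regression with plain gradient descent. All data is made up, and the function names, learning rate, and epoch count are my own assumptions. In a real project you would use a library in your chosen tool rather than hand-written code.

```python
from math import exp
from statistics import mean


def fit_linear(x, y):
    """Least-squares fit of y = slope * x + intercept (continuous target)."""
    x_bar, y_bar = mean(x), mean(y)
    slope = (sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
             / sum((a - x_bar) ** 2 for a in x))
    intercept = y_bar - slope * x_bar
    return slope, intercept


def sigmoid(z):
    """Squash any number into the (0, 1) range so it reads as a probability."""
    return 1.0 / (1.0 + exp(-z))


def fit_logistic(x, y, learning_rate=0.1, epochs=5000):
    """Gradient-descent fit of P(y = 1) = sigmoid(weight * x + bias)."""
    weight, bias = 0.0, 0.0
    n = len(x)
    for _ in range(epochs):
        # Prediction error for every row under the current weight and bias.
        errors = [sigmoid(weight * a + bias) - b for a, b in zip(x, y)]
        weight -= learning_rate * sum(e * a for e, a in zip(errors, x)) / n
        bias -= learning_rate * sum(errors) / n
    return weight, bias


# Continuous target: made-up ad spend vs. sales.
ad_spend = [1, 2, 3, 4, 5]
sales = [3, 5, 7, 9, 11]
slope, intercept = fit_linear(ad_spend, sales)
print(f"sales = {slope:.2f} * ad_spend + {intercept:.2f}")

# Categorical target: made-up store visits vs. bought (1) / did not buy (0).
visits = [1, 2, 3, 4, 5, 6]
bought = [0, 0, 0, 1, 1, 1]
weight, bias = fit_logistic(visits, bought)
for v in (1, 6):
    print(f"P(buy | visits = {v}) = {sigmoid(weight * v + bias):.2f}")
```

Linear regression returns a number (sales), while logistic regression returns a probability that you turn into a yes/no prediction by choosing a cut-off such as 0.5.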
So, where can you find a dataset for your domain? Finding a business problem can be difficult.
You should talk to the leadership or team managers and take one of their business challenges as your project. Here, the first step is to convert the business problem to a data problem. Then, start moving down the steps we had discussed in point #5 earlier – hypothesis generation, data collection, data exploration, data cleaning, and finally model building and validation.
One of the major advantages you have as a BI professional is that you are already familiar with the variables in the dataset. Your detective analysis skills will help you understand the relationships between variables as well. You can jump straight to tasks like data cleaning, transformation, identifying the right evaluation metric, setting aside a validation set, and finally model building.
You should take some time and watch the below webinar by Tavish Srivastava to understand the importance of defining the problem statement and hypothesis generation:
I also recommend going through the below articles on building models easily and effectively in R and Python:
After building your model, you should share the results with your supervisor or the people who make decisions (like the team or project manager). As a data science professional, it is very critical to share your findings (like which feature(s) is making an impact on the target variable). You should also communicate regular updates around the comparison between your model result and the actual numbers.
This process will also help you to tune and improve your model. If the model is performing well, then there is a high chance you will get another assignment or get involved with the core data science team. That’s what we are aiming for, right?
Learning never stops in data science. It is an ever-evolving field and we need to keep evolving with it. You’ve learned linear and logistic regression so far – extend your knowledge beyond that now. Learn algorithms like decision trees, random forest, and even neural networks.
And like I mentioned before, you should learn by applying. Theoretical knowledge is good to have but it’s useless if you don’t put it into practice. Pick up the datasets we spoke about earlier and apply these newly learned algorithms. You are likely to see a significant improvement in your model!
Now, let’s take a step outside the tools and techniques. I want to emphasize the importance of building your network and profile in the data science community.
Start attending data science focused events like meetups and conferences. You will meet like-minded people as well as experienced professionals who can guide you. I have seen plenty of aspiring data science professionals receive job offers through these events, so I can vouch for their usefulness!
You should also focus on the digital aspect of your profile. You have clearly been working with data science projects so showcase your work to the community! Upload your code to GitHub and start publishing blogs/articles on your findings. This helps prospective employers see that you possess good knowledge about the subject.
While there are no easy ways to transition into data science, there are certain well-trodden paths. One of them is switching to the data science team in your current organization. Let me explain why you should focus on this rather than other paths (at least for starters).
I could go on, but you get the idea. Always make it your first preference to look for opportunities in your current workplace. Talk to folks in senior roles or on the data science team. Build up your network and trust me, it does pay off eventually.
That was quite an exhilarating journey! I made this transition myself a number of years ago. I have seen this field evolve over time and my aim in this article was to help you make that switch. You already have a number of steps covered that most aspiring data science professionals don’t, so make it count!
If you have any questions on this learning path, or any feedback on this article, let me know in the comments section below. Meanwhile, here are a few additional resources to learn data science and give yourself the best chance of breaking into this field:
Great Article !! Very practical approach ,..
Hi Sunil Sir, I have been following and reading your article since first day when i started my career on Data analytics.Undoubtedly, your every articles are asset for every data scientist.I learned Predictive analytics on R.But in my current company, i am working as Data analyst in business reporting section.,where my responsibilities are to build business report,dashboard so that my managers and directors can take easy business decision from my findings. I generally use, Microsoft Power BI and Excel Power tools for that.I believe that, my reporting career has helped me to extract any meaningful information from raw data and for that, i do manipulate the data, explore the data extremely and this is the key parts for making any predictive model as you said that,"data manipulation and data exploration is the key part of making any predictive model"..I had a fear that, how can i connect my reporting career with data science career? can i jump into data science domain? as i am working in business reporting section.But, your article is really life saving for me.Thanks for sharing this article.Another request, if you kindly share your Email- id for suggestion and advice, then it would be really great for me.
This was an interesting exposition. I like the way you generate the problem from BI thinking to Data Analysis thinking. I am interested in why BI Analysts want to change their role. You mentioned because they feel "undervalued" themselves, but they have a role too.
Great Article. Thank you for the insights.
Wow...for a moment like I was shocked reading your writing, this is what I was looking for. Thanks.
Hi Sunil, I saw your posted article in 2017, Understanding and Coding Neural Networks from Scratch in Python and R - Implementing NN in R. That's a great introduction of NN, very helpful for learning NN.I have two questions (or I should say - requests, sorry). Hope you can answer and help: 1. I copied your R code and it works wonderful, very nice work, a simple and efficient code. I was wondering - is it possible to add or extend it with additional Prediction function, such as trained network, final optimal weights?, saved for further testing, validation and even for prediction (as NN is a powerful AI classification and prediction method/algorithm). For example, R neuralnet package has both "train" and "prediction" functions that can be applied. 2. Is it possible to have SAS version of this code (since SAS is also a widely used application tool)? Again, your great work, kind advice and professional help are greatly appreciated. Thanks, Keh-Dong
Great Article. couldn't find better than this. i was in BI making my move into Data science. Thank you for the much needed source of information.