C2 - IT Role to Hands-on Data Science

Introduction

Congratulations on choosing data science as your future career! It’s a great decision.

Data science is a thriving field with a remarkable number of job openings around the globe. The demand is outstripping the supply! That means there are more vacancies than qualified data science professionals.

So, about this journey you have taken to become a hands-on data science professional: you can already see why it’s a path to future success. There are a variety of problems you can solve, a whole host of tools you can master, and a broad range of techniques you can learn and then play around with.

Since you come from an IT background, we highly suggest you also consider data engineering for your career transition, as it matches almost all your strong points (plus it is a potential future job role!).

The canvas is in front of you – now it’s your turn to pick up the data science brush and start painting your way to a successful data science transition. And because you hold a quant degree, you already have a HUGE advantage in your favor!

What can you expect in a hands-on data science role?

A hands-on data science role is a little bit of programming, a little bit of statistics, a pinch of business domain knowledge, and a whole lot of forming and understanding the problem statement.

Data science may be the sexiest job of the 21st century but, like all jobs, it requires hard work. A day-to-day hands-on role in data science means working on the same problem for long hours and doing continuous in-depth research. This role requires you to be well-versed in probability and statistics, programming, and machine learning.

A data science role requires you to be in continuous communication with the stakeholders as well as other teams. On the soft skills side, you’d want to keep working on your communication skills, storytelling skills, and structured thinking ability. We’ll talk about these skills in a moment.

A typical data science project lifecycle looks like this:

Converting the business problem into a data problem
Hypothesis generation
Data collection or extraction
Data exploration and validating hypotheses
Data modeling
Model deployment
Presenting your work to the final user/client/stakeholder

Depending on your role, your project, and your organization, you’ll be working on different stages. Some projects require a data scientist to do the end-to-end work. Most projects will expect you to be involved from the start but will leave the data collection and model deployment stages to data engineers. It all comes down to specific use cases.

What can you expect in a data engineering role?

Given your background, you are well placed to look at the engineering side of data-related roles as well.

A data science role seems very attractive, but the industry needs data engineers even more than it needs data scientists.

A data engineer is someone who develops and maintains architectures for smooth data flows within large-scale data processing systems. They deal with raw, unstructured, and dirty data, which is often inconsistent and unvalidated.

It is the job of a data engineer to create architectures and systems that deliver reliable, high-quality data efficiently. They work in sync with the team of data scientists as well as the stakeholders.

Given your engineering background, you will find it much easier to understand the following concepts:

Software engineering
Database systems
Cloud technology
Efficient programming skills

This will definitely provide an edge over other candidates!

What are the key skills required to excel in a hands-on data science and data engineering role?

Data science and data engineering are multi-faceted roles. There is no one-size-fits-all approach to learning these subjects. Having said that, there are a few core skills you will need to pick up to make a successful career transition to data science.

One of the biggest characteristics you’re bringing to the table is your quant degree. This is SUPER valuable in the world of data science and it’ll open a lot of doors for you.

Having a quant degree is looked upon favorably by recruiters. Your quant skills will be required when you’re working with machine learning algorithms. You would be able to understand how they work under the hood and that will help you fine-tune your model. Like we said, this is extremely useful in your transition!

Here are the key skills you would need:

Programming knowledge
Software engineering
Database Systems
Big Data
Machine Learning concepts
Model Deployment

Apart from these core skills, there are other skills you should be aware of, such as:

Statistics
Structured Thinking
Dashboarding
Deep Learning concepts

Data engineers need a good command of programming languages like Java, Python, Scala, and SQL to be successful. An advantage for you is that you have probably covered most of these skills in the past as part of your IT experience.

Some of the commonly used tools in big data are Apache Spark, Hadoop, and AWS. On the database side, SQL and NoSQL databases are equally important; MySQL, Cassandra, and MongoDB are a few common ones.

How can you excel in each of these required skills?

Ah, the key question! Now that you know what you need to learn, the attention turns to how you can learn those skills. Let’s look at a few options and suggestions on how to pick up and hone the key skills we mentioned above.

Programming Knowledge

Machine learning has taken a great leap largely because of the boost in computing power. Programming gives us a way to communicate with machines. In data science, you must be comfortable with programming, but in data engineering, you need a strong command of programming concepts.

First of all, pick a programming language. Python, R, and Julia are a few options, and each has its own set of pros and cons. Python is a general-purpose programming language with many data science libraries and support for rapid prototyping, whereas R is a language for statistical analysis and visualization. Julia aims to offer the best of both worlds and is often faster. If you are confused about which language to choose, we have compiled a resourceful article for you:

5 Popular Data Science Languages – Which One Should you Choose for your Career?

Python is the market leader right now and continues to be widely used in the industry. It’s a lot easier to perform machine learning tasks using Python, thanks to the availability of libraries and strong support for deep learning. For data engineers, Java is the go-to language, and many big data frameworks are written in Java or run on the JVM. Another appealing language is Scala!

Database Knowledge

As a hands-on data science professional, you’ll be working a LOT with databases. You will need them to extract your data, extract subsets, and extract samples.

Hence, having hands-on knowledge of databases is essential. The most common database language you should pick up is SQL.

SQL is a must-have skill for every data science professional. You should start from the basics of databases and structured query language (SQL) and learn everything you would need in any data science role, including writing and executing efficient queries, joining multiple tables, and appending and manipulating tables.

If you are inclined towards data engineering, however, you will need to go deeper into this field and understand NoSQL databases in depth as well. Knowledge of AWS and other cloud services is also essential.
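To make the SQL skills above a bit more concrete, here is a minimal sketch using Python’s built-in sqlite3 module. The customers and orders tables, their columns, and all the rows are made up purely for illustration; in a real project you would connect to your organization’s database instead of an in-memory one.

```python
import sqlite3

# In-memory database so the example leaves nothing behind on disk.
conn = sqlite3.connect(":memory:")

# Hypothetical tables and rows, invented for this illustration.
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
INSERT INTO customers VALUES (1, 'Asha', 'Pune'), (2, 'Ravi', 'Delhi'), (3, 'Meera', 'Pune');
INSERT INTO orders VALUES (1, 1, 250.0), (2, 1, 100.0), (3, 2, 75.0);
""")

# Joining multiple tables: order count and total spend per customer.
# LEFT JOIN keeps customers who have not placed any orders yet.
rows = conn.execute("""
SELECT c.name, COUNT(o.id) AS n_orders, COALESCE(SUM(o.amount), 0) AS total
FROM customers AS c
LEFT JOIN orders AS o ON o.customer_id = c.id
GROUP BY c.id, c.name
ORDER BY total DESC
""").fetchall()

# Extracting a subset: a parameterized query avoids pasting values into SQL.
pune_customers = conn.execute(
    "SELECT name FROM customers WHERE city = ? ORDER BY name", ("Pune",)
).fetchall()

for row in rows:
    print(row)
print(pune_customers)
```

The same SELECT, JOIN, and GROUP BY ideas carry over to MySQL and most other relational databases, although details of the dialect can differ.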

Big Data

We are generating roughly 2.5 quintillion bytes of data per day! With the rise of the internet, social media networks, and IoT, there has been a sudden boom in the rate at which we generate data. This data is high in volume, velocity, and variety, which form the 3 V’s of Big Data.

Organizations have been overwhelmed with such a large amount of data and they are trying to tackle this data by rapidly adopting Big Data Technology so that this data can be stored properly and efficiently and used when needed.

Hadoop, Spark, Storm, Flink, and Hive are some of the frameworks and tools you should aim to master.

Statistics

Statistics is the grammar of data science.

When you start learning to write, you must be familiar with grammar to build correct sentences. Similarly, you need statistics before you can produce high-quality models. Machine learning starts out as statistics and then advances. Even linear regression is an age-old statistical analysis concept. 🙂

Knowledge of descriptive statistics such as the mean, median, mode, variance, and standard deviation is a must. Then come the various probability distributions, sample and population, the central limit theorem (CLT), skewness and kurtosis, and inferential statistics: hypothesis testing, confidence intervals, and so on.
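As a small illustration of these ideas, here is a sketch using only Python’s standard statistics module. The sample values are invented, and the confidence interval uses a normal approximation, which is only a rough choice for a sample this small (a t-distribution would usually be preferred).

```python
import statistics
from statistics import NormalDist

# Hypothetical sample, e.g. daily order counts over ten days.
sample = [52, 55, 55, 58, 60, 61, 63, 66, 70, 90]

# Descriptive statistics.
mean = statistics.mean(sample)
median = statistics.median(sample)
mode = statistics.mode(sample)
variance = statistics.variance(sample)  # sample variance (divides by n - 1)
stdev = statistics.stdev(sample)        # sample standard deviation

# A mean well above the median hints at right skew (here caused by the 90).
right_skew_hint = mean > median

# Approximate 95% confidence interval for the population mean,
# assuming the sample mean is roughly normally distributed (CLT).
z = NormalDist().inv_cdf(0.975)
margin = z * stdev / len(sample) ** 0.5
ci = (mean - margin, mean + margin)

print(mean, median, mode, round(stdev, 2), right_skew_hint)
print(tuple(round(bound, 1) for bound in ci))
```

Working through the same numbers with pen and paper first, and then checking them against the code, is a good way to make the concepts stick.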

Statistics is a MUST for becoming a data scientist, so it is worth taking a deep dive into each of these concepts.

Machine Learning Concepts

For a data scientist, machine learning is the core skill to have. Machine learning is used to build predictive models. For example, if you want to predict the number of customers you will have next month by looking at past months’ data, you will need to use machine learning algorithms.

You can start with simple linear and logistic regression models and then move ahead to advanced ensemble models like Random Forest, XGBoost, CatBoost, and so on. It’s good to know the code for these algorithms (which takes just 2-3 lines), but what’s most important is to know how they work. This will help you in hyperparameter tuning and, ultimately, in building a model that gives a low error rate.
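To show what happens under the hood of those 2-3 lines, here is a minimal sketch of simple linear regression written by hand in plain Python, applied to the customer forecasting example. The monthly customer counts are made up, and the function names are our own; in practice a library such as scikit-learn would do the fitting for you.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y divided by variance of x gives the slope.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def mean_absolute_error(xs, ys, slope, intercept):
    """Average absolute gap between the actual values and the fitted line."""
    return sum(abs(y - (slope * x + intercept)) for x, y in zip(xs, ys)) / len(xs)


# Hypothetical data: customers per month for the last six months.
months = [1, 2, 3, 4, 5, 6]
customers = [120, 135, 150, 160, 178, 190]

slope, intercept = fit_line(months, customers)
forecast = slope * 7 + intercept  # prediction for month 7
error = mean_absolute_error(months, customers, slope, intercept)

print(f"forecast for month 7: {forecast:.1f}, training MAE: {error:.2f}")
```

Knowing that the slope is just a covariance divided by a variance makes it much easier to reason about why a model behaves the way it does when you move on to more advanced algorithms.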

If you are looking for specialization, Natural Language Processing (NLP) and Computer Vision are two fields that are absolutely thriving right now. Each requires you to dive deep into those specific fields so make sure you’re aware of what you’re getting into.

Structured Thinking

Structured thinking is a process of putting a framework to an unstructured problem. Having a structure not only helps an analyst understand the problem at a macro level, but it also helps by identifying areas that require deeper understanding.

Without structure, an analyst is like a tourist without a map. They might understand where they want to go (or what they want to solve), but they don’t know how to get there. They would not be able to judge which tools and vehicles they would need to reach the desired place.

How many times have you come across a situation where the entire work had to be redone because a particular segment was not excluded from the data? Or a segment was not included? Or, just when you were about to finish the analysis, you came across a factor you had not thought of before? All these are results of poorly structured thinking.

Dashboarding

Data science projects are like a treasure hunt, the treasure being the insights you extract from the data. The question is: what is the treasure worth? Well, that is decided by your stakeholders. The only way to get a good price is to communicate how insightful the results are and how this treasure can help them improve profits and the organization.

This is where dashboarding comes in. A lot of data science transitioners ignore the dashboarding aspect because they focus on model building. But being able to communicate your thoughts and your key results to the stakeholder – that’s what separates a good data scientist from an amateur one.

Spending time on understanding what dashboarding is and how it works will give you a huge advantage.

Focus on Gaining Hands-On and Practical Experience in Data Science

Whatever we have covered so far has a lot to do with understanding different data science concepts. We’ve covered both the technical side (programming, machine learning, statistics, etc.) and the soft skills aspect (structured thinking).

So, what’s the next step for you in your transition journey?

It’s time to apply your knowledge in a practical scenario! Yes, you need to marry your theoretical knowledge with hands-on practical experience to truly stand out as a data science transitioner. Given your background, there are broadly four ways you can do this:

Participate in hackathons: This is perhaps the most popular option to gain practical knowledge. Data science competitions and hackathons are awesome! You’ll love the variety of business problems you get to solve, and when you add in the pressure of finding a solution under a tight deadline, it’s a great learning experience. Data science hackathons are a great way to:
- Test your data science knowledge
- Compete against top data science experts from around the world and gauge where you stand
- Get hands-on practice of a data science problem working in a deadline environment
- Improve your existing data science skillset
- Enhance your existing data science resume
Pick up open source data science projects: One key thing that has helped transitioners immensely is picking an open-source data science project and running with it. This not only helps you understand the key areas you need to improve on but also shows you the way forward. And these projects aren’t your run-of-the-mill data science projects. These are specific projects that tackle a certain data science sub-field, such as computer vision, web analytics, and so on. The project could be a dataset, a state-of-the-art library that has brought the data science field forward, or even an open-source analytics tool. So, pick a project that intrigues you and start working on it today!
Apply for data science internships: This is the most popular path to breaking into the data science industry. Even for experienced people – internships are a very effective way to break into data science. We have now seen so many successful transitions enabled by internships. Not only do you gain hands-on experience in data science, but you also get to learn how the industry works and how a typical data science project functions. It’s an invaluable experience!
Deploy a machine learning model: Once you have completed your data science project, it is time for the intended users and stakeholders to reap the benefits of the predictive power of your machine learning model. In simple words, this is model deployment. It is one of the most important steps from a business point of view, but also the least taught one. Remember that the end users are the stakeholders, and the model may need to be used at the same time by multiple people who are NOT data scientists. They will not be running a Jupyter or Colab notebook on GPUs, so you need a complete model deployment process.
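To give a feel for what model deployment can look like, here is a minimal sketch that wraps a model behind a tiny web endpoint using only Python’s standard library. Everything in it is an assumption made for illustration: the model is a hypothetical fitted line stored as JSON, the request format is our own, and a real deployment would add authentication, logging, and a production-grade server.

```python
import io
import json

# Hypothetical model parameters, e.g. saved at the end of training.
MODEL_JSON = json.dumps({"slope": 10.0, "intercept": 100.0})


def load_model(serialized):
    """Rebuild a prediction function from serialized parameters."""
    params = json.loads(serialized)
    return lambda x: params["slope"] * x + params["intercept"]


predict = load_model(MODEL_JSON)


def app(environ, start_response):
    """Minimal WSGI application: send JSON like {"x": 7}, get a prediction."""
    try:
        length = int(environ.get("CONTENT_LENGTH") or 0)
        payload = json.loads(environ["wsgi.input"].read(length))
        body = json.dumps({"prediction": predict(float(payload["x"]))})
        status = "200 OK"
    except (KeyError, TypeError, ValueError):
        # Covers malformed JSON, a missing "x" key, and non-numeric values.
        body = json.dumps({"error": "expected a JSON body with a numeric x"})
        status = "400 Bad Request"
    data = body.encode("utf-8")
    start_response(status, [("Content-Type", "application/json"),
                            ("Content-Length", str(len(data)))])
    return [data]


def call_app(raw_body):
    """Call the app in-process, without starting a server (handy for tests)."""
    captured = {}
    environ = {"CONTENT_LENGTH": str(len(raw_body)),
               "wsgi.input": io.BytesIO(raw_body)}
    result = app(environ, lambda status, headers: captured.update(status=status))
    return captured["status"], json.loads(b"".join(result))


print(call_app(b'{"x": 7}'))
print(call_app(b"not json"))
```

To try it locally, one option is to serve the app with the standard library’s wsgiref.simple_server.make_server("", 8000, app).serve_forever(). Stakeholders can then get predictions through a simple request instead of opening a notebook.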

Stay up to date with current developments in the domain

This is another essential aspect of working in data science. We’ve seen the majority of transitioners skip this step and focus exclusively on picking up machine learning concepts – don’t do that!

Data science is still a very nascent field. We see major breakthroughs happening on a regular basis (sometimes a weekly basis!) and it can become difficult to keep up with all that’s happening. But if you can find time to catch up on the latest developments, you’ll already have an edge on your competition.

Let us give you an example. The Natural Language Processing (NLP) field has come a long way in the last 3 years (since 2017). We see a new language model seemingly every week that builds on the last major breakthrough. If you can keep up with this pace, if you can spend a bit of time understanding what’s going on, you’ll gain invaluable knowledge that your peers won’t have.

So what are the different ways in which you can stay up to date in the vast space of data science? Here are three suggestions based on our experience:

Follow Newsletters and blogs: This is the easiest way to stay abreast of developments. There are plenty of good newsletters out there (just do a quick Google search) that will send you weekly updates. You can also subscribe to blogs like Analytics Vidhya to check out the latest tools and techniques in data science.
Follow People: Another no-brainer! The data science community is a great place to connect with fellow transitioners, experts, and industry veterans. You’ll be surprised how approachable these experts are and they’re always willing to share their knowledge and advice. Find these people on platforms like LinkedIn and keep following them regularly.
Attend MeetUps: This one requires a bit of effort but the eventual payoff can be HUGE. Meetups offer you an unparalleled opportunity to meet your fellow transitioners and connect with them, learn from them, and build a rapport that might benefit both parties. Over time, once you are comfortable with core machine learning concepts, you can even try to speak at these meetups to build your profile.

The big salary question – what can you expect from this transition?

Making a career switch to data science to get a salary bump is entirely justified. However, it isn’t as straightforward as you might think. There are certain things, such as work experience and your current domain, that will play a MASSIVE role in deciding your salary post-transition.

Taking figures from the popular and relatively accurate website called Glassdoor, this is what the salary situation looks like for a data scientist:

As you can see, the average salary is approximately INR 10,00,000 per year in India whereas the figure is approximately $113,000 in America.

Similarly, the average salary for data engineers in 2020 is approximately INR 7,00,000 per year in India, whereas in America it is around $103,000.

If you bring a bit more experience to the table and you have relevant domain experience, you might look at a more senior role (though this is a bit rare if you have no prior data science experience):

Similarly, the average salary of a senior data engineer is roughly INR 18,00,000-19,00,000 in India, and in the USA you can expect this figure to be around $136,000.

As we said, it comes down to how relevant your previous experience is. More often than not, as a person transitioning from an IT role to a hands-on data science or data engineering role, you’ll be looking at the first graph.

What are the challenges to get the “Sexiest Job of the 21st Century”?

There has never been a better time to become a data scientist. Data science is a booming industry, but it also comes with its own set of challenges. Coming from an IT background should help you overcome the majority of them; however, we’ll list a few that need your special attention. If you have read this far, we know you can work through obstacles. Let’s take them up one by one:

Finding the right data engineering resources – Data science is a relatively new and emerging field, but data engineering is at an even more nascent stage, and organizations are still working to formalize this job role. You might therefore find it a little hard to find the right resources to study data engineering. However, you can get past this by studying data engineering from analyticsvidhya.com/blog.
Mastering Statistics – Statistics forms the core of data science algorithms. It takes some effort to put down your keyboard, learn with the old pen-and-paper method, and then bring your coding skills to bear on statistics. Statistics will help you get ahead in your data science and data engineering career, so make sure to work on it in your initial days.
Absence of practical knowledge – No matter what stage of the data science career cycle you are at, the key thing is to have experience with real-life projects. Gone are the days when the definition and code of a simple random forest algorithm would have landed you the job. You must be clear on the ins and outs of the subject. You can work through this challenge by focusing on the points mentioned above.
Focussing on the tool rather than the concept – A tool is merely a means of getting your data science task done efficiently; it is in no way an indicator of a strong grasp of data science. A great example is SAS, a paid data science tool that was widely used in the analytics industry but saw a decline after the arrival of open-source tools like Python and R. Therefore, it’s imperative that you focus on the concept rather than the tool.
Structured Thinking – Ah, the most crucial skill yet the most overlooked one. Structured thinking, as discussed above, is the art of breaking down large unstructured problems into smaller, manageable ones. A data science project is only as valid as its problem statement; if the problem statement is wrong, the whole project goes down the drain. As a data science professional, you must ensure that you are working on the right problem statement.

Final Thoughts

Now that you are aware of the various components you’ll need to put together to make this career transition, are you prepared to buckle up and take this thrilling journey? The payoff is immense but as you might have gathered, you’ll face plenty of obstacles along the way. Your eventual success will come down to how well you can get past these hurdles.
