I came across all kinds of advice when I was looking for a data science internship. There’s no dearth of people espousing the value of internships in data science. But surprisingly, not many people talk about how to land that internship.
My learning journey during my internship with Analytics Vidhya was equal parts challenging and fulfilling. I realized how vast and complex data science is and how unprepared I was for a full-time role. My path to becoming a data scientist would have been far more arduous if I hadn’t interned first.
Even for experienced people, internships are a very effective way to break into data science. We have now seen so many successful transitions enabled by internships.
If you are looking for tips to prepare yourself for a data science internship, then you’ve come to the right place!
In this article, I’ve drawn on my experience to cover the key aspects you need to know to land your first internship in data science. Each section is filled with plenty of tips, tricks, and resources. It won’t be easy, but you will know what needs to be done.
If you are looking for a guided journey with mentorship, check out our Certified Program: Data Science for Beginners (with Interviews). This program will complement your foray into data science and give you a huge advantage in your internship search.
Note: The focus of this article will be only on the technical skills that one needs for a data science internship. If you have any suggestions that can help the community, do share your thoughts in the comment section below.
Table of Contents
- Getting Familiar with the Basic Data Science Terminology
  - What is Data Science?
  - Data Scientist vs Statistician
  - Common Terminologies in Data Science
- Start your Data Science Journey
  - Understanding Statistics and Probability
  - Good Coding Skills (Pick a Programming Language)
  - Basic Machine Learning Algorithms
- Build your Digital Presence (Online Data Science Portfolio)
  - Work on Projects
  - Create a GitHub Profile
  - Write Blogs
  - Create and Optimize your LinkedIn Profile
- Do’s and Don’ts for Crafting your Data Science Resume
- Prepare for your Data Science Internship Interview
  - Structured Thinking
  - Knowledge about the Company and Role
- Boost your Chances of Selection
  - Advanced Machine Learning Techniques
  - Participate in Data Science Competitions
- What will you Learn during your Data Science Internship?
  - How to Solve Real-World Projects
  - Ways to Tell Data Stories
  - Team Work
  - Gain Practical Experience in the Field
1. Getting Familiar with the Basic Data Science Terminology
What’s the first step, the absolute ground zero, before you start applying to internships? Understanding what data science is!
Let’s take a moment to answer this question, look at the different roles in data science, and familiarize ourselves with common terminologies in this field.
It’s important to know what you’ll be working on in the first place. Answer this question before anything else – why do you want to work in data science? Is it because you love programming, math, statistics and the opportunities they offer? Or are you going with the flow since ‘Data Science’ and ‘Machine Learning’ are currently trending?
1.1 What is Data Science?
The amount of data being generated every day is increasing exponentially! The sources of data and our ability to collect and store it have come a long way in the last decade. Companies are using a variety of tools and techniques to mine patterns in the data and gather useful insights. That, in a nutshell, is what data science is all about.
“Data really powers everything that we do.” – Jeff Weiner, LinkedIn CEO
Simply put, data science involves the use of various techniques to understand data and build predictive models to make business decisions. A few popular applications of data science include fraud detection, sports analytics, airline route planning, etc. Here is an article that lists down 13 mind-blowing applications of data science:
So if data science is all about deriving insights and finding patterns from the data, then what is the difference between a data scientist and a statistician? Excellent question! Let’s find out.
1.2 Data Scientist vs Statistician
Both data scientists and statisticians work with data to derive useful insights from it. A statistician focuses on identifying relationships in the data, while a data scientist uses those relationships to build models that predict future outcomes. The aim of a data scientist is to build a generalized model with high accuracy.
Statisticians often use tools like R, Excel, or MATLAB, since these have a number of libraries for data analysis. Data scientists, on the other hand, mostly work with Python, Apache Spark, etc. for exploring the data and building models. Below is a cool infographic that summarizes the differences between these two roles:
Here’s another well illustrated article showcasing the variety of roles available in data science. Please understand that data scientist isn’t the only job in this field!
1.3 Common Terminologies in Data Science
Data science is a complex and vast field. Let’s understand its different components so you can narrow down your area of focus in the long term.
Machine Learning: Machine learning is the use of algorithms (such as linear regression, logistic regression, decision trees, etc.) to learn from the data and make informed decisions. For example, using past data of people who have taken loans and trying to predict if they’ll come back for another loan.
Deep Learning: Deep learning is a subset of machine learning that uses multi-layered neural networks, loosely inspired by the human brain, to learn patterns from data. For instance, identifying the objects in a given image, or classifying images as a cat or dog. If you are still unsure about the difference between machine learning and deep learning, you can check out this article: Deep Learning vs. Machine Learning – the essential differences you need to know!
Natural Language Processing (NLP): NLP is a branch of data science that deals with analyzing, understanding, and deriving information from text data. All those reviews you see on Amazon? Or all the Tweets you browse through daily? NLP techniques are used to parse through them and understand the sentiment of users. NLP is one of the hottest fields in data science right now.
Computer Vision: As the name suggests, computer vision gives machines the ability to see and understand their surroundings. Ever notice how Facebook automatically suggests tags in a picture? Or how self-driving cars detect objects on the road? These are prime examples of computer vision. This is another field that’ll see a lot of jobs come up in the next few years.
Recommendation Engines: Anyone who has ever used Flipkart or Amazon has interacted with a recommendation engine. It analyzes past user behavior to offer relevant recommendations or suggestions. ‘Customers who bought this also bought’ or ‘Recommended for you based on your past purchase’ are examples of recommendation engines at work.
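The ‘Customers who bought this also bought’ idea can be sketched with simple co-occurrence counts. The snippet below is a toy illustration only: the order data and the `also_bought` helper are hypothetical, and real recommendation engines rely on far richer signals and models.

```python
from collections import Counter

# Hypothetical order history: each inner list is one customer's basket.
orders = [
    ["phone", "case", "charger"],
    ["phone", "case"],
    ["laptop", "mouse"],
    ["phone", "case", "charger", "earphones"],
]


def also_bought(orders, item, top_n=2):
    """Return the items that most often appear in the same basket as `item`."""
    counts = Counter()
    for basket in orders:
        if item in basket:
            # Count every other distinct item in this basket once.
            counts.update(other for other in set(basket) if other != item)
    return [name for name, _ in counts.most_common(top_n)]


print(also_bought(orders, "phone"))  # ['case', 'charger']
```

Counting co-purchases is about the simplest possible approach, but it is enough to see how past behavior turns into a suggestion.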
2. Start your Data Science Journey
So you’ve decided to take the plunge. You want to become a data scientist and nothing can stop you. First, congratulations on picking the hottest field in the industry!
If you’re a fresher with no industry experience, internships are the best way to land a role in data science. They offer you a chance to get industry experience while working with experienced veterans. There is so much to learn in those few months that will shape your professional career.
In the next few sections, we shall look at the essential skills required to land your first data science internship.
Note: As mentioned previously, we will focus on the technical aspect of your portfolio rather than the soft skills (such as good attitude, confidence, etc.) required to clear a typical data science internship interview.
2.1 Understanding Statistics and Probability
Statistics and probability are the fundamental core skills required for data science. Without a solid understanding of these two, you won’t make much headway in this field (or the interview process!). From analyzing the data and making valuable inferences to understanding how the model works, the basic concepts of stats and probability are integrated in the data science ecosystem.
There are a number of statistical techniques and probability distributions we can leverage to understand the structure of the given data. Here are a few important topics that you will be using while working on a data science problem:
- Descriptive Statistics
- Mean, Median, Mode
- Variance and Standard Deviation
- Bernoulli Trials & Probability Mass Function
- Central Limit Theorem
- Normal Distribution
- Inferential Statistics
- Confidence Interval
- Hypothesis Testing
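To make a few of the topics above concrete, here is a minimal sketch that uses only Python’s standard `statistics` module. The sample values are made up for illustration, and the confidence interval relies on a normal approximation, which is a simplifying assumption (for a sample this small, a t-distribution would usually be more appropriate).

```python
import math
import statistics

# Hypothetical sample: daily website visits (made-up numbers for illustration).
visits = [120, 135, 128, 150, 142, 128, 160, 138, 131, 128]

# Descriptive statistics
mean = statistics.mean(visits)          # 136
median = statistics.median(visits)      # 133.0
mode = statistics.mode(visits)          # 128
variance = statistics.variance(visits)  # sample variance (n - 1 denominator)
std_dev = statistics.stdev(visits)      # sample standard deviation

# Approximate 95% confidence interval for the mean.
# Assumption: a normal approximation is acceptable for this illustration.
z = statistics.NormalDist().inv_cdf(0.975)  # roughly 1.96
margin = z * std_dev / math.sqrt(len(visits))
ci_low, ci_high = mean - margin, mean + margin

print(f"mean={mean}, median={median}, mode={mode}")
print(f"95% CI for the mean: ({ci_low:.1f}, {ci_high:.1f})")
```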
You can expect a bunch of questions in your interview from these two fields (statistics and probability). Below is a list of useful resources to help you get started (or revise certain concepts):
- A blog on inferential stats: Comprehensive & Practical Inferential Statistics Guide for data science
- Detailed guide for hypothesis testing: Your Guide to Master Hypothesis Testing in Statistics
- Quiz on the statistics used in data science: 41 questions on Statistics for data scientists & analysts
2.2 Good Coding Skills (Pick a Programming Language)
Yes, you need to know programming to become a data scientist. There’s no getting away from it. AutoML (automated machine learning) is gradually being accepted in the industry but right now, there’s no alternative to cold hard coding skills.
The two most popular programming tools these days for data science are Python and R. You must be familiar with at least one of the two. These are both open source programming languages, with a massive active community that’s growing by the day.
R is mainly used for exploratory work and is preferred for statistical analysis tasks. It has a comparatively larger collection of statistical packages. Python, on the other hand, is preferred for machine learning and deep learning tasks, and has numerous libraries and packages for both.
Here are a few articles to help you get started with Python and R:
- Step by Step Guide to Learn Data Science on R
- Baby Steps in Python – Libraries and Data Structures
- Free online course on Getting Started with Python for Data Science
Python is definitely more in-demand in the industry these days. It’s an easy choice if you’re inclined towards learning advanced machine learning topics and of course, deep learning. The flexibility Python provides is unparalleled in these tasks. R is a wonderfully adept tool for doing exploratory analysis, including producing some really insightful and aesthetically pleasing plots.
2.3 Basic Machine Learning Algorithms
If you have covered the basics of statistics and probability, and have worked on your coding skills, the next step would be to learn about the basics of machine learning. Make yourself familiar with common machine learning algorithms, like linear regression, logistic regression, decision trees, random forests, Naive Bayes, k-nearest neighbours, and support vector machines.
Try to focus on one algorithm at a time and understand the intuition behind each technique. Having theoretical knowledge of how the algorithms work is as important as being able to implement them. If you know how an algorithm works, it will be easier to understand its parameters, tune them, and decide which algorithm to use with which type of data.
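As an illustration of learning one algorithm by implementing it, here is a minimal sketch of k-nearest neighbours in plain Python. The toy data and the `knn_predict` helper are hypothetical; in practice you would typically reach for a library implementation, but writing it out once makes the role of the parameter `k` much clearer.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest points."""
    # Pair every training point's Euclidean distance to the query with its label.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    # The most common label among the k nearest neighbours wins.
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical toy data: two well-separated groups of 2D points.
points = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
labels = ["low", "low", "low", "high", "high", "high"]

print(knn_predict(points, labels, query=(2, 2), k=3))  # low
print(knn_predict(points, labels, query=(6, 5), k=3))  # high
```

Try changing `k` and adding a few noisy points to see how the prediction shifts: a small `k` follows individual neighbours closely, while a large `k` smooths the decision.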
You can refer to this article to learn the above mentioned algorithms in detail:
3. Build your Digital Portfolio (Online Profile)
You’ve worked so hard to learn all these new concepts. You should complement all that effort with learning how to showcase your skills.
Statistics, programming and machine learning alone will likely not land you that internship. You need to build your digital presence. Showcase your immense potential and demonstrate the skills you have acquired during your data science journey. Let the world know!
In this section, we’ll look at the different ways you can build your profile.
3.1 Work on Projects
I believe the best way to learn anything is by putting your knowledge into practice. Nothing says “I know this technique” like showcasing it in a project. Building an end-to-end project gives you an idea about the different possibilities and challenges a data scientist potentially faces in a day-to-day role.
You can look for open source projects relevant to your field of interest. Trust me, there’s no dearth of data on the internet. I’m a huge fan of fiction, so I love using NLP to analyze the work of my favorite authors. Working on projects you genuinely care about shows your passion for data science and gives you an edge in the eyes of your future employer.
Here are a few practice problems to get some valuable hands-on experience:
- Machine Learning
- Natural Language Processing
- Recommendation Engine
- Computer Vision
Remember these are just to get you started. You can browse through our entire project list here – Practice Problems on the DataHack Platform. If you need ideas for projects related to machine learning, have a look at this Discussion thread. Get your hands dirty!
3.2 Create a GitHub Profile
You should also start building your GitHub profile at this stage. This is essentially your data science resume which anyone in the world can access.
Most data science recruiters and interviewers look at a candidate’s GitHub profile to evaluate their potential. While you work on your projects, you can simultaneously add the problem statement and code to your GitHub. I’ve put together a small checklist you can follow the next time you’re adding your code to GitHub:
- Add the problem statement
- Make a clear readme file
- Write clean code
- Add comments in the code
- Add as many personal/course projects as possible
- Contribute to open source projects, if you’re at that level
3.3 Write Blogs
I’ll tell you a big secret that propelled my data science career: writing articles. I have made it a habit to take notes whenever I’m learning a new concept. It’s easy to convert those notes into an article later. This helps me understand the technique much more clearly.
You should also do the same! Our community is happy to share their thoughts and feedback with you. When you put your articles out in public, people often share their views – such as “adding a visualization of actual vs predicted could be helpful”, which can help you improve.
Quora can be considered an alternative to writing blogs (it is where I first started writing). Breaking down a complex topic into easy-to-understand words helps you grasp the topic and fine-tunes your structured thinking skills.
To start with, you can write about some basic topics, like Data Exploration using the matplotlib library, your approach and solution for a practice problem, a summary or notes of a MOOC you completed, etc.
3.4 Create and Optimize your LinkedIn Profile
LinkedIn is the world’s biggest professional network. You should be on it even if you’re a fresher or still finishing up graduate school.
Recruiters often use LinkedIn to either verify your profile or reach out to you in case of an opportunity. You can consider it as your second resume, or the digital version of your paper resume. If you apply for an internship and your profile isn’t updated (or doesn’t exist), you might miss out.
Optimize your LinkedIn profile according to the internship you’re applying to. Update your past experience (if any), education level, projects and interests. If you haven’t already created a profile, do it now. You should also start building your network by connecting with people in data science.
There are PLENTY of them including tons of influencers who regularly post useful developments. You should consider this step utterly mandatory.
4. Do’s and Don’ts for Crafting your Data Science Resume
Your resume is essentially your professional career’s highlight reel. It’s the first thing the recruiter/hiring manager looks at so crafting the perfect resume is absolutely critical in your quest to get an internship.
Even if you possess every skill listed in an internship’s requirements section, there’s a good chance you might not get the interview call if your resume is not up to the mark.
You must, absolutely must, spend a good amount of time on creating and perfecting your resume.
So what are some key things to keep in mind while doing this?
Make sure your resume is up to date and does not have any spelling mistakes. Check it twice, perhaps even thrice. Ask a colleague or friend to review it from a recruiter’s perspective.
Always keep this in mind when you’re creating or updating your resume:
Write what you know, and know what you write.
Remember that project you did in your first year of college, some 2-3 years ago, the details of which you can’t recall? Either revise it thoroughly or leave it off your resume. Having 10 projects that you cannot talk about is a red flag for the recruiter! The same goes for all the technical skills you pen down.
Here is an excellent article on Tips to Prepare an Outstanding CV for Data Science Roles.
Make sure you take out some time and watch the below video by Kristen Kehrer. She has a wealth of experience in this field and has gone through hundreds of resumes in her career. In this video, she talks about the important points one needs to remember while building a data science resume and provides tips and tricks for cracking the interview process.
You can also check out her course – Up Level your Data Science Resume, to get a deeper insight into how data science resumes should be designed.
5. Prepare for your Data Science Internship Interview
The biggest challenge in getting a data science internship is undoubtedly the interview process. Given that you don’t have previous work experience in this field, what aspects of your resume will the recruiter look at? What skills should you demonstrate in your resume and in the actual interview?
Big questions! Knowing how to navigate these tricky waters could make or break your chances of getting the internship.
You will, of course, mention the projects you have worked on (or are currently working on). But apart from that, there are certain topics which the interviewer will be keen to test you on, irrespective of the background you come from. This section looks at the key things you need to focus on while preparing for the interview.
5.1 Structured Thinking
The ability to structure your thoughts is an invaluable skill in the complex world of data science. The interviewer will judge you on your ability to break down a problem statement into smaller steps. How you do that is where the goldmine lies.
For any given problem statement, it is necessary to identify what the end goal is. The next step is to understand the data you’re given and pen down a process required to get to that end goal. And all of this happens in a finite time frame (the interviewer does not have all day!). Are you seeing why it’s so important to have a structured thinking mindset?
To check your structured thinking skills, you would be given a question like – How many mails are being sent at the moment? That’s what I was asked during my interview. Or how many red colored cars are on the road in Bangalore? How many cigarettes are sold in a day in India?
For example, if I want to understand why charge-offs have suddenly increased in a credit card portfolio over the last month, I would lay it down in a structure similar to this:
These are questions with no precise solution. So how do you go about solving them? The first thing to understand is that the interviewer does not expect an exact numerical answer. Instead, they are trying to understand how you look at the problem and your approach to getting the final answer. It’s a good idea to ask for a pen and paper (or a whiteboard) so you can demonstrate your thinking step-by-step.
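One way to practice this kind of estimate is to write the breakdown as explicit steps, each with a stated assumption. The sketch below does this for the red cars question. Every number in it is a placeholder guess for illustration, not a real statistic, and the `estimate_red_cars` helper is hypothetical. In an interview, it is the structure that matters, not the inputs.

```python
def estimate_red_cars(population, people_per_household, car_owning_share, red_share):
    """Break the estimate into small steps, each driven by one stated assumption."""
    households = population / people_per_household  # step 1: people -> households
    cars = households * car_owning_share            # step 2: households -> cars
    return cars * red_share                         # step 3: cars -> red cars


# All inputs are illustrative guesses, NOT real figures for Bangalore.
estimate = estimate_red_cars(
    population=10_000_000,    # assumed city population
    people_per_household=4,   # assumed average household size
    car_owning_share=0.2,     # assumed share of households that own one car
    red_share=0.1,            # assumed share of cars that are red
)
print(f"Roughly {estimate:,.0f} red cars")
```

Writing the steps out this way makes it easy to revisit any single assumption when the interviewer challenges it, without redoing the whole estimate.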
I highly recommend going through this article – The Art of Structured Thinking and Analyzing. The piece explores everything you need to know about structured thinking and showcases a few examples to help you enhance your structured thinking skills.
5.2 Knowledge about the Company you’re Applying to
You might feel that this point isn’t relevant to the discussion. This isn’t something that needs to be mentioned since everyone goes through the job description before applying. It’s a fair point.
But just browsing through the JD isn’t good enough.
We regularly hear from recruiters how prospective candidates walk in without having read about the role they are interviewing for. I have personally seen people take up the internship and leave within a couple of weeks because they did not like their role.
You must know what the company does and the organization’s vision before you decide to apply for the job. There are no two ways about it.
I’d suggest researching the company to understand what they work on. How do you see yourself fitting in? Can you directly see an impact you could make with your skillset? You must also go through the job description thoroughly and ask questions during the interview to understand your fit with the company. This will save your time and the company’s as well.
I encourage you to go through this guide that lists down the important topics to cover while preparing for a data science interview:
6. Boost your Chances of Selection
The pointers we’ve seen so far can safely be shelved under the ‘MUST-HAVE’ category. You simply can’t get by without checking off each one of them. But you can further enhance your existing skillset to stand out from the competition. And who doesn’t want to do that?!
In this section, I have drawn on my own internship experience to give you a few additional tips and tricks to boost your chances of getting selected.
6.1 Advanced Machine Learning
Nothing will impress the interviewer more than watching you confidently answer advanced machine learning questions. Most folks they’ll interview will be able to solve basic questions. Holding advanced ML knowledge will most definitely give you the edge.
Make sure you have first covered the basic topics we discussed earlier (stats, probability, regression, tree algorithms, etc.). You can then safely jump into advanced ML algorithms, recommendation systems, time series forecasting algorithms, etc.
At this stage in your career, it’s not necessary to know ALL the algorithms in detail. I’m sure you’ll come across 3-4 techniques that you find really helpful, so learn them well and rattle them off in your interview. You should have a fair understanding of each algorithm and the math behind it. You can choose a particular field based on your interest and explore the various techniques in that domain.
To give an example, if you are interested in time series, you can start exploring the different forecasting techniques and the concept of stationarity, or even pick a time series project and work on it. Or, if NLP is the field that interests you, you can work on understanding how features are extracted from text-based data, which algorithms can be used on textual data, and so on.
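For the NLP route, a good first exercise is to see how raw text becomes numeric features. Below is a minimal bag-of-words sketch in plain Python. The sample reviews and the `bag_of_words` helper are made up for illustration, and the whitespace tokenizer is a deliberate simplification compared to what NLP libraries provide.

```python
from collections import Counter


def bag_of_words(documents):
    """Turn each document into a vector of word counts over a shared vocabulary."""
    # Simplistic tokenizer: lowercase and split on whitespace.
    tokenized = [doc.lower().split() for doc in documents]
    vocabulary = sorted({token for tokens in tokenized for token in tokens})
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append([counts[word] for word in vocabulary])
    return vocabulary, vectors


# Hypothetical product reviews.
reviews = ["great phone great battery", "poor battery"]
vocabulary, vectors = bag_of_words(reviews)

print(vocabulary)  # ['battery', 'great', 'phone', 'poor']
print(vectors)     # [[1, 2, 1, 0], [1, 0, 0, 1]]
```

Once text is in this form, the machine learning algorithms discussed earlier can be applied to it like any other numeric dataset.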
Here are some important links to get you started:
6.2 Participate in Data Science Competitions
Participating in competitions adds a MASSIVE boost to your resume and increases your chances of getting the internship. Having worked on or completed a project is proof that your knowledge is not restricted to just books. You have clearly made an attempt to translate your theoretical learning to a real-world dataset, a sure-shot sign that your curiosity, passion and will to learn are quite high.
To start with, I encourage you to participate in data science competitions. Start with the hackathons listed on AV’s DataHack platform or on Kaggle. These platforms provide problem statements which mimic real-world scenarios, thus giving you invaluable exposure to what life in the industry will feel like.
You also get to compete with (and learn from) top data scientists from around the world. This acts as a good barometer for your own progress. Keep practicing and you’ll be surprised how quickly you rise in the leaderboard rankings. Practice is KING in data science.
7. What will you Learn during the Internship?
What can an internship give you that textbooks, MOOCs and videos can’t?
Industry experience: the single most valuable commodity a hiring manager looks for when poring over your profile. I realized how useful this is during my own internship with Analytics Vidhya.
There is so much you can learn from your internship if you go in with an open mind and a willingness to learn every single day. That’s exactly how you succeed in data science!
In this section, I have penned down the major takeaways I experienced during my data science internship.
7.1 How to solve real-world projects
You would be working on a real-life project during your internship. This is invaluable experience. Once you’re on board, you might well find yourself entrenched in the end-to-end data science lifecycle, including defining the problem statement and building models.
If you previously participated in data science competitions, you will have an idea about the different challenges data scientists come across. But here’s the caveat.
The problem statements and the datasets provided in these competitions are very different from real-world scenarios. The datasets are messy and unstructured in the industry. There’s a ton of data cleaning work required before any model can be built.
In fact, don’t be surprised if 70-80% of your tasks involve data cleaning.
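To give a flavour of what that cleaning work looks like, here is a minimal sketch in plain Python. The records and the `clean_records` helper are hypothetical. The sketch only trims and normalizes names, drops rows with missing fields, and removes duplicates, whereas real-world cleaning usually involves many more checks.

```python
def clean_records(records):
    """Normalize names, drop incomplete rows, and remove duplicate entries."""
    cleaned, seen = [], set()
    for record in records:
        name = (record.get("name") or "").strip().title()
        age = record.get("age")
        if not name or age is None:
            continue  # drop rows with a missing name or age
        key = (name, int(age))
        if key in seen:
            continue  # skip exact duplicates after normalization
        seen.add(key)
        cleaned.append({"name": name, "age": int(age)})
    return cleaned


# Hypothetical messy input: stray whitespace, mixed types, gaps, duplicates.
raw = [
    {"name": " alice ", "age": "30"},
    {"name": "Alice", "age": 30},
    {"name": "", "age": 25},
    {"name": "bob", "age": None},
]
print(clean_records(raw))  # [{'name': 'Alice', 'age': 30}]
```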
You will learn how to structure a problem statement, understand the domain and the data required to solve the problem, and then figure out sources to extract that data. The next step is to get knee deep into research. Find out the approaches other data scientists have taken to solve similar problems.
This will give you a fair idea about what should work well and what is not worth investing time on. While experiments are encouraged in data science, there’s a limit to how much creative freedom you’ll get from your manager. Filter out the aspects you know won’t work beforehand.
7.2 Ways to tell Data Stories (Exploratory Data Analysis)
People often spend more time building a model than understanding the data. I myself used to do that for a long time. It was during my internship, while I was working on a project, that I realized how wrong my approach was.
I cannot stress enough how important it is to really understand the data you have. There are so many levels and hidden aspects in a dataset which we often overlook in our haste to build models. This is something you will learn during your internship (but should be prepared for beforehand).
Spend as much time exploring the data as you can! Plot graphs, find patterns, and just dive into it like it’s the best work in the world (because it is!). Try to understand the distribution, look for factors that affect your target variable and make inferences. Build a hypothesis, visualize data, find insights, and most importantly, discuss your findings with your teammates.
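One simple way to look for factors that affect your target variable is to measure how strongly a feature moves with it. The sketch below computes a Pearson correlation by hand so the formula stays visible. The `ad_spend` and `sales` numbers are invented for illustration and the `pearson_correlation` helper is hypothetical. Keep in mind that correlation only captures linear relationships, so plotting the data remains essential.

```python
import math
import statistics


def pearson_correlation(xs, ys):
    """Pearson correlation: co-variation of xs and ys scaled by their spreads."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return co_variation / (spread_x * spread_y)


# Hypothetical feature and target values.
ad_spend = [1, 2, 3, 4, 5]
sales = [2, 4, 5, 4, 5]

r = pearson_correlation(ad_spend, sales)
print(f"correlation = {r:.2f}")  # correlation = 0.77
```

A value close to 1 or -1 suggests a strong linear relationship worth investigating further, while a value near 0 suggests little linear association.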
7.3 Team Work
One of the perks of a data science internship is working with incredibly smart and supportive people. Data science projects require collaboration and coordination among colleagues as you work towards the end goal. I consider myself lucky to be a part of such a great team.
The best part about working in a team is that you always have someone to discuss your thoughts with (and to clarify your doubts). For instance, during my internship at Analytics Vidhya, we took part as a team in a huge hackathon. The dataset had multiple files, so we divided the task: each of us worked on understanding a particular file and shared our knowledge with the rest of the team.
It was an amazing experience.
I learned different approaches to tackle a problem and techniques to improve/optimize my code during those discussions. Working as a team will not only help you build your soft skills, but also hone your technical skills. A win-win combination!
7.4 Gain practical experience in the field
When you start your data science job search, you will most likely find that most companies ask for some experience in the domain. You should find out the kind of problems your company is working on, and think of ways in which you can contribute. Discuss your ideas with the people who are working on the project(s).
You should also try to understand the roles of other people in the company. Talk to people in different teams. For instance, talk to the marketing team and see if you can think of a data-driven solution to their problems. Make the most of the opportunity you have got. Be curious, ask relevant questions and learn from your team.
I had a ball writing this article. There was so much I learned during my internship and this made me relive those moments. When I look back at that time, there was so much I didn’t know. I hope this article will help you overcome the obstacles I initially faced.
If you have any questions or want to share your experience with me and the community, please do so in the comments section below. You can also check out the below resources to accelerate your chances of getting a data science internship: