How I Became a Data Science Competition Master from Scratch

Nikhil Kumar Mishra 19 Oct, 2020 • 7 min read

Overview

  • Winning data science competitions can be a complex process – but you can crack the top 3 if you have a framework to follow
  • Hear from a top data science hackathon expert about how he went from scratch to winning data science competitions

 

Introduction

There is no alternative to learning through experience. Especially in the data science industry!

I recently won the top prize in Zindi’s Zimnat Insurance Recommendation challenge – an achievement that ranks top among my all-time data science competition results.

In pure numbers, this was not my first top finish but just one of the 30+ top-3 finishes I’ve had in my own data science competition journey. Over this period of starting from scratch and climbing to the top echelons of machine learning hackathon leaderboards, I have come to realize the importance of learning through experience, and I cannot stress enough how important the statement above is.

Winning a data science competition is an obstacle-filled journey. You are competing against the top data science minds from around the globe, you’re working on a data science problem that hasn’t been solved before and you’re doing all of this with a strict deadline to boot!

But I can assure you that cracking the top 3 places in the leaderboard is absolutely achievable – if you know what you’re doing.

This is not intended to be a technical post. It’s about my journey into Data Science Competitions. And of course, how and why you should start right now. As a beginner, you certainly do not want to miss reading this. Technical articles for the more advanced readers will follow soon, so stay tuned!

As I mentioned, there is no better way to learn data science than practicing it. I encourage you to join us this extended weekend for a guided community hackathon where data science competition experts will take you through the entire hackathon process LIVE:

And you can always visit the DataHack platform to practice your data science skills or take part in hackathons!

 

My Data Science Competition Journey – From Scratch to Expert

I was introduced to Data Science by one of our professors at the beginning of the 3rd semester in college. He was utilizing Machine Learning to discover planets similar to earth and the possibility of alien life.

Curiosity followed and led me to dive into Andrew Ng’s famous course on Machine Learning. I was introduced to various applications of machine learning, such as stock market price prediction and self-driving cars to name a few.

Googling further about potential opportunities in this field, I discovered platforms like Kaggle and Analytics Vidhya. It added further fuel to my growing interest in data science. Competing and constantly improving against time and a leaderboard was the next challenge – yes, I’m talking about data science hackathons!

Most beginners I have interacted with feel that you need to know the ins and outs of machine learning first. Only then can you start competing in Data Science.

That’s a big misconception.

“For participating in data science competitions, you only need an urge to constantly learn and improve. Getting a good ranking will follow.”

 

My First Competition – Kaggle’s Microsoft Malware Prediction Challenge

Let me quickly talk about my first serious competition on Kaggle – the Microsoft Malware Prediction competition. This came months after failing in a variety of data science competitions. But the experience gained in all the competitions until this point had helped.

In just 2 weeks and with a few submissions in hand, I jumped to the top 20 of the public leaderboard.

As time progressed, I formed a team with a student from Singapore, a Kaggle master, and two industry leaders hailing from London, New York, and Pune. Working together in different time zones was a challenge in itself but we managed to discuss and implement strategies and models day and night on Slack.

And finally, with me leading the team, we finished 25th on the private leaderboard. This was quite close to our public leaderboard rank of 21. It was a very good finish, considering that barely 10 teams in the top 100 on the public leaderboard were able to retain their position on the private leaderboard.

Fast forward to the current day, and I have finished in the top 3 in 30+ data science hackathons on various platforms. This includes the first position on almost every major platform I have participated in (and yes, two first-place finishes in Analytics Vidhya’s JanataHack series).

So that’s a quick summary of my journey of conquering data science competitions from scratch. Next, let’s understand how you, as a beginner, can start participating in data science competitions.

 

How Do I Start Data Science Competitions if I Am A Beginner?

Here’s a piece of advice that I wish someone had given me when I started competing in data science hackathons – enroll in any competition you are comfortable with. The most important thing is that you start.

Analytics Vidhya’s JanataHack is a beginner-friendly series of competitions held every week. In the end, many winners are kind enough to post their solutions too.

Anyone just starting out must make it a point to go through winning solutions to previous data science competitions. When you come across any new idea or concept, Google it and take time to understand it. If you can’t transfer your learning from one competition to the other, you have not utilized your time properly.

Transfer learning is very important, in deep learning and in your own learning.

 

How do I approach Data Science Competitions?

Here, I have penned down a few key pointers you should keep in mind when starting out on a new data science competition.

  1. I usually start with a very simple baseline model. Just have a look at the data, then create a model without any data cleaning or feature engineering
  2. Next, the goal becomes understanding the problem and the data to create a good validation set. A good validation set is a must. Only then can you trust your local results. Otherwise, be prepared for a private leaderboard shakeup
  3. Feature engineering is a key next step. Good features always differentiate between a winner and a top 100 finish
  4. As the end of the competition approaches, I usually try to build a variety of models like Gradient Boosting Models, Neural Nets, etc. Then follows the stacking or blending of these results. Ensembling gives you the edge to win a competition. Therefore, it’s a tool you will always want to keep handy
  5. One thing many people don’t talk about is the importance of a codebase. Time is a crucial factor in any data science competition. You should not waste your time writing the same snippets from scratch again and again in multiple competitions. Instead, focus your valuable time on doing something new and better
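
To make points 2 and 4 a little more concrete, here is a minimal sketch in plain Python. It is not code from any of my winning solutions. The helper names (k_fold_indices, blend, best_blend_weight) and the toy numbers are made up for illustration, and mean absolute error stands in for whatever metric the competition actually uses. The sketch shows one simple way to carve out validation folds, and what "blending" can mean in practice: a weighted average of the predictions of two models, with the weight chosen on held-out validation data.

```python
import random


def k_fold_indices(n_rows, n_folds=5, seed=42):
    """Shuffle the row indices once and deal them into n_folds validation folds."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    return [indices[i::n_folds] for i in range(n_folds)]


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def blend(preds_a, preds_b, weight_a):
    """Blend two models' predictions with a weighted average."""
    return [weight_a * a + (1 - weight_a) * b for a, b in zip(preds_a, preds_b)]


def best_blend_weight(y_valid, preds_a, preds_b, steps=10):
    """Try weights 0.0, 0.1, ..., 1.0 and keep the one with the lowest validation error."""
    candidates = [i / steps for i in range(steps + 1)]
    return min(
        candidates,
        key=lambda w: mean_absolute_error(y_valid, blend(preds_a, preds_b, w)),
    )


# Toy validation data (hypothetical): one model overshoots, the other undershoots.
y_valid = [10.0, 20.0, 30.0, 40.0]
preds_gbm = [12.0, 22.0, 32.0, 42.0]  # stand-in for a gradient boosting model
preds_nn = [8.0, 18.0, 28.0, 38.0]    # stand-in for a neural net

folds = k_fold_indices(n_rows=10, n_folds=5)
weight = best_blend_weight(y_valid, preds_gbm, preds_nn)
blended = blend(preds_gbm, preds_nn, weight)

print("validation folds:", folds)
print("best weight for the first model:", weight)
print("blended validation error:", mean_absolute_error(y_valid, blended))
```

In this toy case the two models make opposite errors, so the blend beats either model on its own, which is the intuition behind ensembling. Stacking goes one step further: instead of a fixed weighted average, a second model is trained on the out-of-fold predictions of the first-level models. In a real competition you would typically reach for a library such as scikit-learn for the splitting and the metrics, and keep helpers like these in your own reusable codebase.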

 

What are the Benefits of Participating in Data Science Competitions?

That’s a valid question! Data science competitions require a significant amount of your time, so are they worth it? Let me share a few benefits from my experience in this section.

 

1. Competing and Learning

You learn a lot during data science competitions, from problem-solving to model building. If you want to learn something new, competitions are the best way to do that. Within a short time frame, you study and experiment a lot, and will find yourself constantly looking for better ways to improve your model.

 

2. Networking

To date, I have teamed up with more than 25 different people from India, Singapore, the USA, England, France, and Africa in different data science competitions. They have ranged from students to industry leaders.

Honestly, networking is one of the biggest benefits of participating in these hackathons. Getting to know and interact with like-minded people is no doubt a big asset for your future career.

I landed my current job at Analytics Vidhya thanks to networking!

 

3. Profile Building/Resume Building

Imagine a scenario where you are hiring a Data Scientist and have shortlisted two great candidates. Both have a similar background in Data Science. The first person has completed a few projects in data science, while the second person has completed similar projects and can also write “Achieved ‘X’ rank in a data science competition, competing against thousands of people” on their resume.

So which one would you give preference to? Most hiring managers would prefer the second candidate.

This is not to undermine the importance of a good project, but a good rank in a Data Science competition definitely gives you the edge against your competition. A lot of companies nowadays prefer candidates with a background in Data Science competitions. As a Data Science aspirant, it’s time you started too!

 

4. Getting Rewarded and Winning Exciting Prizes

Last but not least, seasoned Data Science competitors have a lot to win and earn. Just during this lockdown, I won enough money to buy myself a car. Platforms like Kaggle have a lot for you if you believe you have the ability to crack the world’s most interesting data science problems. What are you still waiting for?

 

HackLive – Guided Community Hackathon!

What if there was a live session that could encourage and help beginners participate in data science hackathons and improve their ranking? Wouldn’t that be great?

Since its inception, Analytics Vidhya has been trying to decode the problems that the data science community faces and to present viable solutions to them. The inability to get started with Data Science Hackathons has been a prevalent one. So, as a step towards tackling this problem, let me introduce HackLive 2 – Guided Community Hackathon!

Analytics Vidhya’s Data Scientists will combine all their industry expertise and knowledge to help the community answer 3 questions:

  • Is it even worth it if I have a minimal chance of winning?
  • How do I start?
  • How can I improve my rank in the future?

So what are you waiting for? Go and register on the link below:

 

End Notes

I hope I have given you enough motivation to start your own journey into Data Science competitions. More technical articles on approaching Data Science competitions will follow soon. I’m excited to share them with you! Till then, you can get started with some of my hackathon solutions on GitHub here.

Are you a beginner looking for a place to start your data science journey? Here is a comprehensive course, full of knowledge and data science learning, curated just for you to learn data science:

Have you participated in data science hackathons before? How was your experience? Share your thoughts with us in the comments section below and we’ll pick the best ones!


Responses From Readers


Naveen 30 Sep, 2020

Awesomeee! Post. Thank you for sharing your journey :)

aylin 30 Sep, 2020

Thank you so much sharing your thoughts clearly ! The post motivated me and also contains useful links :)

Harish 30 Sep, 2020

Congrats Nikhil for top position in Zimnat. Being an insurance industry expert, I also participated in that competition. Could not get any position as I submitted only one solution due to my time constraints. But learnt new things. You are right, if you want to learn data science, jump directly into such competitions and learn to swim with the best.

Vivek 01 Oct, 2020

In section, How do I approach Data Science Competitions? , In point number 4 you state "blending of these results". Could you please explain, what you mean by it?

Jeffrey Dagadu 01 Oct, 2020

Awesome post! Really inspiring. Thanks for sharing
