Hackathons are super fun! The thrill of finding a solution in a time-bound, high-pressure, competitive situation is addictive. However, if you are participating in a data science hackathon for the first time, the experience can be a bit intimidating.
Recently, we launched our first student-focused hackathon, to be conducted later this month. We are super excited about this new initiative. Every time we met students, they would say that it is very tough for them to participate in and win our regular hackathons. We believe that these hackathons will help students benchmark themselves against their peer group. We will also launch a student leaderboard on Analytics Vidhya shortly.
As part of this initiative, we also want to provide guidance to students as they come out and participate in our upcoming hackathon in large numbers. This is why I sat down to create this guide. If you are a student participating in our Ultimate Student Hunt and have never been part of a data science hackathon in the past, this guide should be the best resource you could get!
If you are not a student, but have not participated in a hackathon before, this guide should be equally helpful. You might also have some more time at your disposal!
How are Data Science / Machine learning hackathons different?
First things first, let us spend a few minutes understanding how data science / machine learning hackathons are different from other hackathons you might have attended in the past. For people participating in a data science hackathon for the first time, the experience can be a bit overwhelming. Why?
There are several reasons:
- Data science hackathons are typically more narrowly defined than the usual coding / product-focused hackathons. This is because the sponsor / problem creator is looking for a data-based solution, which in turn means that they need to provide you with the data and the problem in the first place. This can be confusing to people who are used to getting a blank sheet of paper and being asked to work on an idea. Even if the problem is not specifically defined, the number of use cases is usually small.
- Data science hackathons typically have a live leaderboard and are mostly transparent about the judging criteria. Very few data science hackathons make their decision only on the way you present or pitch your solution. While this is good, it also means that there is constant pressure, because you continuously see where other people stand.
- There are several levels of discovery during a hackathon, and you have to keep finding the next one. Let us take a typical machine learning problem – your first level of discovery could be a simple benchmark solution. You could look at the leaderboard and almost make out the score of the benchmark solution. Next, you would look for insights in the data and build features accordingly. Even if you are at the top of the leaderboard, you can never be sure when someone will unearth the next insight and overtake you.
I hope this gives you a fair idea of what goes on during a data science hackathon.
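To make the "simple benchmark solution" mentioned above a little more concrete, here is a minimal sketch for a regression-style problem. It is only an illustration: the function names `mean_baseline` and `rmse`, the toy numbers, and the choice of RMSE as the metric are my assumptions, not something a particular hackathon prescribes. The idea is simply to predict the training average for every row and see what score that gets you.

```python
import math
from statistics import mean


def mean_baseline(train_targets):
    """Return a predictor that always outputs the training mean (hypothetical helper)."""
    avg = mean(train_targets)
    return lambda rows: [avg for _ in rows]


def rmse(actual, predicted):
    """Root mean squared error; assumed here as the evaluation metric."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(mean(squared_errors))


# Toy data, made up purely for illustration.
train_y = [10.0, 12.0, 14.0, 16.0]
valid_y = [11.0, 15.0]

predict = mean_baseline(train_y)
baseline_score = rmse(valid_y, predict(valid_y))
print(f"Baseline RMSE: {baseline_score:.3f}")
```

Any feature or model you add later should beat this number on your own validation data. If it does not, something is probably off in your workflow.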
Which tool should I choose – R / Python / SAS / Spark?
If you are preparing for the upcoming hackathon, you don’t have a lot of time to debate this and, trust me, that constraint ends up helping you more than it restricts you. If you know any of these tools already, just use that tool / language and run with it. Focus on problem solving rather than learning a new tool.
If you are completely new to data science and don’t know any of the above – just pick Python and it will serve you well. I don’t want to start a language war here – but my reasons for picking Python are that it is easier to learn, has a larger ecosystem and is closer to production-ready. It is also clearly the most popular language for emerging areas like deep learning.
Where do you start? What is the roadmap?
The Path for Beginners
- If you are a complete beginner, I would strongly recommend starting with our workshop – Experiments with Data. It starts with the basics, assumes no data science knowledge and helps you solve a data problem by the end of the workshop.
- If you are someone who prefers an interactive course, check out this course we created with DataCamp. This also assumes little Python knowledge and is based on Python.
- Once you have done one of the above, pick up a second practice problem from our practice problems and apply your learning to solve a fresh problem.
Resources / Path for Intermediate practitioners
I assume you have either gone through the resources mentioned above or have experience solving a few practice problems in the past.
- The data exploration guide – this guide lays out the various steps involved in data exploration in a lot of detail. Start including them in your analysis workflows.
- Methods to deal with categorical variables – If you talk to any expert data scientist, he / she will always ask you to focus on feature engineering. This article should give you a kick start on that.
- Methods to deal with continuous variables – This article starts where the previous one ends and deals with continuous variables.
- Common machine learning algorithms
- Black Friday Practice Problem
Resources / Path for advanced practitioners
So, you have been doing data science for some time now – you know the workflow well. You have mastered the art of handling different kinds of variables and applied it to a few problems already. You have also participated and ranked in a few hackathons. Now is the time to put on your flying boots!
- How can you expect to win a hackathon without mastering these algorithms? XGBoost (R), XGBoost (Python), Random Forest, Gradient Boosting
- Here are a few tips to improve performance of your machine learning models
- Go through the tips and tricks from all the past hackathons on Analytics Vidhya.
By this time, you have all the technical resources you need to make a killing in a hackathon. But mastering the art of winning a hackathon actually takes much more than these technical skills. I have included a few (behavioural) tips from my experience. You can also read the tips from some of the past winners here.
A few other tips:
- Get into a routine – A lot of people believe that hackathons are about pushing yourself hard during the hackathon. That might work well in short hackathons which last less than 24 hours, but it backfires in longer hackathons. I can’t tell you how many times I have seen teams pull all-nighters on the first few nights of a hackathon, only to have completely exhausted brains during the later (and more critical) period. This is almost a recipe for disaster in a long-form hackathon. The best advice I can give you is to create a routine starting today and then follow it for the next 14 days. If you prefer to sleep at night, isolate yourself from distractions during the day and make the most of your daytime hours. If you are more active at night – get 6 – 8 hours of sleep in the preparation phase and stick to the same routine during the hackathon. Don’t disrupt your routine and sleep patterns if the hackathon runs for a few days.
- Focus on fundamentals and business thinking for building features – Another common myth is that you need to try out every possible data science technique to come up with the best solution. That is not always the case. Make sure you understand the problem well, and think about the real-life scenarios behind it to create features.
- Learn the importance of building hypotheses – The first thing you should do as soon as you see the problem is gain functional domain knowledge. The next, and probably the most important, step in any hackathon is to build a comprehensive list of hypotheses. Please note that I am asking you to build a set of hypotheses before actually looking at the distributions in the data. This makes sure you are not biased by what you see in the data. It also gives you time to plan your workflow better. If you are able to think of hundreds of features, you can prioritise which ones to create first. You can also plan your time appropriately for data exploration and imputation / missing value treatment.
- Teaming up with someone with complementary skills helps tremendously – Try to find a person with a complementary skill set for your team. If you have been a coder all your life, go and team up with a person who has been on the business side of things. This will help you get a more diverse set of hypotheses and increase your chances of winning the hackathon. The one thing you should have in common is a preference for the same tool / language stack.
- Prepare libraries of reusable code beforehand – If there are steps in your workflow which you need often, you should keep the code for these operations handy. For example, turning two dates into a variety of features – just keep a standard set of code / functions which can be used in all your hackathons.
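As an illustration of the reusable-code tip, here is a minimal sketch of the kind of helper you might keep around for turning two dates into features. The function name `date_pair_features` and the particular features it returns are my own assumptions for the example; your own library would contain whatever you find yourself rewriting in every hackathon.

```python
from datetime import date


def date_pair_features(start, end):
    """Derive a few simple features from two dates (hypothetical helper)."""
    delta_days = (end - start).days
    return {
        "days_between": delta_days,
        "weeks_between": delta_days // 7,
        "start_month": start.month,
        "start_weekday": start.weekday(),          # Monday = 0, Sunday = 6
        "start_is_weekend": start.weekday() >= 5,  # Saturday or Sunday
        "same_month": (start.year, start.month) == (end.year, end.month),
    }


# Example dates, chosen arbitrarily for illustration.
features = date_pair_features(date(2016, 9, 24), date(2016, 10, 1))
for name, value in features.items():
    print(name, value)
```

Keeping helpers like this in one file means you spend the hackathon thinking about which features matter rather than retyping date arithmetic.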
So, if I were a student participating in a data science hackathon coming up in 2 weeks, I would make sure I understand the data science workflow well. I would focus on using one tool, likely Python for its ease, and make sure I focus on fundamentals. Believe me, this should be enough to make a mark! So what are you waiting for – go, make your mark!
If you have any questions about gearing up for the hackathons, please feel free to ask them here. Alternatively, if you have a tip which I have missed – please add it below in the comments.