Winning Strategies for ML Competitions from Past Winners

Kunal Jain 21 Oct, 2016 • 4 min read

Introduction

We launched Knocktober last night and were happy to see the excitement it has created among the participants. This time we raised the bar for Analytics Vidhya hackathons. I’m sure the faint-hearted panicked after seeing the problem set, while those who didn’t budge from their resolve to win Knocktober will make history on Analytics Vidhya. Since the competition is unique, we thought we would share the winning strategies of our past competition winners.

Read on to learn the hackathon approach of three top data scientists. They have also shared useful tips and tricks that should help you improve your leaderboard position.

 


1. Sudalai Rajkumar (SRK), Senior Data Scientist, AV Rank 1

His approach in past competitions:

  1. Understanding the problem and dataset
  2. Pre-processing the data: Data cleansing, Outlier removal, Normalization / Standardization, Dummy variable creation
  3. Feature engineering: Feature selection, Feature transformation, Variable interaction and Feature creation
  4. Selecting the modeling algorithm
  5. Parameter tuning through cross validation
  6. Building the model
  7. Checking the results by making a submission

Once you’ve executed these seven steps, you will have a basic framework ready for further experimentation. From there, you can concentrate on:

  1. Feature engineering – This is where the bigger improvements come from most of the time
  2. Building varied kinds of models and ensembling them – This will help you go the extra mile towards the end

Last but not least, we must perform solid local validation. Otherwise, we might end up overfitting to the public leaderboard.
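
To make the idea of local validation concrete, here is a minimal sketch of k-fold cross-validation using only the Python standard library. It is our own illustration, not SRK's code: the function names (`k_fold_indices`, `cross_val_rmse`), the RMSE metric and the toy "predict the mean" model are all assumptions chosen for the example. In a real competition you would plug in your own model and the metric used by the contest.

```python
import random
from statistics import mean


def k_fold_indices(n_rows, k=5, seed=42):
    """Shuffle the row indices once and split them into k nearly equal folds."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_val_rmse(X, y, fit, predict, k=5, seed=42):
    """Return the mean RMSE over k held-out folds, plus the per-fold scores.

    fit(X_train, y_train) must return a model object;
    predict(model, X_valid) must return one prediction per validation row.
    """
    scores = []
    for held_out in k_fold_indices(len(y), k, seed):
        held = set(held_out)
        train = [i for i in range(len(y)) if i not in held]
        # The model only ever sees the training rows of this fold.
        model = fit([X[i] for i in train], [y[i] for i in train])
        preds = predict(model, [X[i] for i in held_out])
        mse = mean((p - y[i]) ** 2 for p, i in zip(preds, held_out))
        scores.append(mse ** 0.5)
    return mean(scores), scores


# Hypothetical toy model: always predict the mean of the training target.
def fit_mean(X_train, y_train):
    return mean(y_train)


def predict_mean(model, X_valid):
    return [model] * len(X_valid)


X = [[i] for i in range(20)]
y = [2.0 * i + 1.0 for i in range(20)]
cv_score, fold_scores = cross_val_rmse(X, y, fit_mean, predict_mean, k=5)
print("CV RMSE:", cv_score)
```

If the local score and the public leaderboard score move in different directions when you change something, it is usually safer to trust the local score, since it is computed on more held-out data.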

Tips from SRK:

1. Understanding the problem – It is really important to have a thorough understanding of the problem that we are trying to solve. Only after we’ve understood the problem clearly can we derive suitable insights from the data to tackle it and obtain good results.

2. Structured Thinking – It’s a unique way of thinking through problems. A data scientist needs to be structured in their thinking in order to obtain good results.

3. Effective communication of results – Effective communication of derived results is as important as performing the data analysis.

Read detailed article here

 


2. Rohan Rao, Lead Data Scientist, AV Rank 4

His approach in past competitions:

  1. Understand the problem / objective you are trying to solve.
  2. Understand and summarize what data you have / need.
  3. Carefully read about the evaluation metric.
  4. Explore and visualize the data, and build simple base models as benchmarks.
  5. Set up a robust / thorough validation framework consistent with the evaluation conditions.
  6. Work on feature engineering and optimizing algorithms.
  7. Try out as many different models / ideas as you can.
  8. Ensemble / Blend / Stack multiple models.
  9. Never hesitate to ask questions, take help or even team up with others.

Tips from Rohan:

  1. Gauge the complexity of the problem: Explore the data as much as possible. Plot features, summarize columns, build benchmark models, and during the process, get a sense of the problem, data, time, complexity, etc. And then slowly build a good solid concrete solution by working on one idea after another.
  2. Algorithm: I use XGBoost and feature engineering for building ML solutions and it’s been a part of my winning solution for most of the contests I’ve done well in, so a big thanks to the community who are actively developing and improving it each day. I also like Collaborative Filtering techniques, which I’ve implemented very often in my work.
  3. Feature Selection Ways: My rule of thumb for feature selection is based on CV or validation scores. If selecting a feature improves the CV score, I use it, else I discard it. For a large number of features, I usually build small quick models, check variable importance or information gain, and select the top-x from them.
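
The "keep a feature only if it improves the CV score" rule can be sketched as a greedy forward selection. This is one possible reading of Rohan's rule, not his actual code. Everything here is a stand-in chosen for the example: the nearest-centroid classifier (not XGBoost), the helper names `nearest_centroid_cv_accuracy` and `forward_select`, and the toy data with one informative column (`signal`) and one uninformative column (`noise`).

```python
import random
from statistics import mean


def nearest_centroid_cv_accuracy(rows, labels, features, k=4, seed=0):
    """k-fold CV accuracy of a nearest-centroid classifier using only `features`."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    fold_scores = []
    for f in range(k):
        valid = indices[f::k]
        valid_set = set(valid)
        train = [i for i in indices if i not in valid_set]
        # One centroid per class, computed on the training rows only.
        centroids = {}
        for cls in set(labels[i] for i in train):
            members = [i for i in train if labels[i] == cls]
            centroids[cls] = [mean(rows[i][name] for i in members) for name in features]
        correct = 0
        for i in valid:
            point = [rows[i][name] for name in features]
            predicted = min(
                centroids,
                key=lambda cls: sum((a - b) ** 2 for a, b in zip(point, centroids[cls])),
            )
            correct += predicted == labels[i]
        fold_scores.append(correct / len(valid))
    return mean(fold_scores)


def forward_select(candidates, score_fn):
    """Greedily add the best remaining feature while the CV score strictly improves."""
    selected, best_score = [], float("-inf")
    remaining = list(candidates)
    while remaining:
        scored = [(score_fn(selected + [name]), name) for name in remaining]
        top_score, top_name = max(scored, key=lambda pair: pair[0])
        if top_score <= best_score:
            break  # no remaining feature improves the CV score, so discard the rest
        selected.append(top_name)
        remaining.remove(top_name)
        best_score = top_score
    return selected, best_score


# Toy data: "signal" separates the two classes, "noise" is unrelated to the label.
rows, labels = [], []
for i in range(40):
    label = i % 2
    rows.append({"signal": 5.0 * label + (i % 10) / 10.0, "noise": float((i * 7) % 5)})
    labels.append(label)

selected, best = forward_select(
    ["signal", "noise"],
    lambda feats: nearest_centroid_cv_accuracy(rows, labels, feats),
)
print(selected, best)
```

Each round refits the model once per remaining feature, so this approach gets expensive with many columns. That is where the second half of the tip applies: rank features with a quick model first and only test the top few this way.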

Read detailed article here

 


3. Steve Donoho, Top Data Scientist

His approach in past competitions:

  • Well, I start by simply familiarizing myself with the data. I plot histograms and scatter plots of the various variables and see how they are correlated with the dependent variable. I sometimes run an algorithm like GBM or Random Forest on all the variables simply to get a ranking of variable importance.
  • I usually start very simple and work my way toward more complex if necessary. My first few submissions are usually just “baseline” submissions of extremely simple models – like “guess the average” or “guess the average segmented by variable X.” These are simply to establish what is possible with very simple models. You’d be surprised that you can sometimes come very close to the score of someone doing something very complex by just using a simple model.
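
The two baselines Steve mentions, "guess the average" and "guess the average segmented by variable X", take only a few lines to build. The sketch below is our illustration: the column names (`city`, `sales`), the toy numbers and the RMSE metric are made up for the example, and falling back to the global average for an unseen segment is our own assumption.

```python
from collections import defaultdict
from statistics import mean


def fit_segment_means(segments, targets):
    """Average target value per segment, e.g. average sales per city."""
    grouped = defaultdict(list)
    for seg, value in zip(segments, targets):
        grouped[seg].append(value)
    return {seg: mean(values) for seg, values in grouped.items()}


def predict_segment_mean(segment_means, global_mean, segments):
    """Use the segment average; fall back to the global average for unseen segments."""
    return [segment_means.get(seg, global_mean) for seg in segments]


def rmse(actual, predicted):
    return mean((a - p) ** 2 for a, p in zip(actual, predicted)) ** 0.5


# Hypothetical toy data.
train_city = ["A", "A", "B", "B", "C"]
train_sales = [10.0, 12.0, 30.0, 34.0, 20.0]
test_city = ["A", "B", "D"]  # "D" never appears in the training data
test_sales = [11.0, 31.0, 22.0]

# Baseline 1: guess the average.
global_mean = mean(train_sales)
global_preds = [global_mean] * len(test_city)

# Baseline 2: guess the average segmented by city.
segment_means = fit_segment_means(train_city, train_sales)
segment_preds = predict_segment_mean(segment_means, global_mean, test_city)

print("global baseline RMSE: ", rmse(test_sales, global_preds))
print("segment baseline RMSE:", rmse(test_sales, segment_preds))
```

Any more complex model then has a clear bar to beat. If it cannot improve on the segmented average under your local validation, the extra complexity is probably not paying for itself yet.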

Tips from Steve:

  1. Making Predictions: This is an important step that is often missed by many – they just throw the raw dependent variable into their favorite algorithm and hope for the best. But sometimes you want to create a derived dependent variable.
  2. I probably spend 50% of my time on data exploration and cleansing depending on the problem.
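
Steve does not say here which derived dependent variable he has in mind, so the following is only one common example and an assumption on our part: modelling the logarithm of a skewed target and converting the predictions back to the original scale. The helper names and the toy prices are hypothetical.

```python
import math
from statistics import mean


def to_derived(targets):
    """Derived target: log(1 + y), which compresses large values."""
    return [math.log1p(value) for value in targets]


def from_derived(derived):
    """Inverse transform: bring predictions back to the original scale."""
    return [math.expm1(value) for value in derived]


# Hypothetical skewed target with one very large value.
prices = [100.0, 120.0, 90.0, 110.0, 5000.0]

# "Guess the average" fitted on the raw target is pulled up by the outlier.
raw_guess = mean(prices)

# The same simple model fitted on the derived target, then transformed back.
derived_guess = from_derived([mean(to_derived(prices))])[0]

print("average of raw target:        ", raw_guess)
print("average via log-scaled target:", derived_guess)
```

Whatever derived target you choose, the predictions must be converted back before scoring, and the choice should be judged by the competition's evaluation metric on your local validation set rather than by how reasonable the transform looks.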

Read detailed article here

Go on & use these tips from the winners and grab your first win in Knocktober.

 

End Notes

I am sure these approaches and tips will give you an upper hand in the competition. Learn and improve from these tips. If you want to register for the ongoing competition, click here. I recommend that you at least register for Knocktober and explore the problem statement. I assure you it will be a great learning experience for you.

Did you find this article useful? Do you have any questions? Post them in the comments below and let me know how I can help you further. I would also like to hear any feedback you have for us.

You can test your skills and knowledge. Check out Live Competitions and compete with the best data scientists from all over the world.


Kunal is a postgraduate from IIT Bombay in Aerospace Engineering. He has spent more than 10 years in the field of Data Science. His work experience ranges from mature markets like the UK to a developing market like India. During this period he has led teams of various sizes and has worked with various tools like SAS, SPSS, Qlikview, R, Python and Matlab.
