Winning Strategies for ML Competitions from Past Winners

Kunal Jain Last Updated : 21 Oct, 2016


Introduction

We launched Knocktober last night, and we were happy to see the excitement it has created among the participants. This time we have raised the bar for Analytics Vidhya hackathons. I’m sure the faint-hearted panicked on seeing the problem set, while those who hold to their resolve to win Knocktober will make history on Analytics Vidhya. Since the competition is unique, we thought we would share the winning strategies of our past competition winners.

Read on to learn the hackathon approach of three top data scientists. They have also shared useful tips and tricks that should help you improve your leaderboard position.


1. Sudalai Rajkumar (SRK), Senior Data Scientist, AV Rank 1

His approach in past competitions:

1. Understanding the problem and dataset
2. Pre-processing the data: data cleansing, outlier removal, normalization / standardization, dummy variable creation
3. Feature engineering: feature selection, feature transformation, variable interaction and feature creation
4. Selecting the modeling algorithm
5. Parameter tuning through cross-validation
6. Building the model
7. Checking the results by making a submission
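
As a rough illustration of the pre-processing step above, here is a minimal Python sketch of standardization and dummy variable creation. The column names and values (age, city) are made up for this example, and in practice a library such as pandas or scikit-learn would normally do this work.

```python
from statistics import mean, pstdev

def standardize(values):
    """Rescale a numeric column to zero mean and unit variance."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

def make_dummies(values):
    """One-hot encode a categorical column as {level: [0/1, ...]}."""
    levels = sorted(set(values))
    return {level: [1 if v == level else 0 for v in values] for level in levels}

# Hypothetical toy columns, not taken from any competition dataset.
age = [20, 30, 40]
city = ["delhi", "mumbai", "delhi"]

age_scaled = standardize(age)
city_dummies = make_dummies(city)
print(age_scaled)
print(city_dummies)
```

Whatever statistics you compute here (mean, standard deviation, category levels) should come from the training data only and then be reused on the test data.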

Once you’ve executed these seven steps, you have a basic framework ready for further experimentation. From there, you can concentrate on:

Feature engineering – This is where the bigger improvements come from most of the time
Building varied kinds of models and ensembling them – This helps you go the extra mile towards the end
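
To make the ensembling idea concrete, here is a small sketch of the simplest form of blending: averaging the predictions of two models. The prediction values are invented for illustration, and a simple (optionally weighted) average is just one of several ways to ensemble.

```python
from math import sqrt

def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length lists."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def blend(predictions, weights=None):
    """Weighted average of several models' prediction lists."""
    n_models = len(predictions)
    if weights is None:
        weights = [1.0 / n_models] * n_models  # plain average by default
    return [sum(w * p for w, p in zip(weights, row)) for row in zip(*predictions)]

# Hypothetical validation targets and two models' predictions.
y_valid = [1.0, 2.0, 3.0]
model_a = [1.5, 2.5, 3.5]   # tends to over-predict
model_b = [0.7, 1.9, 2.5]   # tends to under-predict

blended = blend([model_a, model_b])
for name, pred in [("model_a", model_a), ("model_b", model_b), ("blend", blended)]:
    print(name, round(rmse(y_valid, pred), 3))
```

Blending tends to help most when the models make different kinds of errors, as in this toy case; averaging two near-identical models gains very little.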

Last but not least, we must set up solid local validation. Otherwise, we might end up overfitting to the public leaderboard.
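
A minimal sketch of what local validation can look like is k-fold cross-validation: every row is held out exactly once, and the model is scored only on rows it did not see. The "model" below is just the training mean and the target values are made up; the fold logic is the part that matters.

```python
import random
from statistics import mean

def k_fold_indices(n_rows, k, seed=0):
    """Yield (train, valid) index lists; each row is held out exactly once."""
    idx = list(range(n_rows))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i, valid in enumerate(folds):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, valid

def cv_mae_of_mean_model(y, k=5, seed=0):
    """Per-fold mean absolute error of a 'predict the training mean' model."""
    fold_scores = []
    for train, valid in k_fold_indices(len(y), k, seed):
        prediction = mean(y[j] for j in train)  # fit on training rows only
        fold_scores.append(mean(abs(y[j] - prediction) for j in valid))
    return fold_scores

# Hypothetical target values.
y = [3, 5, 4, 6, 8, 7, 5, 4, 6, 7]
fold_scores = cv_mae_of_mean_model(y, k=5)
print("per-fold MAE:", [round(s, 3) for s in fold_scores])
print("mean CV MAE:", round(mean(fold_scores), 3))
```

If the local CV score and the public leaderboard score move in the same direction when you change something, the validation setup is probably trustworthy.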

Tips from SRK:

1. Understanding the problem – It is really important to thoroughly understand the problem we are trying to solve. Only once we have understood the problem clearly can we derive suitable insights from the data and obtain good results.

2. Structured thinking – This is a distinct way of thinking through problems. A data scientist needs to be structured in his or her thinking in order to obtain good results.

3. Effective communication of results – Communicating the derived results effectively is as important as performing the data analysis itself.

Read detailed article here


2. Rohan Rao, Lead Data Scientist, AV Rank 4

His approach in past competitions:

Understand the problem / objective you are trying to solve.
Understand and summarize what data you have / need.
Carefully read about the evaluation metric.
Explore and visualize the data, and build simple base models as benchmarks.
Set up a robust, thorough validation framework consistent with the evaluation conditions.
Work on feature engineering and optimizing algorithms.
Try out as many different models / ideas as you can.
Ensemble / Blend / Stack multiple models.
Never hesitate to ask questions, seek help, or even team up with others.

Tips from Rohan:

Gauge the complexity of the problem: Explore the data as much as possible. Plot features, summarize columns, and build benchmark models; in the process, get a sense of the problem, the data, the time available, the complexity, and so on. Then slowly build a solid solution by working on one idea after another.
Algorithm: I use XGBoost and feature engineering for building ML solutions, and it has been part of my winning solution in most of the contests I’ve done well in, so a big thanks to the community that actively develops and improves it each day. I also like collaborative filtering techniques, which I’ve implemented very often in my work.
Feature selection: My rule of thumb for feature selection is based on CV or validation scores. If adding a feature improves the CV score, I use it; otherwise I discard it. For a large number of features, I usually build small, quick models, check variable importance or information gain, and select the top x features.
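
The feature selection rule of thumb above can be sketched as a greedy loop: try each candidate feature, keep it only if the cross-validated score improves. This is an illustrative sketch, not Rohan's actual code. The nearest-centroid classifier, the fixed fold layout, and the toy columns "signal" and "noise" are all assumptions chosen to keep the example small; in a contest the model would more likely be something like XGBoost.

```python
from statistics import mean

def nearest_centroid_accuracy(X_train, y_train, X_valid, y_valid):
    """Fit a nearest-centroid classifier and return validation accuracy."""
    centroids = {}
    for label in set(y_train):
        rows = [x for x, y in zip(X_train, y_train) if y == label]
        centroids[label] = [mean(col) for col in zip(*rows)]

    def predict(x):
        return min(centroids,
                   key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))

    hits = sum(predict(x) == y for x, y in zip(X_valid, y_valid))
    return hits / len(y_valid)

def take(data, features, idx):
    """Build row vectors for the given row indices and feature names."""
    return [[data[f][j] for f in features] for j in idx]

def cv_score(data, target, features, k=3):
    """Mean k-fold accuracy using only the given features."""
    n = len(target)
    fold_scores = []
    for i in range(k):
        valid = list(range(i, n, k))                 # every k-th row is held out
        train = [j for j in range(n) if j % k != i]
        fold_scores.append(nearest_centroid_accuracy(
            take(data, features, train), [target[j] for j in train],
            take(data, features, valid), [target[j] for j in valid]))
    return mean(fold_scores)

def forward_select(data, target, candidates, k=3):
    """Keep a feature only if adding it strictly improves the CV score."""
    selected, best = [], float("-inf")   # so the first candidate is always kept
    for feat in candidates:
        score = cv_score(data, target, selected + [feat], k)
        if score > best:
            selected.append(feat)
            best = score
    return selected, best

# Hypothetical toy data: "signal" separates the classes, "noise" does not.
data = {
    "signal": [0.0, 0.1, 0.2, 1.0, 1.1, 1.2],
    "noise": [5.0, -5.0, 3.0, -4.0, 4.0, -3.0],
}
target = [0, 0, 0, 1, 1, 1]

selected, best = forward_select(data, target, ["signal", "noise"])
print(selected, best)
```

Because the loop is greedy, the result depends on the order in which candidates are tried, so it is worth trying the most promising features first.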

Read detailed article here


3. Steve Donoho, Top Data Scientist

His approach in past competitions:

Well, I start by simply familiarizing myself with the data. I plot histograms and scatter plots of the various variables and see how they are correlated with the dependent variable. I sometimes run an algorithm like GBM or Random Forest on all the variables simply to get a ranking of variable importance.
I usually start very simple and work my way toward more complex models if necessary. My first few submissions are usually just “baseline” submissions of extremely simple models – like “guess the average” or “guess the average segmented by variable X.” These are simply to establish what is possible with very simple models. You’d be surprised that you can sometimes come very close to the score of someone doing something very complex by just using a simple model.
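
The two baselines Steve mentions are easy to sketch in a few lines of Python. The segment labels and target values below are made up for illustration; the fallback to the overall average for unseen segments is an added assumption, not something from the interview.

```python
from collections import defaultdict
from statistics import mean

def global_mean_baseline(train_y):
    """'Guess the average': always predict the training mean."""
    avg = mean(train_y)
    return lambda segment=None: avg

def segmented_mean_baseline(train_segments, train_y):
    """'Guess the average segmented by variable X'."""
    groups = defaultdict(list)
    for seg, y in zip(train_segments, train_y):
        groups[seg].append(y)
    overall = mean(train_y)
    seg_means = {seg: mean(ys) for seg, ys in groups.items()}
    # Assumption: segments never seen in training fall back to the overall mean.
    return lambda segment: seg_means.get(segment, overall)

# Hypothetical training data: one categorical variable X and a numeric target.
segments = ["a", "a", "b", "b"]
y = [10, 12, 20, 22]

guess_average = global_mean_baseline(y)
guess_by_segment = segmented_mean_baseline(segments, y)

print(guess_average())          # overall average
print(guess_by_segment("a"))    # average within segment "a"
print(guess_by_segment("c"))    # unseen segment, falls back to overall average
```

Submitting these first gives you a floor: any more complex model that cannot beat them is not worth keeping.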

Tips from Steve:

Making predictions: Thinking about what exactly to predict is an important step that many people miss; they just throw the raw dependent variable into their favorite algorithm and hope for the best. But sometimes you want to create a derived dependent variable.
Depending on the problem, I probably spend 50% of my time on data exploration and cleansing.
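
One common example of a derived dependent variable is a log-transformed target: fit the model on log(1 + y) and convert the predictions back afterwards. This is only an illustration of the idea; Steve does not say which transformation he uses, and the sales figures below are invented.

```python
from math import expm1, log1p
from statistics import mean

def to_derived(y):
    """Derived target: log(1 + y), which compresses large values."""
    return [log1p(v) for v in y]

def from_derived(pred):
    """Invert the transform to get back to the original scale."""
    return expm1(pred)

# Hypothetical skewed target with one very large value.
sales = [10.0, 12.0, 9.0, 500.0]

# "Model" = predict the mean, once on the raw target and once on the derived one.
pred_raw = mean(sales)
pred_derived = from_derived(mean(to_derived(sales)))

print("mean of raw target:", round(pred_raw, 2))
print("mean of log target, back-transformed:", round(pred_derived, 2))
```

The raw-scale prediction is pulled far upward by the single outlier, while the prediction made on the derived target stays close to the typical values. Which one is better depends on the evaluation metric, so check the metric before choosing a transformation.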

Read detailed article here

Go on and use these tips from the winners to grab your first win in Knocktober.

EndNotes

I am sure these approaches and tips will give you an upper hand in the competition. Learn from them and improve. If you want to register for the ongoing competition, click here. I recommend that you at least register for Knocktober and explore the problem statement. I assure you it will be a great learning experience.

Did you find this article useful? Do you have any questions? Post them in the comments below and let me know how I can help you further. I would also like to hear any feedback you have for us.

You can test your skills and knowledge. Check out Live Competitions and compete with the best data scientists from all over the world.

Kunal Jain

Kunal Jain is the Founder and CEO of Analytics Vidhya, one of the world's leading communities of AI professionals. With over 17 years of experience in the field, Kunal has been instrumental in shaping the global AI landscape. His expertise spans diverse markets, from developed economies like the UK to emerging ones like India, where he has successfully led and delivered complex data-driven solutions. As a recognized thought leader, Kunal has empowered countless individuals to realize their AI ambitions through his visionary approach to AI education and community building. Before founding Analytics Vidhya, Kunal earned both his undergraduate and postgraduate degrees from IIT Bombay and held key roles at Capital One and Aviva Life Insurance across multiple geographies. His passion lies at the intersection of analytics, AI, and fostering a thriving community of data science professionals.


