Kaggle Grandmaster Series – Exclusive Interview with Kaggle Quadruple/4x Grandmaster Rohan Rao
Ever seen a data scientist with more than 2 Kaggle Grandmaster titles? Well, don’t stop at 3 now as we bring to you the thoughts of a Quadruple Kaggle Grandmaster!!
In the 20th edition of the Kaggle Grandmaster Series, we are honored to be joined by Quadruple Kaggle Grandmaster Rohan Rao.
Rohan ranks 100th in Kaggle Competitions, 6th in Datasets, 12th in Notebooks, and 12th in Discussions, with 8, 8, 15, and 56 gold medals to his name, respectively.
Rohan currently works as a Data Scientist at H2O.ai. He has a Master’s degree in Applied Statistics from IIT Bombay. He is also a 17-time National Sudoku/Puzzle Champion.
You can go through the previous Kaggle Grandmaster Series Interviews here.
In this interview, we cover a range of topics, including:
- Rohan’s Education and Work
- Rohan’s Kaggle Journey
- Rohan’s Advice to Beginners in Data Science
- Rohan’s Inspiration
So, without further ado, let’s begin.
Rohan’s Education and Work
Analytics Vidhya (AV): You hold a Master’s degree in Applied Statistics. How has your educational background in Applied Statistics helped you in your professional career as well as in Data Science hackathons/competitions?
Rohan Rao (RR): Mathematics & Statistics form the roots of many data science workflows and machine learning algorithms. Having a solid understanding of the fundamentals helped me learn and grow faster, professionally as well as competitively.
The Master’s degree and experience gave me more exposure to the field and also opened up a lot of opportunities in different industries.
AV: Currently, you’re a Data Scientist at H2O.ai. What is your day-to-day role there? Also, what tips would you give to anyone looking to land a Data Scientist position at a good company?
RR: I work in between our products and customers, constantly engaging and helping users of our platform while building and enhancing our products in parallel. I enjoy this kind of role because it gives me an opportunity to contribute to different aspects of the business.
It’s important to understand the exact role of a Data Scientist, because it can vary widely across companies, and then to prepare for and apply to the ones that best suit your skill set and interests.
Rohan’s Kaggle Journey from Scratch
AV: You’ve recently become a Kaggle Quadruple Grandmaster, so a big congratulations on this achievement! You’re one of three Kaggle Quadruple Grandmasters, and we now have the privilege of putting questions to someone who has aced all four categories of Kaggle, i.e., Competitions, Datasets, Notebooks, and Discussions. Please tell us in detail about the challenges you faced in each category and how you overcame them.
RR: Thank you! It has been one of the most exhilarating experiences of my life and I’m glad I could accomplish this feat.
Competitions are the only category that I find to have a meaningful ranking & points system, while the other three are more for learning and networking. It’s also the primary reason why I enjoy spending time on Kaggle. What helped me most here was working on as many different types of competitions as time permitted in my early days; those experiences helped me accrue some successful results in subsequent competitions.
Datasets were the hardest category to reach the KGM tier. What helped me was to constantly work on dataset ideas that would be useful for projects or competitions and publish extremely clean datasets that users can use conveniently.
Notebooks are a great category to showcase novel visualizations, models, or pipelines. I don’t like writing EDA notebooks much, so most of my efforts have gone into interesting model pipelines, utility scripts, or tutorials.
Discussions are fun and easy for me because I enjoy writing ☺
AV: What was the order of getting your Grandmaster titles? Which was the hardest to achieve and why?
RR: Competitions (in 2016), followed by Notebooks, Datasets, and Discussions, in that order (all in 2020).
Datasets were the hardest, most likely due to lower activity and visibility compared to the other three categories on the platform. So it involved more effort and investment of time.
AV: What is your strategy/framework to tackle any ML problem given in any competition?
RR: Start with the basics and tweak as you go.
Spend the majority of the time exploring the data. Read every discussion and scan every notebook. Be open to approaches. Often the craziest of ideas can help you rank well.
Team up to learn more and ensemble better; teaming up is often underrated.
AV: Can you tell us about your top three favorite competitions/notebooks so far that have shaped your Kaggle journey? And what were the challenges you faced particularly while coming up with their solutions?
RR: The ASHRAE competition is my most memorable competition, as we had a great team and it was my first prize-winning competition on Kaggle.
The Large Datasets Tutorial notebook is my most popular notebook, and in fact I had once deleted it because I did not like it. Fortunately, I had shared the notebook with a couple of friends who gave me extremely positive feedback and convinced me to republish it.
Santander Product Recommendation was the competition that gave me my 5th gold medal to become a Competitions Grandmaster. My teammate, Sudalai, and I were struggling to make our ensemble work until we chose different weights for different rows, some of them not even summing to 1. A nonsensical idea, but it worked.
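To make the row-wise weighting idea concrete, here is a minimal Python sketch. It is not the team’s actual solution: the segment names, the weight values, and the `blend_rowwise` helper are all hypothetical and chosen only for illustration. It shows two models’ predictions being blended with weights that depend on the row and that do not have to sum to 1.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical per-segment weights as (weight_model_a, weight_model_b).
# These values are made up; note that neither pair is required to sum to 1.
SEGMENT_WEIGHTS: Dict[str, Tuple[float, float]] = {
    "new": (0.7, 0.5),
    "existing": (0.4, 0.6),
}


def blend_rowwise(
    preds_a: Sequence[float],
    preds_b: Sequence[float],
    segments: Sequence[str],
    weights: Dict[str, Tuple[float, float]] = SEGMENT_WEIGHTS,
) -> List[float]:
    """Blend two prediction vectors using weights looked up per row."""
    if not (len(preds_a) == len(preds_b) == len(segments)):
        raise ValueError("All inputs must have the same length.")
    blended = []
    for a, b, segment in zip(preds_a, preds_b, segments):
        w_a, w_b = weights[segment]  # the weights depend on the row's segment
        blended.append(w_a * a + w_b * b)
    return blended


# Toy inputs: two rows, each belonging to a different (hypothetical) segment.
preds_a = [0.2, 0.8]
preds_b = [0.4, 0.6]
segments = ["new", "existing"]

blended = blend_rowwise(preds_a, preds_b, segments)
print(blended)
```

One reason unnormalized weights can still be usable is that the blended scores are no longer calibrated probabilities but may still order candidates sensibly, which is what matters when a metric depends only on ranking. Whether that applies depends on the competition’s metric, so treat this as an assumption to validate on a holdout set.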
AV: Since you’ve earned 8 gold medals in Datasets, please tell us about your whole procedure for creating a dataset. And in your view, how much data is enough for a good dataset?
- Pick a dataset that is new to Kaggle and hard to compile. That’s where you create value that users will appreciate.
- Publish the cleanest format of the data and maintain it regularly.
- Document as many details as possible and share as much information as you have.
- Publish an introductory notebook showcasing what the dataset contains and how it can be used.
AV: What are the three things that one should keep in mind if he/she wants to achieve a higher level in all four categories: Competitions, Datasets, Notebooks, and Discussions?
- Be open to learning new things, facing criticism, and trying out as many ideas as possible.
- Focus on quality over quantity.
- Don’t cheat, and don’t spam.
Rohan’s Advice to the Beginners in Data Science
AV: Can you suggest some competitions that you think are best for beginners to try their hand at, especially for tackling specific tasks like regression, classification, and NLP-related tasks?
RR: Most of the knowledge competitions on Kaggle are specifically meant for beginners to get started and they are the best place to start and learn stuff.
AV: Do you prefer to use a specific rig at your end to solve DL problems, or do you prefer using tools like Kaggle or Google Colab?
RR: I use a combination of both. I try to make the maximum use of freely available resources before moving to my personal servers (which are sometimes costly).
AV: Can you tell us about some lesser-known yet very helpful ML and DL frameworks/libraries?
RR: python-datatable is a good alternative to pandas, especially while working on large datasets with limited resources.
Recently, I quite enjoyed using PyTorch in R; it’s a great wrapper for DL practitioners who want to stick to R.
AV: How do you keep yourself updated with the latest technologies/frameworks related to DL and ML? And how do you use them in the industry as well as in hackathons?
RR: Kaggle is the best place to keep oneself acquainted with all the latest happenings in ML/DL, and arXiv is the place for reading research papers. The more time you spend working with these hands-on, the better you get at it and the easier it becomes to apply them to hackathons and real-world industry problems.
It was a pleasure for us to interview such a multi-talented person. His thoughts and words are enough to get anyone to begin and stay focused on their data science journey.
This is the 20th interview in the Kaggle Grandmaster Series. You can read the previous few at the following links:
- Kaggle Grandmaster Series – Exclusive Interview with Kaggle Datasets Grandmaster Ruchi Bhatia(#Rank 5)
What did you learn from this interview? Are there other data science leaders you would want us to interview for the Kaggle Grandmaster Series? Let me know in the comments section below!