DataHack Radio #16: Kaggle Grandmaster SRK’s Journey and Advice for Data Science Competitions
Ladies and gentlemen, presenting Analytics Vidhya’s top community member and Kaggle Grandmaster – Sudalai Rajkumar, aka SRK!
He is an inspiration and an enigma in the data science community. I had the pleasure of meeting him at DataHack Summit 2018 and was blown away by his humility and intelligence. I jumped at the first chance of getting him on our DataHack podcast.
Whether it’s his analytical acumen, wealth of experience, or down-to-earth demeanour, we could all learn something from SRK. In this episode, we cover:
- SRK’s background and his first brush with analytics
- His transition into data science
- The advantages of participating in data science competitions
- How competitions help in a day-to-day industry role
- Advice to aspiring data scientists from a Kaggle Grandmaster
And much, much more!
I have picked out the highlights of SRK’s discussion with Kunal in this article. Happy listening!
DataHack Radio is now on all the popular podcast platforms. Subscribe today and listen to this and all previous episodes below:
“Finding patterns in data is something I love to do.”
SRK’s journey into data science wasn’t a straightforward one. He completed his B.E. in Mechanical Engineering from PSG College of Engineering in 2010. During his final year there, he received two offers – one from the mechanical field and one from an analytics company.
Remember, this was back when the term “data science” hadn’t been coined yet. It was plain old analytics. After weighing up both offers, SRK went with the analytics option. Working with data and finding patterns piqued his curiosity and that led to his first foray into this field.
But he didn’t start working on an analytics project straight off the bat. His was more of a software engineering role (quite a lot of our community will relate to that!). SRK’s responsibilities included writing code to put models built by analysts into production. He worked with Python at that time, which, as he admits, was a lucky break in hindsight.
We might be familiar with pandas and numpy nowadays, but back then SRK worked exclusively with native Python data structures (lists, tuples, dictionaries, etc.).
Transition from Software Engineering to Data Science
A year in the job is enough to ask the question – where do I go from here? Most of us have been there and SRK faced up to it in 2011. He switched roles within the same organization and moved into analytics.
Which tools were considered state-of-the-art at that point? The experienced readers among you will recall that Python wasn’t close to what it is now. Quite a lot of the packages we use these days did not exist eight years ago. Instead, R was primarily used to perform the exploratory analysis and build models. Of course, SAS also had a big market share. Deployment and production were handled in Python. A neat segregation of tasks, right?
After the transition, SRK worked with R for the first year or so. He then switched to Python for the modeling part of his job as well.
His work was mainly in the financial domain at that point. He worked with different kinds of credit risk models, weight risk models, marketing models, etc. Initially his team worked with linear and logistic regression models. But the focus slowly shifted to more advanced machine learning techniques, like random forest and GBMs.
This was the period when deep learning had started to penetrate into mainstream applications. Given his penchant for patterns and trying new things, SRK started experimenting with a few deep learning models using libraries like Theano and PyBrain.
Participating in Data Science Competitions
During his first year in the analytics project, SRK started reading and learning about analytics outside his working environment. He’s candid enough to admit that at one point, he thought analytics was only about linear and logistic regression! 🙂 It goes to show how big a role self-learning plays in our lives.
He was exposed to the world of machine learning algorithms during his learning phase. But his day-to-day role didn’t involve anything outside building linear models. That’s when SRK discovered the usefulness of data science competitions.
Participating in these competitions was far more difficult then than it is now. There were no blogs or MOOCs teaching one how to crack competitions. SRK relied on each library’s official documentation and his own analytical acumen to figure out the way forward. Experimenting took a lot of time, but it helped him immensely in his learning.
But here’s where the best are separated from the rest – SRK spent, on average, 3-5 hours every day after work to learn new concepts or brush up old ones. That is the level of hard work we need to put in to succeed in this field.
Does Participating in Data Science Competitions Help in an Industry Role?
We’ve heard this question come up plenty of times. Who better to answer it than a Kaggle Grandmaster?
SRK’s first role in analytics was on the R&D side of things, so he had leeway to experiment with new algorithms. The knowledge and experience he gained through data science competitions came in handy for him here. Instead of learning on the job, it became more about experimenting on the job. Something we can all strive for!
Admittedly, not all of us will get the chance to play around with new algorithms in our day-to-day roles. But the learning we gain through these hackathons is invaluable. It isn’t limited to just applying new techniques. Critical thinking, brainstorming, working with numbers (data intuition), structured thinking, ability to experiment – these are characteristics that will help you in making business decisions. You might not see it happening straightaway but patience is key to success, especially in data science.
The Secret Behind Making the First Submission within Hours of a Competition Launch
This has always intrigued me. Whenever a new competition page goes up on Analytics Vidhya or Kaggle, SRK usually has his first submission ready within a few hours. How is that even possible?
As it turns out, most folks who regularly participate in these competitions have a generic code base ready before the dataset is released. They just modify the code based on the problem statement. The hyperparameter tuning and all other experiments come after that.
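To make the idea concrete, here is a minimal sketch of what such a reusable starter could look like. It is not SRK's actual code: every function name, column name and the `CONFIG` dictionary are hypothetical, and the model is deliberately the simplest possible baseline (predict the most common class, or the mean for regression). The point is only that the loading and submission-writing parts stay the same, while a few settings change per competition.

```python
import csv
import io
from collections import Counter
from statistics import mean


def load_rows(handle):
    """Read a CSV file handle into a list of dicts (one per row)."""
    return list(csv.DictReader(handle))


def fit_baseline(train_rows, target_col, task):
    """Return a constant prediction: the mean for regression, else the most common class."""
    values = [row[target_col] for row in train_rows]
    if task == "regression":
        return mean(float(v) for v in values)
    return Counter(values).most_common(1)[0][0]


def write_submission(test_rows, id_col, target_col, prediction, handle):
    """Write a header plus one (id, prediction) line per test row."""
    writer = csv.writer(handle)
    writer.writerow([id_col, target_col])
    for row in test_rows:
        writer.writerow([row[id_col], prediction])


# Hypothetical settings: the only part edited for a new problem statement.
CONFIG = {"id_col": "id", "target_col": "target", "task": "classification"}

# Tiny in-memory stand-ins for train.csv and test.csv (made-up data).
train_csv = io.StringIO("id,feature,target\n1,0.5,yes\n2,0.1,no\n3,0.9,yes\n")
test_csv = io.StringIO("id,feature\n4,0.7\n5,0.2\n")

train_rows = load_rows(train_csv)
test_rows = load_rows(test_csv)
prediction = fit_baseline(train_rows, CONFIG["target_col"], CONFIG["task"])

submission = io.StringIO()
write_submission(test_rows, CONFIG["id_col"], CONFIG["target_col"], prediction, submission)
print(submission.getvalue())
```

In practice the constant baseline would be swapped for a real model, but having the input and output plumbing ready in advance is what makes a submission within hours plausible.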
Advice to Aspiring Data Scientists
- Get Practical Experience: Knowing the theory and intuition behind algorithms is good, but getting hands-on practical experience is where the goldmine lies. Try to get your hands on a real-world dataset. See what you can wring out of it. This will be invaluable when you are sitting in an interview setting
- Participate in Hackathons: Participating in competitions helps you understand where you stand among the community
- Try to Frame a Problem Statement on your own: I really liked this advice. We don’t get industry experience in a hackathon setting. Instead, you can try to come up with a problem that you feel might help someone. Then build on that by collecting data around it. Solve the problem and showcase it via blogs or GitHub. It’s hard work but that translates to results sooner than you might expect
- Pick a Domain: This is critical. So many aspiring data scientists aimlessly wander about applying to jobs in domains where they don’t hold any experience (or even interest). Pick a domain that is of interest to you and try to find datasets to work on
Acing data science competitions is a tough proposition. I have struggled to crack the top 10 in any popular competition I’ve taken part in. SRK gave us a lot of practical advice in this episode which should help you streamline your learning process.
What was your favorite part of this episode? And if you’ve ever had the good fortune of meeting SRK, I would love to hear about your experience.