“Ignore the gatekeepers who expect you to have a Ph.D. A relevant study can be more useful than a Ph.D.” – Ahmet Erdem
Golden words that every data science aspirant needs to learn by heart.
We see all sorts of jumbled job descriptions in data science expecting people to touch the sky in terms of their educational background. You need a Ph.D. or you need a Masters in Applied Mathematics, etc.
As someone who just wants to land their first role in data science, this can appear daunting and demotivating. So these are very timely and relevant thoughts by Ahmet Erdem.
Yes, we are delighted to share our second interview of the Kaggle Grandmaster Series with Ahmet Erdem today!
Ahmet is a Kaggle Competitions Grandmaster who currently ranks #8 – right up there in the upper echelons of Kaggle. He has won 12 gold medals and 15 silver medals in the competitions category – a remarkable achievement.
He is also a Kaggle Master in Notebooks and Discussions. Ahmet currently works as a Senior Data Scientist at NVIDIA and brings many years of experience across diverse firms to give you an insight into the power of data science and NLP. He also holds a Master’s degree in Artificial Intelligence from KU Leuven.
This is the second interview in the series of Kaggle Grandmaster Interviews. You can read the first interview here:
In this interview, we cover a range of topics, including:
- Ahmet Erdem’s Transition from Software Engineer to Data Science
- Ahmet’s NLP Journey and his advice to NLP enthusiasts
- Erdem’s Kaggle Journey from Scratch to becoming a Grandmaster
So, go through this interview and absorb all you can!
Ahmet Erdem’s Transition from Software Engineer to Data Science
Analytics Vidhya (AV): You switched from Software Engineering to Data Science. That’s a journey a lot of people are trying to make these days, especially in our Analytics Vidhya community. How did you manage to overcome the obstacles to make this transition?
Ahmet Erdem (AE): Actually, the transition from Software Engineering to Data Science is not a big challenge.
“Job definition for Data Scientists varies a lot but I believe a Data Scientist should be good at math/statistics and programming. By studying Computer Science, one can already excel in Math and Programming.”
So what is left is Statistics and eventually Machine Learning. To fill this gap, I decided to do a master’s in Artificial Intelligence. There is no single path, and self-learning is also possible, but I think this educational background made the difference for me.
AV: That’s quite an interesting choice. What are 5 key points or best practices you would recommend to anyone who wants to switch from software engineering to data science?
AE: 5 key points I can recommend to anyone who wants to switch from software engineering to data science:
- Learn the theories behind the machine learning algorithms
- Get ready for switching from well-defined tasks to open-ended tasks
- Gain some soft skills; half of Data Science is about convincing people that your model works
- Ignore the gatekeepers who expect you to have a Ph.D. Relevant study is more important than a Ph.D.
- Keep using the software development practices (versioning/linting etc.)
Ahmet’s Natural Language Processing (NLP) Journey
AV: You’ve worked extensively on NLP problems in your career. And NLP is a thriving field right now with incredible state-of-the-art models being released seemingly every week! How do you manage to keep up to date with these advances?
AE: My NLP problems at work were usually unsupervised and the data was big. Therefore, the inference time was more important than accuracy. It was more of an engineering challenge and simpler models were more useful for us. But I was interested in advanced models too. And Kaggle was the best place to keep up with all these advanced models.
When I joined Kaggle, LSTMs were something new. I studied them and practiced with them as much as I could. And suddenly, they became outdated thanks to Transformers. Again, Kaggle gave me all the opportunities to get familiar with Transformers.
AV: Where do you see NLP heading in the next 2-3 years? The rate of advancement has been staggering so far – we would love to hear your thoughts on where you see the NLP trend going.
AE: I believe the only challenge for NLP is the available data and compute power. If we compare current NLP models with us (humans), we have a huge time advantage. Our brains are tuned by evolution for millions of years. And they are being trained every second with everything we experience. Imagine training an NLP model for 5 years with a lot of diverse data.
“What I am trying to say is that I believe consciousness is not binary.”
Even current NLP models have it to some degree, and over time they will reach a point where turning them off will feel like killing someone. But with the current data privacy policies and available compute power, it will definitely take more than 2-3 years.
Ahmet’s Kaggle Journey from Scratch to becoming a Grandmaster
AV: You’re a Competition Grandmaster with a current rank of 8. Can you pinpoint 3 competitions or milestones in your journey?
AE: Three competitions which were milestones for me:
- Quora Question Pairs: It was my first competition. It was a very interesting problem and I learned a lot from Kernels and Discussions. Surprisingly, I almost got a solo gold. I didn’t expect that!
- Favorita Grocery Sales Forecasting: It was the first and only time I earned prize money. I had a great teaming experience and learned a lot from my teammate. Thanks to him, I noticed the power of neural networks on time series problems. The gold medal from this competition made me a Kaggle Master.
- PLAsTiCC Astronomical Classification: It was my first solo gold. I worked on this competition like a one-man team. I set my tasks and tracked them, logged my experiments, and used GitHub for versioning. I applied different tricks that helped me end up in 4th place.
AV: What framework do you follow for each hackathon? Do you have a set of steps you keep in mind before you look at the problem statement?
AE: Yes! Here are the typical steps I follow when participating in a Kaggle Competition:
- Understand the problem and the metric
- Try to come up with unique ideas
- Set a validation scheme using the most basic model
- Iteratively add features and model complexity
- Log every experiment
- Try to understand the positive or negative effect of each experiment and design the next experiment
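The steps above can be sketched in a few lines of Python. This is only a minimal illustration, not Ahmet’s actual code: the function names (`kfold_indices`, `cross_validate`, `mean_baseline`, `linear_fit`, `log_experiment`), the mean-absolute-error metric, and the JSON Lines log format are all assumptions made for the example. It fixes a validation scheme first, scores the most basic model, adds a little complexity, and logs every experiment so the effect of each change can be compared.

```python
import json
import random
import statistics
import tempfile
from pathlib import Path


def kfold_indices(n_rows, n_folds=5, seed=0):
    """Shuffle the row indices once and deal them into n_folds validation folds."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    return [indices[i::n_folds] for i in range(n_folds)]


def cross_validate(rows, targets, fit, n_folds=5, seed=0):
    """Return the mean absolute error of `fit`, averaged over the folds."""
    fold_scores = []
    for valid_idx in kfold_indices(len(rows), n_folds, seed):
        held_out = set(valid_idx)
        train_idx = [i for i in range(len(rows)) if i not in held_out]
        predict = fit([rows[i] for i in train_idx],
                      [targets[i] for i in train_idx])
        errors = [abs(predict(rows[i]) - targets[i]) for i in valid_idx]
        fold_scores.append(statistics.fmean(errors))
    return statistics.fmean(fold_scores)


def mean_baseline(train_rows, train_targets):
    """The most basic model: always predict the training mean."""
    mean = statistics.fmean(train_targets)
    return lambda row: mean


def linear_fit(train_rows, train_targets):
    """One step up in complexity: least-squares line on a single numeric feature."""
    mean_x = statistics.fmean(train_rows)
    mean_y = statistics.fmean(train_targets)
    var_x = sum((x - mean_x) ** 2 for x in train_rows)
    cov_xy = sum((x - mean_x) * (y - mean_y)
                 for x, y in zip(train_rows, train_targets))
    slope = cov_xy / var_x if var_x else 0.0
    intercept = mean_y - slope * mean_x
    return lambda row: intercept + slope * row


def log_experiment(path, name, score, notes=""):
    """Append one experiment record to a JSON Lines file."""
    record = {"name": name, "score": score, "notes": notes}
    with Path(path).open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(record) + "\n")


if __name__ == "__main__":
    # Toy data (made up for the example): one feature, noisy linear target.
    rng = random.Random(42)
    rows = [float(i) for i in range(100)]
    targets = [2.0 * x + 1.0 + rng.gauss(0, 5) for x in rows]

    log_path = Path(tempfile.gettempdir()) / "experiments.jsonl"
    for name, fit in [("mean_baseline", mean_baseline), ("linear_fit", linear_fit)]:
        score = cross_validate(rows, targets, fit)
        log_experiment(log_path, name, score, notes="5-fold MAE, seed 0")
        print(f"{name}: MAE = {score:.3f}")
```

Keeping the fold split and the seed fixed across experiments is the point of the design: a score change then reflects the new feature or model rather than a different split.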
AV: For anyone starting out with data science hackathons now and given how fierce the competition is, what techniques should these beginners focus on to improve their chances of rising up the leaderboard?
“A beginner should focus on learning instead of leaderboards.”
If they get obsessed with their ranking, it may block their learning. Reading other people’s ideas and coming up with their own ideas, and then implementing them is the key to learning.
Maybe 90% of these ideas will not work, but the knowledge gained will help them in the next competitions. By contrast, tuning the parameters of public kernels to achieve a higher score is a total waste of time. So, the goal should not be ranking 71st today; the goal should be ranking first in the future.
This was an amazing interview packed with pearls of wisdom by a Kaggle Grandmaster.
Ahmet’s precise, to-the-point answers and his focus on learning are things we can all take away from this interview. I hope it helps you set your course and improve your data science learning!
Let us know in the comments if you have any other questions that you think we missed. You can also drop any questions you feel you want to ask a future interviewee – we’d love to focus on your thoughts as well!