Kaggle Grandmaster Series – Exclusive Interview with Kaggle Notebooks Grandmaster Peter Pesti (Rank 23!)
“My transition from SWE to Data Science/AI is still not complete; I am working on it every day.”
Becoming a Kaggle Grandmaster in any category takes daily practice. Only that kind of steady iteration sharpens your skills and makes you industry-ready.
In the 13th edition of the Kaggle Grandmaster Series, we have Peter Pesti joining us.
Peter is a Kaggle Notebooks Grandmaster and currently ranks 23rd with 15 gold medals to his name. He also holds a Master title in the Discussion category and an Expert title in the Competitions category.
Peter earned his Master’s in Computer Engineering at Veszprémi Egyetem. For the past four years, he has been a self-employed software engineer.
You can go through the previous Kaggle Grandmaster Series Interviews here.
In this interview, we cover a range of topics, including:
- Peter’s Education and Work
- Peter’s Kaggle Journey from Scratch to becoming a Kaggle Grandmaster
- Peter’s Advice for Beginners in Data Science
- Peter’s Inspirations
Peter’s Education and Work
Analytics Vidhya: You’re a Senior Software Engineer by profession, so please tell us how your interest in Data Science and Machine Learning grew. Which sources/tools helped you learn Machine Learning?
Peter Pesti: Over the past 18-20 years, I have worked as a Software Engineer/Software Developer for many different companies. It was a great two decades; I worked with many excellent developers and engineers, and I learned a lot. We built all kinds of software, from small mobile applications to large-scale ERP systems.
Honestly, I worked too much. About 4-5 years ago, there were periods when I sat in front of my monitor for 13-14 hours every day. Not surprisingly, I burned out. I knew it was time to take a break and do something else.
At that time, I read about AlphaGo Zero. After reading that article, I was amazed at how far A.I. had come. The same day, I bought my first Machine Learning course on Udemy. I did not know what I was jumping into. One course led to another. Since then, I’ve learned a lot about ML/DL/AI, and I’ve finished many online courses on Udemy, Udacity, Coursera, and edX.
My transition from SWE to Data Science/AI is still not complete; I am working on it every day.
Peter’s Kaggle Journey from Scratch to becoming a Kaggle Grandmaster
AV: You’re one of the 36 Kaggle Kernels Grandmasters. That’s amazing! Can you list the challenges you faced and how you overcame them? What other challenges might one encounter while creating a notebook?
PP: I think there are two main types of challenges when you want to write successful notebooks.
The first is technical: you have to write the code. When I wrote my first Kaggle Kernel, I made lots of rookie mistakes even though I had 20 years of programming experience. At first, even a simple notebook takes a long time to write. You will introduce bugs, the code will be hard to read, the formatting will be messy, and so on. If you don’t give up and keep practicing, you’ll get better and better with time. You will build up more and more reusable code, and eventually you will be able to publish a GM-level kernel within an hour.
For me, the more difficult challenge was non-technical. When I published my first notebook, I thought I had written an excellent one, yet I got only a few votes. Timing and a bit of marketing are as important as the code itself. For example, if you publish an EDA kernel for a competition only weeks before it ends, you won’t get any votes.
The easiest time to collect votes is while a competition is running; on the other hand, the competition for those votes is much fiercer. After my first few failures, I looked for ideas: what types of notebooks reached gold level? EDAs, starter kernels, etc. So I wrote an EDA, which I published within hours after the competition started. The notebook was great (in my opinion), but SRK posted his own EDA (much better than mine, of course). As a beginner, you can’t compete with a GM, even if the quality of your work is identical. The reason is simple: most readers will read the GM’s notebook, not yours. I was aiming for votes and a gold medal; that was also a mistake. I should have aimed for bronze. It is much easier to get the five votes needed for a bronze medal.
After these failures, I wrote a kernel explaining a scoring metric Kaggle used in an ongoing competition. There was some misunderstanding about how it was calculated, and I thought the explanation would be useful for others too. I did not expect that kind of success: it got more than 100 votes and earned my first gold medal.
AV: Among your kernels/notebooks, which ones would you recommend for beginners?
PP: EDA kernels. In my opinion, data analysis skills are a must for every Data Scientist. But before that, every beginner needs good coding skills. You don’t have to be a Python expert, but you need enough confidence to write the necessary Python code. You will also need to learn a few libraries, like NumPy, Pandas, and a plotting tool such as Matplotlib or Plot.ly. Once you have these basics, you can start writing your own EDA notebooks.
AV: Since you’re a Discussion Master as well, which discussion threads would you recommend for beginners?
PP: Most of the content shared on the discussion forums is related to a specific competition. For a beginner, these topics are not very useful for learning. One exception may be the write-up topics: many teams share their solutions after a competition ends.
I think online courses are much more valuable for a beginner. There are many MOOCs and e-learning sites to choose from: Udemy, Coursera, Udacity, edX, Khan Academy, Kaggle, and Analytics Vidhya, to name a few. The possibilities are endless.
On the other hand, if you participate in a competition, the discussion forum is a must. When I enter a new competition, I subscribe to the discussions, and I read all of the topics and comments from day one to the end.
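To make the "basics first" advice concrete, here is a minimal sketch of the kind of first-pass checks an EDA notebook often starts with, using Pandas. It is an illustration under assumptions, not Peter’s own code: the `quick_eda` helper and the toy columns (`age`, `city`, `target`) are hypothetical stand-ins for a real competition dataset.

```python
import pandas as pd


def quick_eda(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: one row per column with dtype, missing values and cardinality."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),   # storage type of each column
        "missing": df.isna().sum(),       # count of missing values
        "missing_ratio": df.isna().mean(),  # share of missing values
        "n_unique": df.nunique(),         # distinct non-missing values
    })


# Hypothetical toy data standing in for a competition's training file
df = pd.DataFrame({
    "age": [22, 35, None, 58, 41],
    "city": ["Budapest", "Veszprém", "Budapest", None, "Budapest"],
    "target": [0, 1, 0, 1, 1],
})

summary = quick_eda(df)
print(summary)
print(df["target"].value_counts())  # class balance of the (assumed) target column
print(df.describe())                # basic statistics for numeric columns
```

In a real notebook you would load the competition data (for example with `pd.read_csv`) instead of the toy frame, and follow this summary with plots in Matplotlib or Plot.ly.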
AV: Please share your checklist/approach for building an expert-level notebook.
PP:
- I have lots of reusable code for various notebook types (EDA, starter kernels, metric explanations, etc.). I usually copy and paste all of it and then adjust it to the given problem.
- Once I have a quick draft, I add extras where necessary. For example, EDA notebooks usually differ because the data differs; most of the time, I add a few additional plots.
- Next, I always test the code, looking for bugs or miscalculations.
- Once everything is working, I clean up the code and write the documentation (code comments and explanations: the whys and hows).
- After I commit the Kaggle Notebook, I always check its output. There could still be errors or formatting issues. If there is anything I don’t like, I fix it before publishing.
- After publishing, I often start a topic about it on the discussion forum. This is a bit risky, though: the community does not like self-promotion.
AV: What tips would you give to beginners who want to attain higher ranks in the Notebooks category?
PP:
- Quality over speed. As a beginner, you won’t be able to compete with Grandmasters on speed: starter kernels and EDAs are the most popular notebooks, but you have to publish them fast. I’ve seen many notebooks from beginners with a few plots and a big note saying “Stay tuned, I’ll update this notebook.” Don’t do that.
- Do not copy/fork other people’s work. If you want to build your reputation, these kinds of shortcuts won’t work. If you are not the author of at least 85-90% of the notebook, you should not publish it.
- Publish only clean, well-formatted, well-documented code. This clearly shows that you care about your work.
- Prepare your code months before publishing. For example, I made an Object Detection training kernel three months before I published it. I shared it hours after the competition started, and it achieved a gold medal within days.
- Be patient and consistent. If you work hard and don’t give up, you’ll become a Notebooks Grandmaster in no time.
Peter’s Advice for Beginners in Data Science
AV: Since Data Science is a vast field, how do you keep up with all the advancements that are happening in the industry?
PP: It is challenging to keep up with everything; so much is happening in the industry every day. I think the only way is to choose a narrow area and focus on it. After I learned the basics in many areas (Computer Vision, NLP, Time Series, Reinforcement Learning, etc.), I started to focus on computer vision. I read a lot about this topic every day, and when I have time, I read new papers on arXiv. Between my job, Kaggle, learning, and my personal life, I don’t have much time left, but I do my best. I keep hundreds of tabs open in Chrome 🙂
AV: As an experienced industry leader, how often do you see DL being applied to a problem, and what future trends do you foresee in this regard?
PP: Applying deep learning is still rare, at least in Hungary. Slowly, though, something seems to have started. The government is taking this industry more seriously and has funded a few exciting projects, and bigger companies are posting more and more open positions.
It is hard to predict, but I think the change will be dramatic in the next 10-20 years.
AV: As an industry leader in DS, what advice would you give to beginners so that they can excel in the industry?
PP: Learn, learn, learn. This industry is changing so fast that what you learned at university won’t be enough five or ten years from now.
Building a professional online portfolio is a great way to show your knowledge. Be active on Kaggle; your goal should be at least one GM title. Three is better 🙂 Open source coding is also a good starting point.
AV: Who are the Data Science experts whose work you look forward to?
PP: Outside of Kaggle, I try to read as many blogs and articles as I can. I read everything from Google Brain, Facebook AI, OpenAI, and Uber. Occasionally, when I have a bit more time, I read interesting new arXiv publications.
Peter’s journey is a testament to the fact that you have to build your knowledge base every day to become a data scientist. I hope this interview gives you some important lessons to apply in your own journey.
This is the 13th interview in the Kaggle Grandmaster Series. You can read the previous ones at the following links:
- Kaggle Grandmaster Series – Exclusive Interview with 2x Kaggle Grandmaster Prashant Banerjee
What did you learn from this interview? Are there other data science leaders you would want us to interview for the Kaggle Grandmaster Series? Let me know in the comments section below!