Kaggle Grandmaster Series – Exclusive Interview with Kaggle Notebooks Grandmaster Kostiantyn Isaienkov (#Rank 8)
“I think the main challenge for a lot of beginner kagglers is to overcome their fear of failure and criticism” - Kostiantyn Isaienkov
This statement is very relatable across the data science community. A little criticism is often enough for fear to take hold and stop us from learning from the mistakes we make on our Kaggle journey.
To help you push further, in the 21st edition of the Kaggle Grandmaster Series we are joined by Kostiantyn Isaienkov.
Kostiantyn is a Kaggle Notebooks Grandmaster. He ranks 8th in this category and has 17 gold medals to his name. Kostiantyn is also an Expert in the Kaggle Competitions category.
Kostiantyn has a Master’s degree in Computer Science from Donetsk National University. He currently works as a Data Science Engineer at Quantum_Inc.
You can go through the previous Kaggle Grandmaster Series Interviews here.
In this interview, we cover a range of topics, including:
- Kostiantyn’s Education and Work
- Kostiantyn’s Kaggle Journey
- Kostiantyn’s Advice to Beginners in Data Science
So without any further ado, let’s begin.
Kostiantyn’s Education and Work
Analytics Vidhya(AV): Your education is in the field of Computer Science. It is a popular opinion that having a Master’s degree is mandatory in the field of DS. How has your Master’s degree contributed to your career and your Kaggle journey?
Kostiantyn Isaienkov(KI): In my experience, I can’t say that a Master’s degree is mandatory for a career in Data Science. I know a lot of people who have a Bachelor’s degree, Master’s degree, or Ph.D. in Computer Science and related fields. I also know people who don’t have a degree in this field.
Despite this, I think that a university degree is quite important for a future career. While studying, we usually have a great opportunity to develop discipline, communication skills, the ability to work to deadlines and solve complex problems, and most importantly, the ability to search for relevant information and practice self-education.
Regarding the contribution of my degree to my career – it is definitely positive. During the exchange study at Czech Technical University, I obtained a strong knowledge base in the field of Machine Learning and Autonomous robotics that helped me a lot to get my first Data Science job.
Regarding the contribution to Kaggle – I think these are two parallel tracks. Regardless of a specialist’s title and number of medals, Kaggle is a platform where you can always find something new or participate in a competition in a field that is new to you. Thus, Kaggle is an additional source of knowledge, and a university degree has no effect on your success on this platform.
AV: Can you describe your role as a Data Science Engineer at Akvelon and Quantum_Inc respectively? Does the job role vary from company to company?
KI: In both companies, the main responsibility is solving problems in the field of data science for various clients. The main difference is the direction of the tasks being solved. In Akvelon the majority of the tasks were associated with classical machine learning on tabular data. Also, I worked a little bit as a data engineer.
In Quantum I have the chance to work in different directions such as classical machine learning, computer vision, and even NLP projects. While working here I have had the opportunity to implement ML algorithms in Java, integrate my solutions on an iPhone, and participate in the development of the company’s product. In a couple of words, the product is an auto machine learning solution for a wide range of tasks. For more information, you can check maister.io.
I also lead the internship program in the company. Quantum is serious about preparing young specialists and has already successfully completed several internship programs in Data Science.
Another area of work in the company is research in the field of satellite image processing. One of the most recent results of this work is a published paper on change detection in the forests of Ukraine, in which I took an active part.
Kostiantyn’s Kaggle Journey from Scratch
AV: You’re a Kaggle Notebooks Grandmaster and currently ranked 8th. This is really praiseworthy! What were the challenges you faced during this journey, especially when you created your first kernel/notebook, and how did you overcome them?
KI: I created my first kernel while participating in my first competition about 5 years ago. I think the main challenge for a lot of beginner kagglers is to overcome their fear of failure and criticism. When I worked on my first notebook I thought that I would publish it and get a lot of negative comments under the code. This problem went away by itself as my experience and activity on Kaggle grew.
The second problem I faced was creating a clear and readable kernel. You can create an awesome kernel with unique content, but it will not be popular if your code is not good and you can’t write a good description of your work. You should always remember that a Data Science solution is not only a model training process.
And of course, the common problem for beginners in the field of Kaggle notebooks is the lack of votes on their kernels. It can often be a reason for beginners to stop creating notebooks, because people usually want to make rapid progress. Unfortunately, to solve this problem you need to spend some time creating good notebooks and developing your own style for them. Sometimes it can take more time than you expect.
AV: What are your criteria for selecting a competition/topic for a new kernel? What other platforms would you recommend for DS beginners to participate in and practice their skills?
KI: I still use Kaggle mainly for self-education. So I try to select competitions or datasets based on the field of the problem. I try to pick the ones that are most interesting to me or those where my experience is not very high. Sometimes it helps a lot in my work. For example, a new project comes up in a field where no one in the company has experience, but you have worked on a similar problem on Kaggle and have at least some knowledge of the domain. It’s really cool and useful.
To be honest, I spend so much free time on Kaggle that I don’t have to think about other competition platforms. Nevertheless, I have participated in several competitions on zindi.africa and Analytics Vidhya. I can’t say that I spent a lot of time there, but it was an interesting experience for me.
AV: What is your procedure for creating a good notebook after selecting the dataset? Is there a check-list of must-do tasks you always perform?
KI: Of course, I have a standard plan on which I build my kernels. Moreover, I even have a template that usually helps me build my notebooks as quickly as possible, so I don’t need to spend a lot of time rewriting code that is the same for all datasets, like data reading, some visualizations, and basic analysis.
I can describe my plan in a couple of words, but I think it is quite typical for general Data Science tasks.
- Understand the domain. I think this is the most important part of exploratory data analysis. If the data is not anonymized, I try to understand the origin and meaning of every column. To my mind, in about 90% of cases, if you have enough knowledge about the data you can easily beat other teams that don’t use domain knowledge in modeling.
- Missing values analysis. Sometimes some columns have a really small number of non-missing samples and we can’t use them. So we first need to check how many missing values we have.
- Data visualization. It is an important step that can clarify a dataset a lot. Fortunately, today there are a lot of different visualization tools that can provide information about the dataset from all sides.
- Deep analysis. This is a somewhat abstract step that can include a lot of different tasks depending on the competition or dataset. Basically, here I analyze the relations between variables, select features, and prepare the data.
- Baseline model. Usually, I try to finish my kernel with a simple model that helps establish a baseline score.
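As a rough illustration of the missing values and baseline steps in this plan, here is a minimal sketch in plain Python. It is not Kostiantyn's template: the toy rows, the column names, the 50% missing-value cutoff, and the choice of a majority-class baseline are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical toy dataset (not from the interview); None marks a missing value.
rows = [
    {"age": 22, "cabin": None, "survived": 0},
    {"age": 38, "cabin": "C85", "survived": 1},
    {"age": None, "cabin": None, "survived": 1},
    {"age": 35, "cabin": None, "survived": 0},
    {"age": 54, "cabin": None, "survived": 0},
]


def missing_ratio(rows):
    """Return the fraction of missing (None) values for each column."""
    columns = rows[0].keys()
    return {c: sum(r[c] is None for r in rows) / len(rows) for c in columns}


def majority_baseline(labels):
    """Predict the most frequent label for every row; return (label, accuracy)."""
    label, count = Counter(labels).most_common(1)[0]
    return label, count / len(labels)


# Missing values analysis: keep columns with at most 50% missing values.
# The 0.5 cutoff is an arbitrary assumption for this sketch.
ratios = missing_ratio(rows)
usable = [c for c, r in ratios.items() if r <= 0.5]

# Baseline model: the simplest possible reference score on the target column.
baseline_label, baseline_acc = majority_baseline([r["survived"] for r in rows])

print(ratios)
print(usable)
print(baseline_label, baseline_acc)
```

In a real notebook these steps would more likely be done with a dataframe library and a simple trained model; the point here is only that any later model should beat the score of the trivial baseline.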
AV: Since you actively participate in the competitions as well, which is your favorite competition so far? What were the challenges that you encountered while coming up with its solution and how did you finally reach its solution?
KI: I really like to take part in Kaggle competitions, but because my free time is limited I seldom do it seriously from the very beginning until the end of a competition. I don’t think I have a favorite one; each of my past competitions brought me some new knowledge and sometimes new connections with other data scientists.
The main challenge for me is that people are usually fighting for hundredths or thousandths in the competition metric. And usually, the difference between a gold medal and no medal at all is really small. Every time, I try to tackle this in different ways, from building powerful ensembles to finding insights in the datasets.
Kostiantyn’s Advice to the Beginners
AV: Which are your top five notebooks specifically related to EDA that you would like to recommend to beginners?
KI: Basically, I don’t memorize notebooks when I check them. I just note a new technique and search for information about it so that I am ready to use it in the future. But if we speak about notebooks with good EDA for beginners, we of course need to take a look at the Titanic and House Prices competitions. If we just sort the kernels by “Most Votes”, we will see a lot of notebooks with hundreds and even thousands of votes. Most of them are really impressive works, and every beginner will gain a lot of new knowledge there.
AV: How do you keep yourself updated with all the rapid advancements being made in the field of Machine learning and Deep Learning?
KI: Today there are a lot of opportunities to study, regardless of your professional level. I can give just a couple of examples of the ways that I use.
1) Research papers. These allow you to check out new models, methods, and other tools before they become popular and commonly used.
2) Kaggle notebooks and competitions. I think I don’t need to describe how much Kaggle can give you. There are always new methods and models applied to real-world tasks on this platform.
3) Communication inside the team and in different Data Science communities. Knowledge sharing within large groups of professionals is the surest way to raise your level.
AV: What would be your advice to beginners in the field of Data Science on how they should level up in this field?
KI: The first and most important one – never stop learning something new. Data Science is a very fast-growing field where updates happen often. Fortunately, we have a lot of opportunities for study today. You can complete online courses, read papers, take part in different competitions, and simply have discussions with your colleagues and other experts in the field of Data Science.
The second one – don’t forget about your programming skills. A good Data Scientist should be a good programmer too.
Also, we need to focus a little bit on soft skills. It is impossible to be a good specialist without the ability to communicate with other people and team members.
And the last one – don’t spend all of your free time on work. Find a good hobby, be happy with your family, and just relax. It is much easier to burn out at work than you might think.
His thoughts and words are enough to get anyone to begin and stay focused on their data science journey. I hope this edition of the Kaggle Grandmaster Series with Kostiantyn adds value to your data science journey.
This is the 21st interview in the Kaggle Grandmaster Series. You can read a few of the previous ones at the following links:
- Kaggle Grandmaster Series – Exclusive Interview with Kaggle Quadruple/4x Grandmaster Rohan Rao
- Kaggle Grandmaster Series – Exclusive Interview with Kaggle Datasets Grandmaster Ruchi Bhatia(#Rank 5)
What did you learn from this interview? Are there other data science leaders you would want us to interview for the Kaggle Grandmaster Series? Let me know in the comments section below!