Kaggle Grandmaster Series – Notebooks Grandmaster Mobassir Hossen’s Journey from Software Engineer to Data Science
“Being dynamic is important, especially in the Data Science field.” – Mobassir Hossen
Transitioning into data science or becoming a Grandmaster takes a lot of dedication and self-learning, along with the ability to be a dynamic learner, no matter what your background is.
Not sure where to start?
Well, how about hearing from another Kaggle Grandmaster? That’s right – we are proud to present this third installment in our Kaggle Grandmaster Series with Notebooks Grandmaster Mobassir Hossen!
Mobassir is a Kaggle Notebooks Grandmaster with a Kaggle rank of #44. He is a Kaggle Discussions Master and Kaggle Competitions Expert as well. Also, he graduated with a Software Engineering Degree from Daffodil International University-DIU and currently works as a Data Scientist at Markopolo.ai.
A journey from software engineering to data science? That’s one a lot of people would love to know more about!
In this interview, we cover a range of topics, including:
- Mobassir Hossen’s Transition from Software Engineer to Data Science
- Mobassir’s NLP Journey and his advice to NLP enthusiasts
- Mobassir’s Kaggle Journey from Scratch to becoming a Grandmaster
So, go through this interview and absorb all you can!
Mobassir Hossen’s Transition from Software Engineering to Data Science
Analytics Vidhya (AV): You did your Bachelor’s in Software Engineering. How did you make the transition from SWE to Data Science?
Mobassir Hossen (MH): There are different kinds of fields open for a software engineer and I felt a fascination for a lot of them. So I invested a lot of time working on software security, the Internet of Things, Embedded System design, etc. I was jumping from one ship to another like that and was unable to pick a fixed field for my career.
The main problem was “I felt love for all of those departments and it was hard for me to pick a single one from them for my career”. While reading papers on the “Internet of Things (IoT)” one day, I found an idea for a project and thought it would be cool if I could implement it. The idea was to design a system that can detect the carbon monoxide (CO) percentage in a room, because I had found in some papers that if the carbon monoxide percentage crosses a certain threshold, it can kill people staying in that room.
So I began searching for a solution that I could try in order to tackle this problem. Then I heard people talking about an algorithm called SVM (support vector machine) that can be used to classify CO after taking data from sensors using Arduino. My next search on Google was “what is SVM?”. Google told me that it’s a machine learning algorithm. That is when I came to know about machine learning. I was in my 3rd year’s 1st semester at that time. I started taking machine learning courses to understand algorithms like SVM to solve that IoT problem I had in my head, and somehow I became extremely addicted to machine learning and started investing a lot of time learning stuff related to it. This is how I dropped my IoT project and picked Data Science as my career.
AV: Since Software Engineers already know how to program, what additional things should they focus on in order to make this transition?
MH: Software Engineers know the programming required for Software 1.0, whereas data science demands programming skills for Software 2.0. In software engineering, we had statistics and mathematics, and that helped in my transition. Other than this, I think the following points also play a role in your transition from SWE to DS:
- Adaptability: Data Science demands different coding and analytical skills compared to software engineering, and to achieve these we need to “practice” the new skills a lot, as we did during our undergraduate studies for solving Software 1.0 related problems.
- Dynamic learner: I think most good software engineers have this skill already, and it helps in the data science field too. You will have to do a lot of Google searches, read SOTA papers, and keep up to date with recent work in the data science field. A lot of people don’t do this, but I think it is a very important skill to have.
I have seen a lot of top-notch software engineers who don’t want to learn data science because this field is changing day by day, and that’s why I think “we need to be dynamic learners”. We need to stay updated with the latest research, and a lot of software engineers don’t want to do this, hence they decide not to pick DS.
Mobassir’s Interest and Experience in Healthcare
AV: I noticed you’re interested in healthcare startups. What specifically do you look for when you look for machine learning use cases in healthcare startups?
MH: Here are some of the points I especially look for when I look for machine learning use cases in healthcare startups:
- Early detection and prediction of a particular disease with the aim of “saving lives”
- Detecting many diseases, such as pneumothorax, is really challenging, and even radiologists with years of experience can make mistakes, so I plan to design a tool for assisting radiologists/doctors by providing smart data-driven solutions
- I look for ways to assist doctors and reduce “wrong treatment”, that is, false negatives and false positives, so that we can save lives. A lot of people die daily because of wrong treatment (like marking a patient safe at an early stage of his/her disease, which later costs a life because the doctor made a mistake, as all humans do)
AV: Can you suggest any good datasets or competitions where people interested in healthcare can participate?
MH: It depends. If someone is willing to solve tabular data problems then:
- Breast Cancer Wisconsin (Diagnostic) Data Set
- Heart Disease UCI
- Mechanisms of Action (MoA) Prediction
- Cervical Cancer Risk Classification
- University of Liverpool – Ion Switching Prediction, etc
If someone is willing to solve radiological/computer vision-related problems then:
For assisting dermatologists, one can start with “SIIM-ISIC Melanoma Classification”
Actually, it depends on the individual’s interest; there are lots of medical data problems. You need to ask yourself “which medical problem do I want to solve the most?” and you can start from there. As I said, it’s a “dynamic process”. You can start with a problem and realize “well, I don’t know much about this problem and also don’t know how to solve it through data-driven approaches, but now I am interested”. You can start from zero knowledge and still end up being a pro. You have almost all the resources required. All you need to do is spend a lot of time googling and reading papers, notebooks, books, etc.
Mobassir’s Kaggle Journey to Becoming a Grandmaster
AV: You’re the first Kaggle Notebooks Grandmaster from Bangladesh, which must feel great. What were the challenges you faced in this journey?
MH: I still remember how it all started. It took me 77 days to finish the Coursera course on Machine Learning by Andrew Ng when I was a 3rd-year undergraduate student. I became so addicted to Machine Learning that I sacrificed university quizzes, presentations, exams, etc., even though I had a high CGPA till then.
Why did I do that? They call it “passion” these days 🙂
When I first started my ML journey, every CS student in my university was busy solving competitive programming problems, but I found machine learning very interesting. So I wanted to learn ML, but I saw that no one around me had even basic ML knowledge. Hence, no one around me could guide me well in machine learning.
During my initial days into ML, the answers I received for my crucial career-related questions were very demotivating.
My question: “I am interested in machine learning. I want to become a Data Scientist. Is it the wrong idea/decision?”
Answer 1: “Mobassir, machine learning is the hottest topic now, but what happens if after 10 years some other technology replaces machine learning? What will you do? So do Codeforces competitions”
Answer 2: “In Bangladesh, very, very few companies work on machine learning problems. You are less likely to get a job with this skill here, so learn a web/Android framework and regularly solve competitive problems only”
Answer 3: “if you don’t have heavy math/statistics knowledge then don’t go for machine learning”
These replies really worried me and led to a lot of self-doubt. But anyway, I signed up on Kaggle 2 years ago and became part of a community so diverse and collaborative that there was no looking back from there. Today I am very proud that I rejected the guidance of all the people around me and followed my own, which was toward “my passion”.
AV: I’m sure you must be participating in the discussions and competitions, but how did you end up entering the Notebooks aspect of Kaggle and even getting the Grandmaster title in it?
MH: When I started my data science journey, I already had a very busy academic schedule. Consequently, I couldn’t spend much time on Kaggle. I started by “participating in the discussions”, and since I was from an SWE background, this helped me learn quickly. I collaborated with people in the discussion forums, and later some of them became good buddies with whom I still compete on Kaggle.
As I said, this field demands people with a “dynamic learning attitude”. I have no special talent, but I realized that I have a “dynamic learning attitude”, and that’s why after so many fluctuations I decided to build my career in the DS field. This led me to invest a lot of time in the kernels/notebooks section, and the Grandmaster title from Kaggle followed. I can assure you that for at least 70% of the notebooks that I wrote, “I started with ZERO knowledge, did a lot of Google searches, read others’ solutions and discussions, and by the time I wrote the last lines of those kernels, I knew something”, and that is why I think “being dynamic is important, especially for the DS field”.
AV: Do you follow certain steps while creating the notebooks? Can you share them?
MH: Yeah, I have learned a few techniques from vastly experienced Kagglers in the past and I try to apply those most of the time. They are as follows:
- Don’t print too much log information. If a particular code cell prints too much log output, there is a possibility that a lot of people won’t read your notebook fully; they just dislike “scrolling forever”
- People come to the DS field from different backgrounds and not everyone has a great coding background, so I try to explain what a particular code block is doing with visual graphs or words so that everyone can understand. I see a lot of people who write beautiful code but don’t describe what their code is doing. They just want to show the world that “they can write beautiful code”
- While creating a notebook, I check almost all the related notebooks and try to find a gap. If I find a gap or something that “no one tried yet”, then I simply try to implement it and bring it into my notebook. It is more like a research and development process
- I always give a reference for content and code that I take from elsewhere, so there is always a reference section in my work. I see a lot of people don’t do this; sometimes they simply change variable names, function names, etc. and pretend that it is their own work, which is a very bad practice. I keep this in mind while creating notebooks
- In data science problems, “lack of domain knowledge” is a big issue, and each data problem asks for different domain knowledge. So in my notebooks I sometimes try to share domain knowledge for the particular data problem. I learn it by “googling”, and sharing it saves a lot of time for others ☺
- I try to write clean code, but sometimes I mess up
AV: If somebody is starting from scratch and wants to create industry-ready notebooks, can you share five points they should keep in mind?
MH:
- Clear documentation of each segment in markup, and comments describing why the function is needed rather than what it does
- Write reusable components, avoid code duplication, and always use descriptive variable names
- Don’t make the notebook super long. It is best to create a local library of the reusable components and call them in the notebook: move functions, routines, and classes into a .py file and call them from the local module
- In order to be able to use notebooks not only for rapid prototyping but also for long-term productivity, certain process events must be logged so that, for example, errors can be diagnosed more easily and the entire process can be monitored.
- Practice proper unit testing
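As an editorial illustration, here is a minimal Python sketch of how several of these points (a reusable function with a descriptive name, a comment explaining why it exists, a logged process event, and a unit test) might look in a local module that a notebook imports. The function `fill_missing_with_median`, the logger name `notebook_pipeline`, and the sample values are hypothetical and are not taken from Mobassir’s notebooks.

```python
import logging

# Hypothetical logger name; a notebook would configure handlers once at the top.
logger = logging.getLogger("notebook_pipeline")


def fill_missing_with_median(values):
    """Replace None entries with the median of the observed values.

    Why: in this sketch we assume later steps cannot handle missing
    inputs, and the median is less sensitive to outliers than the mean.
    """
    observed = sorted(v for v in values if v is not None)
    if not observed:
        raise ValueError("cannot impute: no observed values")

    middle = len(observed) // 2
    if len(observed) % 2 == 1:
        median = observed[middle]
    else:
        median = (observed[middle - 1] + observed[middle]) / 2

    # Log the process event so problems can be diagnosed later.
    missing_count = len(values) - len(observed)
    logger.info("imputed %d missing values with median %s", missing_count, median)

    return [median if v is None else v for v in values]


def test_fill_missing_with_median():
    # Even number of observed values: median is the mean of the middle pair.
    assert fill_missing_with_median([1, None, 3]) == [1, 2.0, 3]
    # Odd number of observed values: median is the middle element.
    assert fill_missing_with_median([5, None, 1, 3]) == [5, 3, 1, 3]
```

Saved in a .py file, the function can be imported into the notebook, while a test runner such as pytest would pick up `test_fill_missing_with_median` automatically.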
Wow! What an inspiring interview that was. Such wise words can only come after a lot of experience.
Mobassir’s journey is a testament to the fact that one never knows how many doors can open by simply listening to oneself. I hope this interview will help you answer your DS career-related questions more precisely.
This is the third interview in the series of Kaggle Interviews. You can read the first two interviews here:
- Kaggle Grandmaster Series – Exclusive Interview with 2x Kaggle Grandmaster Firat Gonen
- Kaggle Grandmaster Series – Exclusive Interview with Kaggle Rank #8 and Competitions Grandmaster Ahmet Erdem
What did you learn from this interview? Are there other data science leaders you would want us to interview? Let me know in the comments section below!