Kaggle Grandmaster Series – Exclusive Interview with Kaggle Discussion Grandmaster Raju Kumar Mishra (#Rank 48)

avcontentteam 29 Jan, 2021 • 10 min read

“Let me tell you a secret: getting the first freelance data science project may require huge patience.” - Raju Kumar Mishra

India is the second-largest market for freelancers. Even newer fields like data science have plenty of freelancers ready to help you with your analysis.

In the 17th edition of the Kaggle Grandmaster Series, we are joined by one such freelance data scientist: Raju Kumar Mishra.


Raju is a Kaggle Discussion Grandmaster and ranks 48th with 51 gold medals to his name. He is also a Kaggle Notebooks Master. He has been very active on Kaggle for the past 8 years.

Raju has a Master’s in Computational Science from IISc (Indian Institute of Science) Bengaluru. He has been a Freelance Data Scientist since 2013.

You can go through the previous Kaggle Grandmaster Series Interviews here.

 

In this interview, we cover a range of topics, including:

  • Raju’s Education and Work
  • Raju’s Book Recommendation
  • Raju’s Advice for Beginners in Data Science
  • Raju’s Advice to Aspiring Data Science Freelancers

So without any further ado, let’s begin.

Raju’s Education and Work


Analytics Vidhya (AV): One interesting thing I noticed is that you did your bachelor's in Mining and Mineral Energy. That is not a common field, but what is even more uncommon is that you then did a Master's in Computational Science. How did you shift from rocks (minerals and mining) to a box (the computer)?

Raju Kumar Mishra (RKM): While preparing for the IIT JEE exam and then studying for my Bachelor of Technology, I developed an intense interest in applied mathematics and in using computers to solve mathematical problems. In mining, one has to study many subjects that require a good knowledge of applied mathematics.

Rock mechanics, for example, is full of mathematical formulas that were developed using numerical and statistical methods. I also applied neural networks to predict whether a given mine roof is safe to work under. Mining engineering also requires knowledge of electrical science, because many mining machines run on electricity. In short, many of the subjects taught in Mining Engineering require mathematical and statistical knowledge.

Anyone who starts working in the mining industry is surrounded by huge amounts of data: data on mineral and coal production, data on manpower, and large volumes of data generated by heavy earthmoving machinery. This data can be used to gain insights for preventive maintenance, to improve production quality, and to make many other improvements toward a more profitable business.

The Master's in Computational Science course offered by IISc (Indian Institute of Science) Bengaluru was well suited to my interests. I applied and was selected. The course gave me excellent knowledge of solving mathematical problems using computers and supercomputers. A problem solver has to write code to solve mathematical problems, which makes that person a better programmer.

I consider data science to be the science of estimation. Using different algorithms, we estimate the value of some quantity and try to minimize the error in that estimate. We estimate a value when the full information needed to know it exactly is not available, or when calculating the exact value would be too costly. The error is minimized using some form of optimization. Every engineering discipline needs to estimate values, so data science finds some use in every field of work.
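To make the estimation-as-optimization idea concrete, here is a minimal Python sketch (an editorial illustration, not Raju's own code). It assumes the simplest possible case: estimating a single unknown quantity from a few noisy measurements by minimizing the mean squared error with plain gradient descent. The function names, measurement values, learning rate, and step count are all hypothetical choices made for illustration.

```python
def mse(estimate, observations):
    """Mean squared error of a single estimate against all observations."""
    return sum((x - estimate) ** 2 for x in observations) / len(observations)


def estimate_by_gradient_descent(observations, learning_rate=0.1, steps=200):
    """Estimate one unknown quantity by minimizing MSE with gradient descent.

    learning_rate and steps are illustrative values, not tuned settings.
    """
    estimate = 0.0  # arbitrary starting guess
    n = len(observations)
    for _ in range(steps):
        # Derivative of MSE with respect to the estimate: -2/n * sum(x - estimate)
        gradient = -2.0 / n * sum(x - estimate for x in observations)
        # Step against the gradient to reduce the error
        estimate -= learning_rate * gradient
    return estimate


# Hypothetical noisy measurements of the same quantity
measurements = [9.8, 10.1, 10.3, 9.9, 10.4]
est = estimate_by_gradient_descent(measurements)
print(round(est, 3))  # → 10.1
```

For squared error, the minimizing estimate is exactly the sample mean, so gradient descent here simply recovers 10.1. The point is the general pattern (define an error, then optimize it), which carries over to models where no closed-form answer exists.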

 

AV: Before freelancing in data analysis and big data, you had considerable experience as a software developer. What challenges did you face while transitioning into data science, and which tools and resources helped you overcome them?

RKM: As I mentioned, the Computational Science course gave me excellent knowledge of applied mathematics and of solving mathematical problems using computers. One has to write one's own code to solve mathematical problems. In the process, I gained a good knowledge of programming, because code written for mathematical problems has to be optimized for data-intensive and computation-intensive conditions.

In my master's course, I studied subjects like numerical methods, numerical linear algebra, parallel computation, simulation, scientific data visualization, statistics, probability, neural networks, stochastic finance, and many more associated with numerical mathematics. I also took pattern recognition and reliability courses. In fact, I come from a data science background. I did not really switch to data science; I started working in my core field.

We performed parallel computation using MPI (Message Passing Interface) and some OpenMP. Using a distributed system for computation was not new to me, so the big data area (in the context of problem-solving) was not unfamiliar either.

 

AV: India has the second-largest freelance workforce after the US, with over 15 million people working independently in sectors such as IT and programming, finance, sales, and marketing. What has your experience been as a freelancer in the data analytics field?

RKM: Teaching is my passion. Mainly, I provide training in programming and data science. I get corporate training assignments through LinkedIn, and I also train groups of individuals. During corporate training, I sometimes get problem-solving assignments as well.

Working as a freelancer gives me a good amount of flexibility and enough time to learn new concepts.

I started my career as a freelancer at the end of 2013. In the beginning, it was very difficult to get corporate training or any freelance assignment. I would tell people who are eager to offer their services as freelancers: just be patient, and you will get assignments.

 

AV: You’re a Kaggle Discussion Grandmaster. Why did you specifically focus on Discussions and how has that helped you in your career?

RKM: I started my Kaggle journey in all three areas: competitions, notebooks, and discussions. (Datasets have since been added as a fourth area.) Since I am mostly involved in teaching, taking part in discussions was inevitable. Most of my discussion posts were about new concepts, new and efficient tools, and study notes that working data scientists and data science aspirants can use to grow their expertise.

I am also a Notebooks Master. Generally, I write notebooks on new packages and tools. Writing notebooks makes one clearer about how to explain concepts and improves one's storytelling skills. A good notebook requires organizing its content in a logical order, so the skill of organizing reports improves too. Informative notebooks help others learn new concepts. I feel that focusing on one area at a time helps you concentrate more and produce valuable work.

 

Raju’s Book Recommendations


AV: You've written books as well. Working as a freelancer, how did you find the time to write them? And how has writing books helped your career?

RKM: Generally, I used to write in the evening on a daily basis. Whenever I had no assignments, I wrote for the entire day. Basically, I write whenever I get time.

Writing a book requires knowledge of the subject, the expertise to explain that knowledge, and the skill to identify problems and solve them for readers so that they benefit. Whenever I explain a concept, I try to make it understandable to anyone; this is a requirement of my profession as a corporate trainer. Explaining concepts in books certainly helped me improve my explanation skills. Writing books also helps me organize training material and prepare proper charts and diagrams so that participants understand better.

 

Raju’s Advice for Beginners in Data Science


AV: What would be your advice to a working professional who is planning to write a book? Writing a book is a time-consuming process, so how should one manage it?

RKM: Writing a book requires knowledge of the topic and the skill to explain it. It also requires identifying and creating problems related to different concepts and solving them, so that readers can understand the concepts easily. Explaining concepts requires creativity, which helps in generating new ways to present topics. You also need the skills and tools to create relevant charts and diagrams that help readers understand each concept.

I cannot come up with many ideas while sitting in front of the computer writing a book. Ideas come to me at different points in time. To deal with this, I always kept a diary and a pen with me. I got many writing ideas while traveling in a vehicle, walking in the market, explaining a concept in class, or during problem discussion sessions. Choosing how best to explain a topic and organizing the chapters is a time-consuming process.

Whenever I got an idea about how to explain a particular topic, a new problem, or a way to solve a problem, I wrote it in the diary. So by evening, I had some material to add to my writing every day. If I did not note an idea down when it popped up, I might forget it. Writing ideas in the diary as soon as they came helped me use my time more efficiently, because every day I had some material for my book.

Organizing all the topics you are going to include in a chapter helps create a thought path, and having a clear thought path helps you complete the book sooner.

Taking a bird's-eye view of your whole day's work routine every day can help you find slots to use for generating new ideas, collecting data, creating problems, and writing.

 

AV: R vs Python is a never-ending debate. What is your take on it? Can you state three points in favour of each, explaining why one should choose R or Python? Do you see any newer languages, such as Julia, occupying this space?

RKM: I started learning R in my statistics class while pursuing my Master of Technology course at IISc. I understood that R can be used to analyze any sort of data using many statistical and mathematical concepts. There is no need to reinvent the wheel, because R provides implementations of many statistical and mathematical concepts that have been tested by many users.

I learned Python when I was working as a software engineer. I use both Python and R in my analysis and for day-to-day programming tasks. Both Python and R have huge, helpful communities. There are many discussions of R vs Python available as blogs and videos, and both languages have pros and cons. Below are some benefits of using R and Python, based on my work experience and my own impressions.

Key Benefits of Using R:

  1. In my opinion, the best part of R is that it has code implementations of many mathematical and statistical concepts.
  2. For data visualization, R packages offer many functionalities that provide a strong base for applying your creativity and creating clear, concise charts.
  3. R might be easier for new learners of data science to start with, because they can apply mathematical and statistical concepts to data from the first day. Applying data science concepts from day one encourages learners.

Benefits of Using Python for Data Science:

  1. Python packages for data science can handle huge amounts of data, and memory optimization can be implemented easily.
  2. Python is a general-purpose programming language. Learning Python alone lets you create data science models, integrate them into GUIs written in Python itself, build web applications, and create applications for any domain.
  3. Python is an easy language for writing readable code. Readable code is easy to understand, which is why bugs in Python code are easy to find and fix.

One can use R when the dataset is small (small relative to the available RAM). With R, a data scientist can apply many ready-to-use algorithms. R is good for exploratory analysis, thanks to its many data analysis and data visualization packages with advanced functionality.

Python becomes useful when dealing with huge amounts of data. It is also easier to use Python when machine learning and deep learning models have to be integrated into GUIs and web pages.

Julia is an impressive programming language for data science. I love its claim that it is “easy to write code that’s nearly as fast as C”. Its syntax is easy to learn and helps in writing clean, concise code. However, whenever people switch from one programming language to another, there is a cost: learning the new language, adapting to a new programming environment, and so on.

Code written in Julia executes fast, yet it is as clean and concise as a scripting language. Whenever speed becomes more important than the cost of switching to Julia from another language, people will start switching. I feel that data science applications in the coming years are going to be more computationally intensive, so there is a real chance that Julia will be used more in data science than other programming languages.

 

Raju’s Advice to Aspiring Freelancers in Data Science


AV: Unfortunately, many working professionals have lost their jobs or have shifted to remote work due to the pandemic, so people are looking for freelance work a lot more than before COVID. What tips would you give to people who want to start freelancing in data science?

RKM: Let me tell you a secret: getting the first freelance project may require huge patience. A freelancer has to stay positive that the first assignment will come. After getting the first assignment, life becomes easier. As a freelancer, always keep improving your skills, earn more certifications, and build a rich work portfolio.

Writing blogs and working on Kaggle can help make your data science portfolio impressive and connect you with data scientists from all over the world. To get freelance work, let people know that you are willing to work as a freelancer. You can start with LinkedIn and Twitter to spread the news that you are ready to freelance and are looking for projects. The following companies/websites can be helpful in getting freelance data science assignments:

  1. Upwork
  2. Toptal
  3. Peopleperhour
  4. Glassdoor

 

End Notes

The flexibility that freelancing provides is very attractive for data science beginners. But with that luxury comes the extra effort you have to put in to gain popularity in the community and make freelancing favorable for you. We hope Raju’s words and wisdom help you understand these things better.

This is the 17th interview in the Kaggle Grandmasters Series. You can read the previous few in the following links-

What did you learn from this interview? Are there other data science leaders you would want us to interview for the Kaggle Grandmaster Series? Let me know in the comments section below!
