Lifetime Lessons: 20 Things Every Data Scientist Must Know Today


I’ve spent close to a decade in data science & analytics now. Over this period, I have learnt new ways of working on data sets and creating interesting stories. However, before I could succeed, I failed numerous times. Success doesn’t come easy!

How did I succeed? The answer is simple. Every time I failed, I said to myself, ‘Let’s take one more step’. And I managed to travel a long distance. I learnt statistics, data mining, SAS, R, Python and Machine Learning along the way.

I confess that, in the last 10 years, the methods of predictive modeling have become faster and data has become larger than ever. We ran into constraints when faced with Big Data. But people came out with several big data technologies.

It’s overwhelming to see how things have changed. But there are still many who are struggling to catch up and succeed in the data science industry.

Hence, I decided to share 20 things which experience has taught me in the last 10 years. I hope you find them useful. The idea is to help people who don’t have a mentor to give them this advice all the time. So go ahead and read!

Some Useful Resources

Learn Python

Learn Ensemble Modeling

Learn Boosting Algorithms

Learn Machine Learning Algorithms

Learn k-fold Cross Validation

Learn Feature Engineering

Resources on Neural Networks and Deep Learning

Master Structured Thinking Skill



  • Arai says:

    Kunal, good day. First of all, thank you so much for your blogs on Analytics Vidhya. Your blogs help me understand ML and business analytics better. My name is Arailym. I am from Kazakhstan. I want to build my career in business analytics. I have a B.S. degree in Economics. I know the basics of statistics and am trying to learn as many ML algorithms as I can. The company where I want to apply requires deep learning math. I read blogs on linear algebra and probability, but I am afraid that I can’t recall all the math (especially discrete mathematics, integrals and so on).
    Also, I want to participate in a university competition. They require us to present a project. I don’t know where to get available data or which tool I should use. I am afraid that if I take data from Kaggle, they would think that I cheated. I want to solve problems like segmentation or fraud detection. Please help me. Thank you in advance.

  • Akash says:

    Hi Kunal,

    Thanks for sharing. I have a query on the 5th one, ensemble modeling.
    I was under the impression that different, appropriate algorithms are used to build models and improve accuracy. But here you say to combine models and algorithms to boost accuracy. Can you elaborate on that with examples?


  • Moumita Mitra says:

    Sir …

    I regularly follow your posts on LinkedIn. They are very helpful.
    After completing my MCA in 2007, I had only 1 year of Java experience. After that I got married. Due to some circumstances I couldn’t continue my job. But now I want to restart my career in the big data industry. Is it possible to re-enter this industry after such a long gap? I am already doing small courses from Coursera as you suggested. I am eagerly waiting for your valuable suggestion.

  • Ramdas Narayanan says:

    Hi Kunal,
    Excellent article, thanks for sharing the key steps/lessons for folks like me who are interested in Data Science/Analytics.

  • Ankur says:

    Excellent !

  • Ram Marthi says:

    Crisp, insightful and to the point. Thank you very much for sharing your experiences with us.

  • Venkat says:

    Awesome! The areas of focus are summarized really well.

  • Satwik Mittal says:

    I beg to differ on one point though! Python is nowhere near R in statistical modeling and ease of use.
    People say R has a steep learning curve, and it’s true! Python is easy, again true, but given the cleanliness of the R environment for statistics, its gigantic library of packages and the incredible syntax highlighter of RStudio, I don’t really see myself using Python. As far as code execution speed is concerned, you can always use the parallel library in R, write vectorised code, use the H2O library or use the enhanced R distribution, Revolution R Open. Python is the king of flexibility and R is the undisputed king of statistics.

  • Venkata Sreedhar Nalam says:

    Excellent!!!! Great information….thank you for sharing your valuable insights!!! it helps..

  • Mustafa says:

    Thanks for sharing your experience. Really it’s an excellent article. ☺

  • Pushparaj says:

    Kunal ji,
    Excellent article

  • SHASHANK says:

    I am using the SVM function directly on 6 lakh rows and it hangs. Should I code SVM line by line and then try, or is there some other method?
    Can RStudio handle huge data without Hadoop?

  • Vishanta says:

    Good info Kunal, thnx.

  • Himanshu says:

    One of the best articles I have read. I am glad I read it. A big to-do list is in the pipeline now. Thank you so much!
