Lifetime Lessons: 20 Things Every Data Scientist Must Know Today

Kunal Jain 19 Dec, 2015 • 2 min read

Introduction

I’ve spent close to a decade in data science & analytics now. Over this period, I have learnt new ways of working on data sets and creating interesting stories. However, before I could succeed, I failed numerous times. Success doesn’t come easy!

How did I succeed? The answer is simple. Every time I failed, I said to myself, ‘Let’s take one more step’. And I managed to travel a long distance. I learnt statistics, data mining, SAS, R, Python and machine learning along the way.

I must admit that, in the last 10 years, the methods of predictive modeling have become faster and data has become larger than ever. We ran into constraints when dealing with Big Data, but people came up with several big data technologies to handle it.

It’s overwhelming to see how much things have changed. But there are still many people who are struggling to catch up and succeed in the data science industry.

Hence, I decided to share 20 things which experience has taught me in the last 10 years. I hope you find them useful. The idea is to help people who don’t have a mentor to give them this advice all the time. So go ahead and read!

[Image: successful data scientist]

Some Useful Resources

Learn Python

Learn Ensemble Modeling

Learn Boosting Algorithms

Learn Machine Learning Algorithms

Learn k-fold Cross Validation

Learn Feature Engineering

Resources on Neural Networks and Deep Learning

Master Structured Thinking Skill

 



Kunal is a postgraduate from IIT Bombay in Aerospace Engineering. He has spent more than 10 years in the field of Data Science. His work experience ranges from mature markets like the UK to a developing market like India. During this period he has led teams of various sizes and has worked on various tools like SAS, SPSS, Qlikview, R, Python and Matlab.


Responses From Readers


Arai 18 Nov, 2015

Kunal, good day. First of all, thank you so much for your blogs on analytics VId. I think that your blogs help me understand ML and business analytics better. My name is Arailym. I am from Kazakhstan. I want to build my career in business analytics. I have a B.S. degree in Economics. I know the basics of statistics and try to learn as many ML algorithms as I can. I see that the company where I want to apply requires deep learning math. I read blogs on linear algebra and probability, but I am afraid that I can't recall all the math (especially discrete mathematics, integrals and so on). Also, I want to participate in a university competition. They require us to present a project. I don't know where to get available data and which tool I should use. I am afraid that if I take data from Kaggle they would think that I cheated. I want to solve problems like segmentation or fraud detection. Please help me. Thank you in advance.

Akash 18 Nov, 2015

Hi Kunal, thanks for sharing. I have a query on the 5th one, ensemble modeling. I was under the impression that different and appropriate algorithms are used to build models and improve accuracy. But here you say combine models and algorithms to boost accuracy. Can you elaborate on that by citing examples? Thanks,

Moumita Mitra 18 Nov, 2015

Sir, I regularly follow your posts on LinkedIn. They are very helpful. After completing my MCA in 2007 I had only 1 year of Java experience. After that I got married. Due to some circumstances I couldn't continue my job. But now I want to restart my career in the big data industry. Is it possible to re-enter this industry after such a long gap? I am already doing small courses from Coursera as you suggested. I am eagerly waiting for your valuable suggestion.

Ramdas Narayanan 18 Nov, 2015

Hi Kunal, Excellent article, thanks for sharing the key steps/lessons for folks like me who are interested in Data Science/Analytics.

Ankur 19 Nov, 2015

Excellent !

Ram Marthi 19 Nov, 2015

Crisp, insightful and to the point. Thank you very much for sharing your experiences with us.

Venkat 19 Nov, 2015

Awesome! The areas of focus are summarized really well.

Satwik Mittal 19 Nov, 2015

I beg to differ on one point though! Python is nowhere near R in statistical modeling and ease of use. People say R has a steep learning curve, and it's true! Python is easy, again true, but given the cleanliness of the R environment for statistics, a gigantic library of R packages, plus the incredible syntax highlighter of RStudio, I don't really see myself using Python. As far as code execution speed is concerned, you can always use the parallel library in R, write vectorised code, use the H2O library, or use the enhanced R distribution from Revolution R Open. Python is the king of flexibility and R is the undisputed king of statistics.

Venkata Sreedhar Nalam 20 Nov, 2015

Excellent!!!! Great information....thank you for sharing your valuable insights!!! it helps..

Mustafa 22 Nov, 2015

Thanks for sharing your experience. Really it's an excellent article. ☺

Pushparaj 27 Nov, 2015

Kunal ji, Excellent article

SHASHANK 02 Dec, 2015

I am using the SVM function directly on 6 lakh rows and it's getting hung. Should I code SVM line by line and then try, or is there some other method? Can RStudio handle huge data without Hadoop?

Vishanta 07 Dec, 2015

Good info Kunal, thnx.

Himanshu 01 Oct, 2016

One of the best articles I have read. I am glad I read it. A big to-do list is in the pipeline now. Thank you so much!

Kumar Chinnakali 23 Jul, 2017

Awesome AV Team. Free, but it's worth a lot of dollars!
