Lifetime Lessons: 20 Things Every Data Scientist Must Know Today
Introduction
I’ve spent close to a decade in data science & analytics now. Over this period, I have learnt new ways of working on data sets and creating interesting stories. However, before I could succeed, I failed numerous times. Success doesn’t come easy!
How did I succeed? The answer is simple. Every time I failed, I said to myself, ‘Let’s take one more step’. Step by step, I covered a long distance. Along the way, I learnt statistics, data mining, SAS, R, Python and machine learning.
I admit that over the last 10 years, predictive modeling methods have become faster, and data has grown larger than ever. Big Data initially imposed serious constraints on us, but people responded by building several big data technologies.
It’s overwhelming to see how much has changed. Still, many people are struggling to catch up and succeed in the data science industry.
So I decided to share 20 things that experience has taught me over the last 10 years. I hope you find them useful. The idea is to help people who don’t have a mentor to give them this advice. So go ahead and read!
Some Useful Resources
Learn Machine Learning Algorithms
Learn k-fold Cross Validation
Resources on Neural Networks and Deep Learning
Master Structured Thinking Skill
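The k-fold cross validation item above refers to a common way of estimating how a model will perform on unseen data: shuffle the rows, split them into k folds, train on k-1 folds, score on the held-out fold, repeat k times and average the scores. Below is a minimal sketch in plain Python, assuming a toy one-variable linear model and mean squared error as the score. The function names (`k_fold_indices`, `fit_line`, `cross_validate`) are illustrative only and do not come from any library.

```python
import random
from statistics import mean


def k_fold_indices(n, k, seed=0):
    """Shuffle row indices 0..n-1 and deal them into k nearly equal folds."""
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    # Fold i takes every k-th shuffled index, starting at position i.
    return [idx[i::k] for i in range(k)]


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x (a deliberately simple model)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def cross_validate(xs, ys, k=5, seed=0):
    """Return the mean squared error measured on each held-out fold."""
    scores = []
    for test_idx in k_fold_indices(len(xs), k, seed):
        test_set = set(test_idx)
        # Train only on rows outside the current held-out fold.
        train = [i for i in range(len(xs)) if i not in test_set]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        errors = [(ys[i] - (a + b * xs[i])) ** 2 for i in test_idx]
        scores.append(mean(errors))
    return scores


if __name__ == "__main__":
    xs = [float(x) for x in range(20)]
    ys = [2 * x + 1 for x in xs]
    fold_scores = cross_validate(xs, ys, k=5)
    print("per-fold MSE:", fold_scores)
    print("mean MSE:", mean(fold_scores))
```

Because the toy data lie exactly on a line, every fold's error comes out close to zero; on real data, the spread of the per-fold scores is as informative as their mean. In practice, scikit-learn offers the same idea through `KFold` and `cross_val_score` in `sklearn.model_selection`.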
15 thoughts on "Lifetime Lessons: 20 Things Every Data Scientist Must Know Today"
Arai says: November 18, 2015 at 8:52 am
Kunal, good day. First of all, thank you so much for your blogs on Analytics Vidhya. I think your blogs help me understand ML and business analytics better. My name is Arailym, and I am from Kazakhstan. I want to build my career in business analytics. I have a B.S. degree in Economics. I know the basics of statistics and am trying to learn as many ML algorithms as I can. I see that the company where I want to apply requires deep learning math. I read blogs on linear algebra and probability, but I am afraid I can't remember all the math (especially discrete mathematics, integrals and so on). I also want to participate in a university competition, which requires presenting a project. I don't know where to get data or which tool I should use. I am afraid that if I take data from Kaggle, they will think I cheated. I want to solve problems like segmentation or fraud detection. Please help me. Thank you in advance.
IAkash says: November 18, 2015 at 11:11 am
Hi Kunal, thanks for sharing. I have a query about the 5th one, ensemble modeling. I was under the impression that different, appropriate algorithms are used to build models and improve accuracy. But here you say to combine models and algorithms to boost accuracy. Can you elaborate on that with examples? Thanks,
Moumita Mitra says: November 18, 2015 at 2:43 pm
Sir, I regularly follow your posts on LinkedIn. They are very helpful. After completing my MCA in 2007, I had only one year of Java experience. After that I got married, and due to some circumstances I couldn't continue my job. But now I want to restart my career in the big data industry. Is it possible to re-enter this industry after such a long gap? I am already taking short courses on Coursera, as you suggested. I am eagerly waiting for your valuable suggestion.
Ramdas Narayanan says: November 18, 2015 at 4:25 pm
Hi Kunal, excellent article. Thanks for sharing the key steps/lessons for folks like me who are interested in Data Science/Analytics.
Ankur says: November 19, 2015 at 3:35 am
Excellent!
Ram Marthi says: November 19, 2015 at 4:31 am
Crisp, insightful and to the point. Thank you very much for sharing your experiences with us.
Venkat says: November 19, 2015 at 7:53 am
Awesome! The areas of focus are summarized really well.
Satwik Mittal says: November 19, 2015 at 2:05 pm
I beg to differ on one point, though! Python is nowhere near R in statistical modeling and ease of use. People say R has a steep learning curve, and it's true. Python is easy, again true, but given the cleanliness of the R environment for statistics, the gigantic library of R packages and the excellent syntax highlighting in RStudio, I don't really see myself using Python. As far as code execution speed is concerned, you can always use the parallel library in R, write vectorised code, use the H2O library or use the enhanced R distribution, Revolution R Open. Python is the king of flexibility, and R is the undisputed king of statistics.
Venkata Sreedhar Nalam says: November 20, 2015 at 11:30 am
Excellent!!!! Great information... Thank you for sharing your valuable insights!!! It helps.
Mustafa says: November 22, 2015 at 7:38 pm
Thanks for sharing your experience. It's really an excellent article. ☺
Pushparaj says: November 27, 2015 at 10:39 am
Kunal ji, excellent article.
Pushparaj says: November 27, 2015 at 10:45 am
I also completed my MCA in 2007, but my undergraduate degree is a B.Com. I am struggling to get started in business analytics.
SHASHANK says: December 02, 2015 at 6:04 pm
I am using the SVM function directly on 6 lakh (600,000) rows and it hangs. Should I code the SVM line by line and then try, or is there some other method? Can RStudio handle huge data without Hadoop?
Vishanta says: December 07, 2015 at 11:55 am
Good info, Kunal, thanks.
Himanshu says: October 01, 2016 at 1:31 pm
One of the best articles I have read. I am glad I read it. A big to-do list is in the pipeline now. Thank you so much!