Kunal Jain — Published On June 4, 2014 and Last Modified On March 25th, 2019

I am back to one of my favourite topics: books! To double the excitement, this time the list is for data scientists (or aspiring ones). Unlike the previous lists, these books are not for light readers. They are meant for people who enjoy programming and statistics, just the kind of person a data scientist should be.


As can be expected, two languages deliver the subject matter in these books: R and Python (no points for guessing which ones; if you can't, this article is not for you). If you are a data scientist (or aspiring to be one), you should consider these books must-haves for your library. For some strange reason, I prefer these books in hard copy rather than in Kindle format, but that is a personal choice. I probably like walking up to my book rack, thinking about which book would be the best to refer to for the problem I am working on.

Here is the list of books (first the ones on R and then on Python):


1. R Cookbook by Paul Teetor

This is simply the best book to start your journey with R. It contains tons of examples and practical advice on a wide range of topics, from file input / output, data manipulation, merging and sorting to building a regression model. For a starter in R, this book becomes your best pal during the initial testing time.

While the book is aimed towards starters, it still remains a prominent feature of the library of any data scientist.


2. Machine Learning for Hackers by Drew Conway & John Myles White

I think this book actually has the wrong title. I decided against purchasing it twice before giving it a shot (which happened only because of a recommendation from a close friend). This book is meant for data scientists and not hackers. I don't know why the title says so. It is a very practical manual for learning machine learning and comes with good visuals. The original book is based on R, but you can also get a copy of the code in Python.


3. R Graphics Cookbook by Winston Chang

You can’t be a good data scientist unless you master the graphics in R! There is no better way to do visualization than to learn ggplot2. Sadly, learning ggplot2 might seem like learning a completely new language in itself. This is where this “cookbook” comes to the rescue. The recipes from Winston are short, sweet and to the point. Buy this and it is bound to end up as one of the most referred-to books in your library.


4. Programming Collective Intelligence by Toby Segaran (popularly referred to as PCI)

If there is one book you want to choose out of this selection (for learning machine learning), it is this one. I haven’t yet met a data scientist who has read this book and does not recommend keeping it on your bookshelf. A lot of them have re-read this book multiple times. The book was written long before data science and machine learning acquired the cult status they have today, but the topics and chapters are entirely relevant even today! Some of the topics covered in the book are collaborative filtering techniques, search engine features, Bayesian filtering and support vector machines. If you don’t have a copy of this book, order it as soon as you finish reading this article! The book uses Python to deliver machine learning in a fascinating manner.


5. Python for Data Analysis by Wes McKinney

Written by Wes McKinney, this book teaches you everything you need to know about pandas. For starters (not sure why you are still reading this article), pandas is Python’s library for handling tabular data structures such as dataframes. Except for the title of the book (which I find misleading), I like everything else about it. It contains ample code and examples to leave you capable of performing any operation / transformation on a dataframe in Python (using pandas).

For advanced users: if you already know pandas, you should look at this presentation from Wes on the shortcomings of pandas.


6. The Hundred-Page Machine Learning Book by Andriy Burkov

Having read a ton of books that try to teach machine learning from various angles and perspectives, I struggled to find one that could succinctly summarize difficult topics and equations, until Andriy Burkov managed to do it in some 100-odd pages.

It is beautifully written, is easy to understand and has been endorsed by thought leaders like Peter Norvig. Need I say more? Beginner or established, every data scientist should get their hands on this book.


7. Agile Data Science by Russell Jurney

A recent addition by O’Reilly, this book looks like a must read for data scientists. The focus is on using “light” tools, which are easy to use and still get the work done. This is currently on my reading list and I’ll update more details once I have read it.


These are the 7 must-have books if you are serious about being a data scientist. There are a couple of additional Python books which you can consider: Natural Language Processing with Python by Steven Bird et al. and Mining the Social Web by Matthew A. Russell. I have not included them in the list because you can easily find a lot of the information in these books on the web.


As you may have noticed, most of the books I have mentioned are from O’Reilly (The Hundred-Page Machine Learning Book is an exception). I think it is a tribute to the fascinating collection of books they have provided! What do you think about the list? Are there any other recommendations you would want to add to it? Have you read any of the books mentioned above? Do let me know through the comments below.

If you like what you just read & want to continue your analytics learning, subscribe to our emails, follow us on Twitter or like our Facebook page.

About the Author

Kunal Jain

Kunal is a postgraduate from IIT Bombay in Aerospace Engineering. He has spent more than 10 years in the field of Data Science. His work experience ranges from mature markets like the UK to a developing market like India. During this period he has led teams of various sizes and has worked on various tools like SAS, SPSS, Qlikview, R, Python and Matlab.


32 thoughts on "Must have books for data scientists (or aspiring ones)"

Kumar Abhijeet says: June 05, 2014 at 6:07 am
Edit: Machine Learning for Hackers has its code in R, not Python.
Kunal Jain says: June 05, 2014 at 6:25 am
Hi Kumar Abhijeet, Thanks for pointing it out. I was planning to link to Will it Python, a site which has converted the code in the book to Python. The code is also available as IPython notebooks at the link (now updated in the article). Regards, Kunal
Kumar Abhijeet says: June 05, 2014 at 6:31 am
cool...thanks for the link...added to my reading list :)
Roberto Bello says: June 05, 2014 at 11:38 am
Free ebook KB – Neural Data Mining with Python sources http://www.freeopen.org/?p=85
Kumar Doraiswami says: June 06, 2014 at 1:40 am
How about books on data quality? How about ensuring data quality in data streams? An authoritative book is "Exploratory Data Mining and Data Cleaning" by Tamraparni Dasu and Ted Johnson (John Wiley).
Jewon Seo says: June 06, 2014 at 9:25 am
Thanks for the link
Navdeep Kumar says: June 06, 2014 at 10:39 am
Hey, thanks for sharing these books :-)
Kunal Jain says: June 09, 2014 at 11:07 pm
Thanks Kumar for the suggestion. Will check out the book. Regards, Kunal
Kunal Jain says: June 09, 2014 at 11:10 pm
Thanks Roberto. The topic sounds very interesting and falls within my area of interest. Will check it out. Regards, Kunal
Shanil says: June 11, 2014 at 12:57 pm
I believe the book "Data Smart" by John W Foreman is a good starting point to understand the statistical concepts and methods in an easy-to-understand and non-technical way. It is a fun read and all of the case studies are explained using Excel. Once the concepts are clear, it is a matter of learning the nuances of a programming language to apply the concepts in R or Python.
Igor Bobriakov says: June 12, 2014 at 3:14 am
Great selection of books indeed! I would add another useful book on this topic - "Data Science for Business" - http://shop.oreilly.com/product/0636920028918.do
jerry muse says: June 12, 2014 at 6:02 am
Erm how can you seriously recommend a book you haven't read? Personally I found "Agile Data Science" to be one of the most shocking bandwagon style books I've ever seen. It's more of an article than a book. Really works out to be, excluding code and screenshots, about 20p per word.
Kunal Jain says: June 15, 2014 at 7:38 pm
Jerry, Thanks for the feedback. I'll update the list once I have gone through the book myself. Regards, Kunal
Kunal Jain says: June 15, 2014 at 8:13 pm
Thanks for the suggestion Igor! Regards, Kunal
Kunal Jain says: June 15, 2014 at 8:15 pm
Thanks Shanil for the suggestion.
Anshul says: June 17, 2014 at 6:02 am
Hi Kunal, Can you please help with any book/ebook with case studies in which Text Analysis and Neural Networks are applied? Thanks, Anshul.
tvmanikandan says: June 17, 2014 at 7:18 am
Could you please share some materials /links to understand neural networks? -Mani
Michael Manoochehri says: June 22, 2014 at 10:39 pm
The entire Addison-Wesley Data & Analytics Series has excellent books for working with data (I'm the author of the introduction to the series) - check out "R for Everyone" and "Apache Hadoop YARN"
Kunal Jain says: June 23, 2014 at 9:42 pm
Thanks Michael for the suggestion. Will definitely check them out. Regards, Kunal
Kunal Jain says: June 23, 2014 at 11:02 pm
Mani, There is a course on Coursera run by Mr. Hinton; you can check whether you can access the archived recordings. Regards, Kunal
Kunal Jain says: June 23, 2014 at 11:09 pm
Anshul, There is loads of material available online. I would start there and then look for specific queries. For example, search for text mining for sentiment analysis and you will get relevant resources. Regards, Kunal
Dinesh says: July 04, 2014 at 5:51 am
Hi Kunal, I am Dinesh. I have around 7 years of experience in web development. Now I want to change, because the challenge and excitement are not the same as before. I am looking to change my path and I found that big data is a very good field. I have searched in many places but still have not found how and where to start. Please give me your suggestions. Please suggest which area of big data I should choose. Waiting for an update. Thanks & Regards, Dinesh K...
Kunal Jain says: July 09, 2014 at 6:05 pm
Dinesh, Big data is a very vast field and the area to pick should ideally depend on your interest. If you are still not sure, learning Hadoop can be the best place to start. If your interest is in Machine Learning, learning Mahout could be a good place to start. Regards, Kunal
kanish roshan says: July 13, 2014 at 6:47 pm
Hey Kunal, it's really good that you are helping out analytics guys! I am a university student getting training in data analytics. Can you suggest some rules for learning the basics of this industry and some e-books on SAS, Excel and SQL? Also, kindly suggest the must-have skillset and a roadmap to the analytics industry. Greetings
Saurabh Jha says: July 29, 2014 at 5:59 am
Hi Kunal, It was nice reading your articles. I have over 7 years of experience in Data Warehousing. I have worked on end-to-end BI (ETL, Reporting, OLAP). For the last 4 years I have been focused on logical data modeling for Enterprise Data Warehouse Design & Architecture across verticals. I am seriously thinking of getting into Data Mining, and lately I have gone through lots of articles on analytics. I am not a programmer and I do not know much programming, which is my worry when I think of switching over to Analytics. I have a good understanding of business and data and I believe I will need to refresh my skills in Statistics. I need your guidance on how to start my learning process: shall I start with Python, R or Statistics first? I get slightly confused about my learning path. If possible, please guide me. Regards Saurabh Jha
Basudev says: October 01, 2014 at 7:25 pm
Hi, I have more than five years of industry experience in analytics. Currently I am working as a lead data scientist in an MNC. But I don't have any academic background in Analytics. Can anyone suggest a certified course? My budget is somewhere around one lakh and I am looking for online options.
Anon says: December 28, 2014 at 7:41 am
Rather late to this post, but I thought I'd add links to a couple of O'Reilly books, which I came across just a couple of days ago, and which are also available as free PDFs under Creative Commons: "Think Stats: Probability and Statistics for Programmers" and "Think Bayes: Bayesian Statistics Made Simple". Although I haven't read the books yet, I presume that they can't be all that bad. :) I would nevertheless like to know how good (or bad) others who may have read them think they are.
Rooz says: January 03, 2015 at 11:22 am
Thanks for providing the list and sharing your reviews also. I enjoyed Rachel and Cathy’s book; it’s readable, informative, and like no other book I’ve read on the topic of statistics or data science. I got a lot out of Doing Data Science, finding the chapter organization on business problem specification, analytics formulation, data access/wrangling, and computer code to be very helpful in understanding DS solutions. https://intellipaat.in/
Isura Nirmal says: July 13, 2015 at 1:24 am
I found your site in the Facebook group for the "Introduction to Data Science by Bill Howe" course. I was first amazed to see that everything I was looking for is in the same place. Very informative and simple. I am a regular visitor of your site, and good luck with this great work of yours.
Sarbarup Banerjee says: July 13, 2015 at 9:03 am
i found this book very helpful:Wiley - Data.Mining.Techniques.for.Marketing.Sales.and.Customer.Support.(2004),.2Ed
Deepak says: October 11, 2015 at 6:50 am
Hi Kunal, I have been working on MS Excel for 6 years. I have done my B.Sc. (Math Hon.). Now I have started learning SAS. I want to become a Data Scientist. Request you to suggest what else I need to do to become a data scientist. Thanks.
Ashish Arora says: September 30, 2016 at 3:06 am
Can anyone suggest a book on SAS for data analysis?
