Must-have books for data scientists (or aspiring ones)

Kunal Jain 25 Mar, 2019 • 4 min read

I am back to one of my favourite topics: books! To double the excitement, this time the list is for data scientists (or aspiring ones). Unlike the previous lists, these books are not for light readers. They are meant for people who enjoy programming and statistics, just the kind of person a data scientist should be.

As can be expected, two languages deliver the subject matter in these books: R and Python (no points for guessing which ones; if you couldn’t, this article is not for you). If you are a data scientist (or aspiring to be one), you should consider these books must-haves for your library. For some strange reason, I personally prefer these books in hard copy rather than in Kindle format, but that is a personal choice. I probably just like walking up to my book rack and thinking about which book would be the best to refer to for the problem I am working on.

Here is the list of books (first the ones on R and then on Python):

1. R Cookbook by Paul Teetor

This is simply the best book to start your journey with R. It contains tons of examples and practical advice on a wide range of topics, from file input/output, data manipulation, merging and sorting to building a regression model. For a beginner in R, this book becomes your best pal during the initial testing time.

While the book is aimed at beginners, it remains a prominent feature of any data scientist’s library.


2. Machine Learning for Hackers by Drew Conway & John Myles White

I think this book has the wrong title. I passed on purchasing it twice before giving it a shot (which happened only because of a recommendation from a close friend). This book is meant for data scientists, not hackers, and I don’t know why the title says otherwise. It is a very practical manual for learning machine learning, it comes with good visuals, and you can get a copy of the code in Python (the original book is based on R).


3. R Graphics Cookbook by Winston Chang

You can’t be a good data scientist unless you master graphics in R! There is no better way to do visualization than to learn ggplot2. Sadly, learning ggplot2 can feel like learning a completely new language. This is where this “cookbook” comes to the rescue. The recipes from Winston are short, sweet and to the point. Buy this and it is bound to end up as one of the most referred-to books in your library.


4. Programming Collective Intelligence by Toby Segaran (popularly referred to as PCI)

If there is one book you want to choose out of this selection for learning machine learning, it is this one. I haven’t yet met a data scientist who has read this book and does not recommend keeping it on your bookshelf. A lot of them have re-read it multiple times. The book was written long before data science and machine learning acquired the cult status they have today, but the topics and chapters are entirely relevant even now! Some of the topics covered in the book are collaborative filtering techniques, search engine features, Bayesian filtering and support vector machines. If you don’t have a copy of this book, order it as soon as you finish reading this article! The book uses Python to deliver machine learning in a fascinating manner.


5. Python for Data Analysis by Wes McKinney

Written by Wes McKinney, this book teaches you everything you need to know about pandas. For beginners (not sure why you are still reading this article), pandas is Python’s library for handling data structures. Except for the title of the book (which I find misleading), I like everything else about it. It contains ample code and examples to leave you capable of performing any operation or transformation on a dataframe in Python (using pandas).

For advanced users: if you already know pandas, you should look at this presentation from Wes on the shortcomings of pandas.


6. The Hundred-Page Machine Learning Book by Andriy Burkov

Having read a ton of books trying to teach machine learning from various angles and perspectives, I struggled to find one that could succinctly summarize difficult topics and equations. Until Andriy Burkov managed to do it in some 100-odd pages.

It is beautifully written, is easy to understand and has been endorsed by thought leaders like Peter Norvig. Need I say more? Beginner or established, every data scientist should get their hands on this book.


7. Agile Data Science by Russell Jurney

A recent addition by O’Reilly, this book looks like a must read for data scientists. The focus is on using “light” tools, which are easy to use and still get the work done. This is currently on my reading list and I’ll update more details once I have read it.


These are the 7 must-have books if you are serious about being a data scientist. There are a couple of additional Python books you can consider: Natural Language Processing with Python by Steven Bird et al. and Mining the Social Web by Matthew A. Russell. The reason I have not kept them in the list is that you can find a lot of the information in these books easily on the web.


If you have noticed, almost all the books I have mentioned are from O’Reilly. I think it is a tribute to the fascinating collection of books they have provided! What do you think about the list? Are there any other recommendations you would want to add to it? Have you read any of the books mentioned above? Do let me know through the comments below.

Kunal is a postgraduate from IIT Bombay in Aerospace Engineering. He has spent more than 10 years in the field of Data Science. His work experience ranges from mature markets like the UK to a developing market like India. During this period he has led teams of various sizes and has worked with various tools like SAS, SPSS, Qlikview, R, Python and Matlab.

Responses From Readers


Kumar Abhijeet 05 Jun, 2014

Edit : Machine Learning for Hackers has codes in R not Python

Kumar Abhijeet 05 Jun, 2014

cool...thanks for the link...added to my reading list :)

Roberto Bello 05 Jun, 2014

Free ebook KB – Neural Data Mining with Python sources

Kumar Doraiswami 06 Jun, 2014

How about books on Data Quality? How about ensuring data quality in data streams ? An authoritative book is "Exploratory Data Mining and Data Cleaning" by Tamraparni Dasu + Ted Johnson. John Wiley

Jewon Seo 06 Jun, 2014

Thanks for the link

Navdeep Kumar 06 Jun, 2014

Hey, thanks for sharing these books :-)

Shanil 11 Jun, 2014

I believe the book " Data Smart" by John W Foreman is good starting point to understand the statistical concepts and the method in an easy to understand and non technical way. It is a fun read and all of the case studies are explained using excel. Once the concepts are clear then it is a matter of learning the nuances of a programming language to apply the concepts in R or Python.

Igor Bobriakov 12 Jun, 2014

Great selection of books indeed! I would add another useful book on this topic: "Data Science for Business".

jerry muse 12 Jun, 2014

Erm how can you seriously recommend a book you haven't read? Personally I found "Agile Data Science" to be one of the most shocking bandwagon style books I've ever seen. It's more of an article than a book. Really works out to be, excluding code and screenshots, about 20p per word.

Anshul 17 Jun, 2014

Hi Kunal, Can you please help with any book/ebook with case studies in which Text Analysis and Neural Networks are applied. Thanks, Anshul.

tvmanikandan 17 Jun, 2014

Could you please share some materials /links to understand neural networks? -Mani

Michael Manoochehri 22 Jun, 2014

The entire Addison-Wesley Data & Analytics Series has excellent books for working with data (I'm the author of the introduction to the series) - check out "R for Everyone" and "Apache Hadoop YARN"

Dinesh 04 Jul, 2014

Hi Kunal, I am dinesh. I have around 7 Years of exp in web development. Now i want to change because the challenge and excitement is not same as previous. I am looking to change my path and i found that big data is very good field. I search from many place but still not found the how do i start and where do i start. Please provide me you suggestion that will help me. Please suggest me which area i choose in big data. Waiting for update. Thanks & Regards, Dinesh K...

kanish roshan 13 Jul, 2014

hey kunal,its really good you helping out analytic guys..!!!!!!!!!! i am a university student and getting a training in data analytics can you suggest me some rules for learning basics of this industry and some e books regarding SAS,EXCEL,SQL..!!!also kindly suggest me must have skillset and roadmap to analytics industry...!!! greetings

Saurabh Jha 29 Jul, 2014

Hi Kunal, It was nice reading your articles. I have over 7 years of experience in Data Warehousing. I have worked on end to end BI (ETL, Reporting, OLAP). LAst 4 years i have been focused on logical data modeling for Enterprise Data Warehouse Design & Architecture across verticals. I am seriously thinking to get into Data Mining, all these days i have gone through lots of articles on analytics. I am not a programmer and i do not know much of programming which is my worry when i think of switching over Analytics. I have good understanding of business and data and i believe i will need to refresh my skills in Statistics. I need your guidance as to how do i start my learning process i mean shall i start with python,R or first Statistics. I get slightly confused on my learning path. If possible please guide me. Regards Saurabh Jha

Basudev 01 Oct, 2014

Hi, I have got more than five years of industry experience in analytics. Currently I am working as a lead data scientist in an MNC. But I don't have any academic study in analytics. Can anyone suggest some certified course? Somewhere around one lakh is my budget and I am looking for online options.

Anon 28 Dec, 2014

Rather late to this post, but I thought I'd add links to a couple of O'Reilly books, which I came across just a couple of days ago, and which are also available as free PDFs under Creative Commons. Think Stats: Probability and Statistics for Programmers Think Bayes: Bayesian Statistics Made Simple Although I haven't read the books yet, I presume that they can't be all that bad. :) I would nevertheless like to know what others who may have read them think how good (or bad) they are.

Rooz 03 Jan, 2015

Thanks for providing the list and sharing your reviews also. I enjoyed Rachel and Cathy’s book, it’s readable, informative, and like no other book I’ve read on the topic of statistics or data science.I got a lot out of Doing Data Science, finding the chapter organization on business problem specification, analytics formulation, data access/wrangling, and computer code to be very helpful in understanding DS solutions.

Isura Nirmal 13 Jul, 2015

I found your site in the Facebook group for "Introduction to Data Science by Bill Howe" course. I was first amazed to see that everything I was looking for are in same place. Very informative and simple. I am a regular visitor of your site and good luck for this great work of yours.

Sarbarup Banerjee 13 Jul, 2015

I found this book very helpful: Wiley - Data Mining Techniques for Marketing, Sales and Customer Support (2004), 2Ed

Deepak 11 Oct, 2015

Hi Kunal, I have been working on Ms Excel for 6 years. I have done my B.Sc. (Math Hon.). Now hv started learning SAS. I want to become a Data Scientist. Request you to suggest me what else do I need to do to become a data scientist. Thanks.

Ashish Arora 30 Sep, 2016

Can anyone suggest me any book related to SAS for data analysis ?
