27 Amazing Data Science Books Every Data Scientist Should Read

Introduction

Every person has their own way of learning. What helped me break into data science was books. There is nothing like opening your mind to a world of knowledge condensed into a few hundred pages. There is a magic and allure to books that I have never found in any other medium of learning.

“If you only read the books that everyone else is reading, you can only think what everyone else is thinking.” – Haruki Murakami

Learning Data Science on your own can be a very daunting task! There are numerous ways to learn today – MOOCs, workshops, degrees, diplomas, articles, and so on. But putting them in a structure and focusing on a structured path to become a data scientist is of paramount importance.

There are hundreds of books out there about data science. How do you choose where to start? Which books are ideal for learning a certain technique or domain? While there’s no one-size-fits-all answer to this, I have done my best to cut down the list to these 27 books we’ll see shortly.

I have divided the books into different domains to make things easier for you:

  • Books on Statistics
  • Books on Probability
  • Books on Machine Learning
  • Books on Deep Learning
  • Books on Natural Language Processing (NLP)
  • Books on Computer Vision
  • Books on Artificial Intelligence
  • Books on Tools/Languages
    • Python
    • R

 

Bonus:

At the bottom of the article, you will find a superbly illustrated infographic mentioning each book. You can use that as a ‘to-read’ shelf and strike them off as you go down the list! You can also download a High Resolution copy of this infographic. It’s perfect for printing as it’s in a PDF format.

Without any further ado, let’s dive right in.

 

Books on Statistics

Statistics in Plain English

Author: Timothy C. Urdan

I started my journey into the world of statistics with this beauty of a book. It’s written for absolute beginners and in a way that makes you come back for more. The writing style and explanations provided do justice to the title – Statistics in Plain English. You could recommend it to any non-technical person and they would get the hang of these topics. It’s that good!

 

Think Stats: Probability and Statistics for Programmers

Author: Allen B. Downey

You’ll find this book at the top of most data science book lists. The book comes with plenty of resources. Head to the book’s home page and you’ll find data files, code, solutions, etc. It will be especially useful for folks who know the basics of Python, as the language is used to demonstrate real-world examples.

 

Introduction to Statistical Learning

Authors: Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani

An all-time classic. This book is recommended or referenced in most machine learning courses I’ve come across; it’s just that well written. It covers basic statistics as well as machine learning techniques. The awesome thing about this book is that each concept is explained with case studies in R. So once you have a handle on programming, you can always come back and try out each concept again. What better way to ingrain a concept than by practicing it multiple times?

 

Books on Probability

Probability: For the Enthusiastic Beginner

Author: David Morin

Ideal book for beginners. It is written for college students so all of you looking to learn probability from scratch will appreciate the way this is written. All the basics are covered – combinatorics, the rules of probability, Bayes’ theorem, expectation value, variance, probability density, common distributions, the law of large numbers, the central limit theorem, correlation, and regression.

 

Introduction to Probability

Authors: J. Laurie Snell and Charles Miller Grinstead

Another introductory book covering basic probability concepts. Like the book above, this one is a comprehensive text written with college graduate students in mind. Why do I keep repeating that, you might be wondering. It’s because I want to emphasize that if there’s a place to start learning from scratch, it’s a book that’s written for students who haven’t ever ventured into this field before.

 

An Introduction to Probability Theory and its Applications

Author: William Feller

As the book’s description states, it’s a complete guide to the theory and practical applications of probability theory. I recommend reading this if you really want to deep dive into the world of probability. It’s a VERY comprehensive text and might not be to a beginner’s taste. If you’re learning probability just to get into data science, you can get away with reading either of the two probability books mentioned above.

 

Books on Machine Learning

The Hundred-Page Machine Learning Book

Author: Andriy Burkov

I love this book. Having read a ton of books trying to teach machine learning from various angles and perspectives, I struggled to find one that could succinctly summarize difficult topics and equations. Until Andriy Burkov managed to do it in some 100-odd pages. It is beautifully written, is easy to understand and has been endorsed by thought leaders like Peter Norvig. Need I say more? Beginner or established, every data scientist should get their hands on this book.

 

Machine Learning

Author: Tom Mitchell

Before all the hype came about, Tom Mitchell’s book on machine learning was the go-to text to understand the math behind various techniques and algorithms. I would suggest brushing up on your math before taking this up. But you don’t need any background in AI or statistics to understand these concepts. It was the first-ever book I read on ML! It’s modestly priced so it’s definitely worth adding to your collection.

 

Elements of Statistical Learning

Authors: Trevor Hastie, Robert Tibshirani and Jerome Friedman

And we’re back with another classic by Hastie and Tibshirani! It’s the natural successor to the ‘Introduction to Statistical Learning’ book we covered earlier. While there are a few overlaps with that book, this one takes a more advanced look at what we call machine learning algorithms. Topics like neural networks, matrix factorization, and spectral clustering are covered alongside the common ML techniques.

 

Books on Deep Learning

Deep Learning

Authors: Ian Goodfellow, Yoshua Bengio and Aaron Courville

What a list of rockstar authors! The ‘Deep Learning’ book is widely regarded as the best resource for beginners. It’s divided into three parts: Applied Math and Machine Learning Basics, Deep Networks: Modern Practices, and Deep Learning Research. It is to date the most cited book in the deep learning community. Keep it by your bedside, worship it and reference it often – this will be your companion whenever you start your deep learning journey.

 

Deep Learning with Python

Author: Francois Chollet

A really cool way of learning deep learning (or machine learning for that matter) is by programming side-by-side with the theory. And that’s the approach Francois Chollet follows in the ‘Deep Learning with Python’ book. Concepts are taught using the popular Keras library. Francois is the creator of Keras so who better to teach you this topic? I also recommend following Francois on Twitter – there is a lot we can learn from him.

 

Neural Networks and Deep Learning

Author: Michael Nielsen

This is a free online book to learn about the core component that powers deep learning – neural networks. I quite like the way this book has been written. It takes a practical approach to teaching and looks at deep learning topics through the lens of a beginner. You will not learn any programming language in this book – it’s a good old-fashioned textbook on the underlying insights behind neural networks.

 

Books on Natural Language Processing (NLP)

Natural Language Processing with Python

Authors: Steven Bird, Ewan Klein and Edward Loper

Another book in this collection that sticks to the learn-by-doing approach. You’ll pick up Python concepts you otherwise wouldn’t have and will navigate the world of NLP using the NLTK library (Natural Language Toolkit). While this shouldn’t be the only resource you refer to for learning NLP (it’s far too complex a field for that), it offers a pretty decent introduction to the topic.

 

Foundations of Statistical Natural Language Processing

Authors: Christopher Manning and Hinrich Schütze

Published almost two decades ago, this text still serves as an excellent introduction to natural language processing. It’s a very comprehensive guide to the broader sub-topics in NLP, like Text Categorization, Part-of-Speech Tagging, and Probabilistic Parsing, among various other things. The authors have provided rigorous coverage of the mathematical and linguistic foundations. Again, the book is quite detailed so keep that in mind.

 

Speech and Language Processing

Authors: Daniel Jurafsky and James H. Martin

The emphasis of this book is on practical applications and scientific evaluation in the scope of natural language and speech. I included this book to expand our horizons beyond text – to look at speech recognition as well. And why not? It’s an area of research that is thriving nowadays with a plethora of applications coming out every day. Jurafsky and Martin have written an in-depth book on NLP and computational linguistics. This one is from the masters themselves.

 

Books on Computer Vision

Computer Vision: Algorithms and Applications

Author: Richard Szeliski

Explore a variety of common computer vision techniques in this book, especially ones used for analyzing and interpreting images. While this was published almost 9 years ago, the examples and methodology illustrated by Richard Szeliski are applicable today as well. It’s a comprehensive text that takes a scientific approach to solving basic vision challenges. The book’s website contains a free PDF copy of the book.

 

Programming Computer Vision with Python

Author: Jan Erik Solem

Before you dive into this awesome book, go to the website I’ve linked above and download the datasets, the code notebooks and clone the GitHub repository mentioned there. They are excellent companions in this REALLY hands-on introduction to the world of computer vision. As the author states, “You’ll learn techniques for object recognition, 3D reconstruction, stereo imaging, augmented reality, and other computer vision applications as you follow clear examples written in Python.”

 

Computer Vision: Models, Learning, and Inference

Author: Dr. Simon J.D. Prince

The book starts off from scratch by introducing us to the concepts of probability and quickly picks up pace from there. While some of the frameworks introduced here have seen more advanced versions come out, this book is nonetheless relevant in the current context. More than 70 algorithms have been introduced and the text is beautifully complemented by over 350 illustrations. The website also contains PowerPoint slides, if that’s the kind of learning you prefer.

 

Books on Artificial Intelligence

Artificial Intelligence: A Modern Approach

Authors: Stuart Russell and Peter Norvig

A book written by Stuart Russell and Peter Norvig? I am sold. It is the leading book in Artificial Intelligence. More than 1300 universities in over 100 countries reference/cite this book in their curriculum. Given who the authors are, it isn’t surprising to see the book length – 1100 pages. Covering the length and breadth of AI components – speech recognition, autonomous vehicles, machine translation, and computer vision among other things, this can be considered the Bible of AI.

 

Artificial Intelligence for Humans

Author: Jeff Heaton

What are the foundational algorithms underneath artificial intelligence? This book packs a lot of technical know-how into just 222 pages. This is volume 1 of a series of books on the techniques behind AI (dimensionality, distance metrics, clustering, error calculation, hill climbing, Nelder-Mead, and linear regression). There is an accompanying site as well, which contains examples cited in the book along with a GitHub repository containing the code.

 

The Master Algorithm

Author: Pedro Domingos

If you’re looking for a technical book on AI, this isn’t it. What it is, however, is a masterful text on how machine learning is remaking business, politics, science and war. It is a thoughtful and thought-provoking book on where AI is right now, and where it might end up taking the human race. Will we ever find a single algorithm (or ‘The Master Algorithm’) that is capable of deriving all knowledge from data? Join Pedro Domingos in his quest to find out.

 

Books on Python

Fluent Python: Clear, Concise, and Effective Programming

Author: Luciano Ramalho

There are way too many resources out there to learn Python but nothing teaches you programming like a good old-fashioned book. As you might expect from a coding book, it’s a hands-on guide to help you understand how Python works and how to write awesome and effective Python code. Luciano Ramalho also covers a few popular libraries you’ll find yourself regularly using in data science projects. With a length of 794 pages, this book is worth the spend.

 

Programming Python: Powerful Object-Oriented Programming

Author: Mark Lutz

Wait, another Python book?! If you thought the above book taught you everything you need to know about Python, think again. This is a vast programming language with a lot more left to cover. Once you’ve mastered the fundamentals from the above book by Luciano Ramalho, take a gander at this one by Mark Lutz. There are in-depth tutorials on a wide variety of topics: databases, networking, text processing, GUIs, etc. Tons and tons of examples are included. A must-read for programming geeks.

 

Mastering Python for Data Science

Author: Samir Madhavan

The two books we have covered so far for learning Python looked at the language from a programming perspective. Now it’s time to learn it from the data science angle. Which data science libraries are commonly used and how? How can you create data visualizations and mine for patterns in Python? And how can you code advanced data science/machine learning techniques to build models? These questions and more are answered by Samir Madhavan in this excellent write-up.

 

Books on R

R for Data Science

Authors: Garrett Grolemund and Hadley Wickham

Anyone who has even remotely heard of R programming will have come across Hadley Wickham’s work. His work in this language is unparalleled – I could go on and on about him. I can’t recommend this book highly enough. You’ll learn how to import different kinds of data into R, the different data structures, and how to transform, visualize and model your data. The perfect book to learn data science through coding in R.

 

R for Everyone

Author: Jared P. Lander

I learned R way before I even heard about Python. I have a special place for it in my heart and Jared Lander’s R for Everyone played a big part in that. I got this book through one of my acquaintances and was immediately taken by how well it was written. It claims to be for ‘everyone’ and lives up to its name. This is a great book if you’re from a non-technical and non-statistical background.

 

R Cookbook

Author: Paul Teetor

The R Cookbook is an excellent addition to your budding data science reading list. It contains more than 200 practical recipes to help you get started with analyzing and manipulating data in R. Each recipe looks at a different problem. It’s meant for beginners, intermediate users and advanced practitioners alike. Whether it’s learning new programming skills or brushing up on your concepts, this cookbook is for everyone.

 

And as promised, here is the full infographic covering all the books we saw in this article:


8 Comments

  • Krishna says:

    Hi Pranav,

    Thanks for a good article. Could you also share the sequence in which one has to read the above-mentioned books for the data science journey?

    Thanks in advance.

    • Pranav Dar says:

      Hi Krishna,

      Appreciate you taking the time out to go through the list! To begin with, the books should be read in the sequence they are listed in. Start with statistics and probability (the absolute base of most things you’ll learn in data science).

      Once done, move on to machine learning. After that comes the fork in the path. You could study deep learning if that’s where you see yourself down the line. Otherwise I would recommend picking a domain (banking, finance, marketing, etc.), understanding what kind of problems are there in those fields, and then branching out to study certain topics. For example, NLP is a big thing in marketing to understand reviews. Computer Vision is big in surveillance applications, manufacturing products, etc.

      I recommend checking out the below two learning paths our team has put together. They are REALLY comprehensive and free:
      Machine Learning – https://trainings.analyticsvidhya.com/courses/course-v1:AnalyticsVidhya+LPDS2019+LPDS2019_T1/about

      Deep Learning – https://trainings.analyticsvidhya.com/courses/course-v1:AnalyticsVidhya+LP_DL_2019+2019_T1/about

  • PULAPA RAJUBABU says:

    Excellent guidance for serious aspirants.

  • Gunashree says:

    Thanks for sharing this list Pranav. Very helpful!

  • Mariem Kasmi says:

    Thanks a lot! But what about the book “Hands-On Machine Learning with Scikit-Learn and TensorFlow”? Any recommendations?

    • Pranav Dar says:

      Hi Mariem,

      That’s a good book if you’re starting out and need to practice hands-on learning. It won’t give you a deep dive into algorithms but from a programming perspective, it’s a decent starting point. The examples presented might not be compatible with the latest TensorFlow version so make sure you check that before purchasing.

  • Siddharth says:

    Thanks, looks like a systematic approach
