Free Must Read Books on Statistics & Mathematics for Data Science
Introduction
Google’s selection process for data scientists gives higher priority to candidates with a strong background in statistics and mathematics. It is not just Google: other top companies (Amazon, Airbnb, Uber, etc.) also prefer candidates with strong fundamentals rather than mere know-how in data science.
If you too aspire to work for such top companies in the future, it is essential for you to develop a mathematical understanding of data science. Data science is simply the evolved version of statistics and mathematics, combined with programming and business logic. I’ve met many data scientists who struggle to explain predictive models statistically.
More than just deriving accuracy, it is important to understand and interpret every metric and calculation behind that accuracy. Remember, every single ‘variable’ has a story to tell. So, if nothing else, try to become a great story explorer!
In this article, I’ve compiled a list of must-read books on statistics and mathematics. I understand that mathematics has no limits. Hence, I’ve listed only those books which will help you connect better with data science.
Note: Books which their rights holders have made free to access are marked as free downloads in this article. For the rest, a link to the Amazon bookstore is provided.
Statistics
Introduction to Statistical Learning
This is a highly recommended book for practicing data scientists. The focus of this book is on connecting statistical concepts with machine learning. Hence, you’ll learn about all the popular supervised and unsupervised machine learning algorithms. R users will get an advantage, since the practical aspects of the algorithms are demonstrated using R. In addition to theory, this book also lays emphasis on using ML algorithms in real-life settings.
Available: Free Download
Elements of Statistical Learning
This book is an advanced-level version of the previous book. It is written by Trevor Hastie, Robert Tibshirani and Jerome Friedman, professors at Stanford University. Hastie and Tibshirani also co-authored ‘Introduction to Statistical Learning’, which covers the basics of statistics and machine learning. This book will introduce you to higher-level algorithms such as neural networks, bagging and boosting, kernel methods, etc. The algorithms have been implemented in R.
Available: Free Download
Think Stats
The author of this book is Allen B. Downey. It is based on performing statistical analysis practically in Python. Hence, make sure you’ve got some basic knowledge of Python before buying this book. It focuses entirely on understanding the real-life influence of statistics using popular case studies. Since stats and math are closely connected, it also has dedicated chapters on topics like Bayesian estimation.
Available: Buy from Amazon
From Algorithms to Z Scores
Did you know about the crucial role of statistics in programming? The author of this book is Norm Matloff, Professor, University of California, Davis. This book explains how to use probabilistic concepts and statistical measures in R. Again, it is a good practice source for R users. It teaches the art of dealing with probabilistic models and choosing the best one for final evaluation. It is a highly recommended book (especially for R users).
Available: Free Download
Introduction to Bayesian Statistics
This is a highly recommended book for freshers in data science. The author of this book is William M. Bolstad. It’s a must-read for people who find mathematics boring. Written in a conversational style (it is rare to find math presented this way), this book is a great introductory resource on statistics. It begins with scientific methods of data gathering and ends with dedicated chapters on Bayesian statistics.
Available: Free Download
Discovering Statistics using R
This book is written by Andy Field, Jeremy Miles and Zoe Field. I would highly recommend this book to newbies in data science. As a starting point for statistics, this book has great content which covers its topics in depth. In addition, the statistical concepts are explained in conjunction with R, which makes it even more useful. It offers a step-by-step understanding, supported in parallel by interesting practice examples.
Available: Buy on Amazon
Mathematics
Introduction to Linear Algebra
This is one of the most recommended books on linear algebra. The author of this book is Gilbert Strang, Professor, MIT. Strang’s unique way of delivering knowledge will give you the intuition and excitement to move forward after every chapter. This book will help you build a strong mathematical foundation for machine learning. It covers all the necessary topics, such as vectors, linear equations, determinants, eigenvalues, matrix factorization, etc., in great depth.
Available: Buy on Amazon
Matrix Computations
Matrices and data frames are essential components of machine learning. The authors of this book are Gene H. Golub and Charles F. Van Loan. This book provides a nice head start to students on the concepts of matrix computations. The authors cover most of the important topics, such as Gaussian elimination, matrix factorization, the Lanczos method, error analysis, etc. Every chapter is supported by intuitive practice problems. The pseudocode is available in MATLAB.
Available: Free Download
A Probabilistic Theory of Pattern Recognition
This is a complete resource for learning the application of mathematics. It is a must-read book for intermediate and advanced practitioners in machine learning. This book is written by Luc Devroye, Laszlo Gyorfi and Gabor Lugosi. It covers a wide range of topics, from Bayes error and linear discrimination to epsilon entropy and neural networks. It provides convincing explanations of complex theorems, with section-wise practice problems.
Available: Free Download
Introduction to the Math of Neural Networks
If you have an innate interest in learning about neural networks, this should be your place to start. The author of this book is Jeff Heaton. The author has beautifully simplified the difficult concepts of neural networks. This book introduces you to the basics of the underlying maths in neural networks. It assumes the reader has prior knowledge of algebra, calculus and programming. It demonstrates various mathematical tools which can be applied to neural networks.
Available: Buy on Amazon
Advanced Engineering Mathematics
This is probably the most comprehensive book available on mathematics for machine learning users. The author of this book is Erwin Kreyszig. As a matter of fact, this book is highly recommended to college students as well. If you haven’t been good at maths till now, follow this book religiously and you should see significant improvements in your math understanding. Along with derivations and practice examples, this book has dedicated sections on calculus, algebra, probability, etc. It is definitely a must-read book for practitioners at all levels in data science.
Available: Free Download
Cookbook on Probability and Statistics
This cookbook is a must-have for your digital bookshelf. This isn’t exactly a textbook, but a quick digital guide to mathematical equations. The author of this book is Matthias Vallentin. After you finish with the essentials of mathematics, this book will help you quickly connect various theorems and algorithms with their formulae. It’s difficult to derive equations instantly, so this book will help you quickly navigate to your desired problem and solve it.
Available: Free Download
Additional Resources
Bored of reading too much? Here is a list of highly recommended video tutorials and resources on mathematics and statistics. They are FREE to access.
- Complete Course on Linear Algebra by MIT
- Complete Course on Multivariable Calculus by MIT
- Statistical Learning by Stanford University
- Mathematics at Khan Academy
- Full Cheatsheet on Probability
End Notes
The books listed in this article are selected on the basis of their reviews and the depth of topics covered. This is not an exhaustive list of books. But I have found that it’s almost too easy to get confused while deciding ‘where to begin?’ In such situations, it is advisable to start with this list.
In this article, I’ve listed some of the most helpful books on statistics and mathematics. People tend to neglect these topics in pursuit of quick success. But that’s not the right way. Hence, if you aim for long-term success in data science, make sure you learn to create stories out of maths and statistics.
Have you read any of these books? Which book on mathematics and statistics has helped you the most? Please share your suggestions and reviews in the comments section below.
37 thoughts on "Free Must Read Books on Statistics & Mathematics for Data Science"
Kamal says: February 17, 2016 at 4:51 am
Wow! That's a long list of books. Hoping to thoroughly read one of these.
Dr.D.K.Samuel says: February 17, 2016 at 4:57 am
Dear Manish Saraswat, thanks for a nice list. i have downloaded all, now I have to learn, understand and do (my progress is one cm a day!)
Koushik says: February 17, 2016 at 9:23 am
Is there any ebook in the internet ?
Emanuel Woiski says: February 17, 2016 at 11:20 am
Nice list! Allow me to add some more resources: 1- Find all Think Stats material (not just the 2nd edition of the book) from https://github.com/AllenDowney/ThinkStats2 . 2- Allen has also written Think Bayes, material accessible in https://github.com/AllenDowney/ThinkBayes. 2- Follow Chen's 16+ Free Data Science Books in http://www.wzchen.com/data-science-books/ Regards woiski
Henry says: February 17, 2016 at 12:02 pm
Wonderful starting point for someone that wants to be a data science, thanks for the list!
Johnny says: February 17, 2016 at 12:37 pm
Thank you again! Now, to learn how to use statistics and math to peruse through such a long list of books and get out of them what I am longing to find - albeit, the things I do not even know yet exist :-)
Analytics Vidhya Content Team says: February 17, 2016 at 1:00 pm
All the Best !
Analytics Vidhya Content Team says: February 17, 2016 at 1:02 pm
If you really plan at completing this list, great! I'd suggest you to start with easy ones like linear algebra and introduction to statistical learning. These will ignite your appetite to learn more and take up more complex concepts. Wish you luck!
Analytics Vidhya Content Team says: February 17, 2016 at 1:04 pm
Welcome! If you find data science & machine learning interesting, these books will surely enlighten your path! Wish you luck.
Analytics Vidhya Content Team says: February 17, 2016 at 1:05 pm
Thanks Woiski! These are helpful too.
Olumide Michael Oyalola says: February 17, 2016 at 2:24 pm
Hi Manish, thanks for sharing...
Prof Vadlamani Ravi says: February 17, 2016 at 11:36 pm
Great! It is a useful and memorable service to the DS community. I ask my students to download them. Thanks to Emanuel too. Bests
Highspirits says: February 18, 2016 at 3:39 am
Awesome!! I was trying to find the best place to start exploring Statistics for Data Science. Thanks a lot for the pointers!!
Analytics Vidhya Content Team says: February 18, 2016 at 8:09 am
Welcome!
Analytics Vidhya Content Team says: February 18, 2016 at 8:10 am
Welcome Sreedhar!
Steve Rokette says: February 26, 2016 at 10:13 am
Thanks a lot!
Kamal T says: February 29, 2016 at 5:06 am
Just learned that the first book can be done along with an online course by Stanford. Thought it would be helpful to mention it here :) https://lagunita.stanford.edu/courses/HumanitiesSciences/StatLearning/Winter2016/about
Debarpita says: February 29, 2016 at 7:17 am
Hi, The list of books are really helpful. Can you have a post on complete path to learn (or rather brush up) all the maths / stats / probability from scratch. I have too much material and resources now, but don't know where to start from and how to progress. Thanks
Javier Alonso says: February 29, 2016 at 7:13 pm
Thanks a lot for the references. They are a must read, and most of them Free! For those trying to learn about it's a great help...
Peter says: March 01, 2016 at 6:41 pm
Wow. You have really compiled a list of useful books. Super! I had read think stats but the others were gems! Super, gonna read into more math, now. I also own a copy of Discovering Statistics using R. It is really a nice book. Thank you for the list. Best, Peter
Hermine says: March 02, 2016 at 11:28 am
Hi, I have the same concern. So if someone can guide us it will be great Thanks in advance
Analytics Vidhya Content Team says: March 10, 2016 at 8:14 am
Hi Debarpita Here's what you are looking for: http://www.analyticsvidhya.com/blog/2016/02/free-read-books-statistics-mathematics-data-science/
Debarpita Das Pal says: March 10, 2016 at 1:15 pm
Yes. i am also talking about the same list. But which book should I start with :)... and then how to progress ahead.
Ravi says: March 18, 2016 at 11:42 am
Hi All, I am much interested in Big data Programme and its implemented in my organization. Seek to learn this upcoming technology. As I have almost 3.6 years exp as database developer and much interest in data related works. . But my background qualification is B.A general then 3 years Diploma in (Software engineering) , Microsoft certification in .Net Development. But moving forward i heard that the programme is covered only for who have engineering degree, Mathematics, Economics, Statics. Hence pl humble request to all pl suggest on same as future perspective. Thanks , Ravi
Yash says: March 27, 2016 at 6:34 pm
Hi, I wanted to ask you a few questions. Can I message u personally?
Analytics Vidhya Content Team says: March 28, 2016 at 9:31 am
Hello Yash You can ask me questions at discuss.analyticsvidhya.com. Do tag me.
Analytics Vidhya Content Team says: March 28, 2016 at 9:36 am
Hi Ravi I understand your trouble and would like to help you in this regards. Your education qualification is not at all a problem considering your experience in IT. May be, your organization might have put a selection filter but other big data courses don't do that. Here is a list of top certifications in big data in 2016 - http://www.analyticsvidhya.com/blog/2016/01/top-certification-courses-sas-r-python-machine-learning-big-data-spark-2015-16/#seven Completing any of these certification will surely add to your skills and job possibilities.
Analytics Vidhya Content Team says: March 28, 2016 at 9:37 am
Start with Introduction to Statistical Learning.
Suresh says: March 31, 2016 at 7:22 pm
Hi, This is Suresh, I have completed B.sc Zoology and passed out 2006. Now I am working in BPO sector. Now I want to change my domain so I selected for Business analytics course. Please suggest me, I am eligible for this course and can I get the job also. If I want to this course what I need too do? Please advise me. I went and asked one of the Analytics Institute they told me, I am not eligible for this course and suppose if I complete the course No one take it for Business analytics jobs they told me like that.
Madhav says: May 03, 2016 at 3:45 pm
Awesome list, Manish. The first two books on statistical learning are goldmines of knowledge. The book by Devroye et al looks extremely rigorous and challenging although it could be used by anyone with [really] strong mathematical foundations in real analysis, (measure theoretic) probability and related subjects. Thanks for recommending this book, it may probably be the next book on my "To-Study" list. Would you mind if I recommend a few foundations books so that people who wanna study these books have a good background?
sumendar says: May 08, 2016 at 4:34 pm
Hi Manish, Nice collection, one more best book which i can suggest for data science newbies is "An introduction to Data Science" by Jeffrey Stanton, Syracuse University & Robert W. De Graaf. i hope we can learn basic Statistics and R programming at a time with this book.
Kapil says: May 09, 2016 at 2:31 pm
Hello Manish. I am working on geotechnical instrument data (just excel plotting) as a database engineer since last 4 years. I am from B.E (ECE) background. and interested in data science. Plz suggest is this beneficial to me for carriar growth?
Mohamed says: June 14, 2016 at 5:42 am
thanks
Sudarshan Bala says: June 20, 2016 at 12:23 pm
Hi.. I'm a chartered accountant. All I have done is business maths and college level statistics. I aspire to become a data scientist. I need encouragement to take up this profession. First, I need to hone my quants skills! Are these books enough to secure a entry level job position, say an Analyst/Consultant in a top firm?
Somasundaram says: September 17, 2016 at 8:20 am
Hi Manish, Thanks for this useful info. I started reading "Introduction to Statistical Learning", which referred to "Elements of Statistical Learning" [ ESL] 21st page of ESL refers to a table of spam/email data. But i cannot understand what data it refers to, Can you pls guide me to infer that. I have seen the supporting data @http://web.stanford.edu/~hastie/ElemStatLearn/datasets/spam.info.txt [ which presents all the observations, average of each variables etc] But still i did not understand the table. Pls help me to understand this.PFB the table.
"TABLE 1.1. Average percentage of words or characters in an email message equal to the indicated word or character. We have chosen the words and characters showing the largest difference between spam and email.
        george  you   your  hp    free  hpl   !     our   re    edu   remove
spam    0.00    2.26  1.38  0.02  0.52  0.01  0.51  0.51  0.13  0.01  0.28
email   1.27    1.27  0.44  0.90  0.07  0.43  0.11  0.18  0.42  0.29  0.01"
Thanks, Somu.
Ratnadeep Bhattacharya says: December 19, 2016 at 6:03 am
I noticed that the Kreyszig book is mentioned which is a standard in engineering degrees at least in India. Another cheap option, and the books are extremely good, are from N. E. Piskunov. However, the coverage is a little tougher. Here are some other books (free) in case you want to dive into individual topics:
- Single and Multivariable Calculus https://www.whitman.edu/mathematics/multivariable/multivariable.pdf
- Linear Algebra https://www.math.ucdavis.edu/~linear/linear-guest.pdf
- Ordinary Differential Equations http://www.mat.univie.ac.at/~gerald/ftp/book-ode/ode.pdf
- Partial Differential Equations http://www.math.uni-leipzig.de/~miersemann/pdebook.pdf
- Complex Analysis http://math.sfsu.edu/beck/papers/complex.pdf
- Optimization/Numerical Optimization http://home.agh.edu.pl/~pba/pdfdoc/Numerical_Optimization.pdf
- Statistics http://math.arizona.edu/~jwatkins/statbook.pdf
- Bayesian Statistics http://stat.smmu.edu.cn/download/ebook/introduction%20to%20bayesian%20statistics.pdf
- Signals and Systems http://www.eng.ucy.ac.cy/cpitris/courses/ece623/notes/SignalsAndSystems.pdf
I included a book on signals and systems because I think Fourier analysis is also very important in data science/machine learning. Some great courses are mentioned in: http://www.kdnuggets.com/2015/09/15-math-mooc-data-science.html/2
Venkat says: March 26, 2018 at 11:07 am
Such an awesome book https://github.com/AllenDowney/ThinkStats2 . And I Suggest this.