Top Data Scientists to Follow & Best Data Science Tutorials on GitHub


Twitter started the trend of ‘People to Follow’. It was later replicated by other platforms such as Facebook, LinkedIn, Quora and GitHub. This cool feature lets you connect with the rockstars of various domains and see what they are up to without bothering them much. For the influencers, it has become an effective way to communicate with their followers.

Life on GitHub doesn’t appear as tempting as what you would observe on other platforms, but if you love coding, programming and data science, you’ll surely enjoy the company of the 9 million users on this platform!

Following influencers is usually a good practice. It has helped me in multiple ways:

  1. Whenever I run out of inspiration, I look at these influencers and see what they have achieved. This brings back the energy and I get back to my projects.
  2. You can follow these influencers to see which events they are attending, what they are reading and what they are working on. This can quickly become a wealth of knowledge in itself.
  3. To some extent, it also gives these influencers a human touch. Judging by their profiles alone, they might come across as someone out of this world. But when you start following them regularly, you tend to relate to them.

If you haven’t tried this yet, it’s your turn now. I have compiled a list of some awesome data scientists on GitHub. In addition, you’ll also find a list of the best data science tutorials available on GitHub. For beginners especially: if you don’t know about GitHub, here’s a quick introduction in simple words.





What is GitHub?

You can best understand GitHub as a social network for coders across the globe, who can share their code and work collaboratively using it. GitHub started in 2008 and is a web-based platform that provides online project hosting using Git.

Git is a version control system which helps you save various versions of your project in their original form and allows you to retrieve them later without any problem. Git was created by Linus Torvalds (he also created Linux) and has been a boon to programmers across the world. Git itself is free and open source. GitHub builds on it to give programmers from all over the world a place to save and display their code. GitHub has not only made obtaining code an easy task, but has also given immense support to programmers and coders worldwide.

To be honest, it is difficult to imagine the programming world without Git today!
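To make the idea of saving versions and retrieving them later concrete, here is a minimal sketch in Python. It is a toy illustration, not how Git is implemented: the class name TinyVersionStore and its methods are made up for this example, and real Git also tracks directories, authors and the links between versions. The only thing borrowed from Git is the idea of naming a saved version after a hash of its content.

```python
import hashlib


class TinyVersionStore:
    """Toy snapshot store (hypothetical, for illustration only)."""

    def __init__(self):
        self._snapshots = {}  # version id -> exact content that was saved
        self._history = []    # version ids, oldest first

    def save(self, content: str) -> str:
        """Store a snapshot and return a short id derived from its content."""
        version_id = hashlib.sha1(content.encode("utf-8")).hexdigest()[:8]
        self._snapshots[version_id] = content
        self._history.append(version_id)
        return version_id

    def retrieve(self, version_id: str) -> str:
        """Return the snapshot exactly as it was saved."""
        return self._snapshots[version_id]

    def history(self) -> list:
        """Return the version ids in the order they were saved."""
        return list(self._history)


store = TinyVersionStore()
first = store.save("print('hello')\n")
second = store.save("print('hello, world')\n")

# The first version is still available, untouched, after the second save.
print(store.retrieve(first), end="")
print(store.history() == [first, second])
```

Each call to save keeps the old content alongside the new one, which is the property the paragraph above describes: earlier versions stay in their original form and can be fetched again by id.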


4 Amazing Facts About GitHub

  1. GitHub Inc. was originally known as Logical Awesome.
  2. GitHub reached 1 million repositories in 2010, 2 million in 2011 and 10 million in 2013. As of 2015, it has over 21 million repositories and 9 million users.
  3. Approximately two-thirds of the employees at GitHub work remotely.
  4. The majority of the coding community on GitHub uses Ruby, JavaScript, PHP and Python.


Data Science Tutorials on GitHub

Now, if you are new to GitHub, you may be asking where tutorials come in on a platform meant for version control and code sharing. Well, because of its niche community, a lot of people have started creating resource repositories on GitHub. Essentially, since programmers spend a lot of time on GitHub, why not keep lists of the resources they use regularly there too?

Here’s a compiled list of tutorials on various topics in data science. These resources can be very handy. I suggest you bookmark them (or watch them on GitHub).


1. Getting started with Data Science

Awesome Data Science: This is an awesome repository if you are just beginning with data science. Here you’ll find every step that you need to take till the end of your journey.

Data Science Resources: This is another repository of data science tutorials to help you conquer this skill set. You are free to choose either of these; both are equally good.

Text Books in Data Science: If you like to read and refer to books, here is a compiled list of the best books on machine learning, data mining, statistics, data visualization etc.


2. Algorithms

Data Science Algorithms: Here’s a comprehensive overview & explanation of algorithms such as Linear Regression, Logistic Regression, K-Means Clustering and Random Forest. You’ll also find worksheets for practice.

Statistics and ML: Here’s a list of tutorials to help you become efficient in your day-to-day programming. It covers Python pandas, machine learning algorithms, statistics and data visualization.
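To give a flavour of the algorithms these repositories cover, here is a minimal sketch of simple linear regression in plain Python. It is not taken from any of the repositories above: the function name fit_line and the data points are made up for illustration, and it handles only one input variable using the standard least-squares formulas.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (one input variable)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums around the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up points that lie exactly on y = 2x + 1
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
slope, intercept = fit_line(xs, ys)
print(slope, intercept)  # prints "2.0 1.0"
```

In practice you would reach for a library rather than write this by hand, but the hand-written version shows what the fitted slope and intercept actually are.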


3. Machine Learning

Scikit Learn: Scikit-learn is a Python library for machine learning. This repository offers everything you need to learn about machine learning in Python. (Hint: Dig deeper)

Awesome Machine Learning: Here is an ultimate list of tutorials, resources and guides for machine learning, data analysis, natural language processing and data visualization in programming languages such as Python, R, Java, Go, C++ and Swift. Choose accordingly.

Complete Machine Learning: Here’s a collection of tutorials and examples for solving problems using machine learning. It covers ML from beginning to end, including stages such as model evaluation, implementation of ML algorithms, data visualization etc.

Parallel Machine Learning: This tutorial is on using scikit-learn and IPython for parallel machine learning. Here you’ll find a two-hour video from PyCon 2013 with lecture notes and other useful resources.

Machine Learning Courses: Here’s a list of the best machine learning courses in the world.


4. Deep Learning

Caffe: Caffe is a deep learning framework made with expression, speed, and modularity in mind. This repository consists of installation instructions and other recommended tutorials to help you learn this framework properly.

Awesome Deep Learning: Here’s a curated list of tutorials on Deep Learning which includes deep learning courses, free books, videos and lectures, papers and other useful resources to follow.

Deep Learning in Python: Here’s a complete tutorial on implementing Deep Learning in Python.

Deep Learning in Julia: Mocha is a Deep Learning framework for Julia. This tutorial introduces the framework step by step.

Recurrent Neural Networks: Here’s an awesome list of dedicated resources for RNNs. If you have been wanting to curate resources for RNNs, you’ll want to stop here and take a glance. This guide consists of code, lectures, books and resources on multiple applications of RNNs.



Top 30 Data Scientists to Follow on GitHub

Here’s a compiled list of the most influential data scientists to follow on GitHub. These data scientists are experts in their respective fields, which range from Python, machine learning, neural nets and data visualization to deep learning and data science.

1. Sebastian Raschka        (Machine Learning, Data Visualization)

2. Randy Olson                  (Python – Data Analysis, Matplotlib, Bokeh)

3. Hilary Mason                (Chief Data Scientist at Bitly)

4. Mike Bostock                (D3, Data Visualisation)

5. Prakhar Srivastav        (Python, Algorithms)

6. Andreas Mueller          (Machine Learning, Python)

7. Wes McKinney             (Author of Python for Data Analysis)

8. Jake Vanderplas          (Machine Learning, Data Visualization)

9. Mathieu Blondel          (Machine Learning, Neural Networks)

10. Gael Varoquaux         (Machine Learning, Statistics, Python)

11. Olivier Grisel                (Machine Learning, Deep Learning)

12. Andrej                          (Deep Learning, Neural Network, SVM)

13. Michael Nielsen         (Neural Networks, Deep Learning)

14. Heather Arthur          (Neural Network, JavaScript)

15. Allen Downey             (Python, Algorithms)

16. Davies Liu                   (Apache Spark, Python)

17. Julia Evans                 (Machine Learning, Python)

18. Jeff L                            (R Programming, Data Analysis)

19. John Myles White     (Julia, Machine Learning)

20. Thomas Wiecki         (Python, Bayesian Analysis)

21. Brian Caffo                  (Johns Hopkins University)

22. Roger D Peng             (Johns Hopkins University)

23. Stefan Karpinski       (Julia)

24. Pete Skomoroch        (Machine Learning, Big Data, Python)

25. Mike Dewar               (Python, D3, JavaScript)

26. Hadley Wickham     (Statistics, Data Analysis, Data Visualisation)

27. Romain Francois      (R Programming)

28. Justin Palmer           (D3, Data Visualisation)

29. Jason Davies             (D3, Data Visualization)

30. Cameron Davidson-Pilon      (Python, Algorithms)


End Notes

GitHub is not just about coding and sharing code. Its utility extends to connecting with experts and learning from them. The intent behind writing this article is to give you an overview of GitHub and its uses.

In this article, I have listed the top 30 data scientists to follow on GitHub. I have also listed some of the tutorials I felt are awesome. I hope these repositories turn out to be useful for you!

If you think I’ve missed out on any useful tutorial or data scientist, feel free to add them in the comments section below.

If you like what you just read & want to continue your analytics learning, subscribe to our emails, follow us on Twitter or like our Facebook page.
