Top 7 Data Science & Machine Learning GitHub Repositories in March 2018


Introduction

I love GitHub! Not only can you follow the work happening in different domains, but you can also collaborate on multiple open source projects. Major tech companies, from Google to Facebook, publish their open source code on GitHub so the wider coding and ML community can benefit from it.

But if you are too busy, or find following GitHub difficult, we bring you a summary of the top repositories every month. You can keep yourself updated with the latest breakthroughs and even replicate the code on your own machine!

This month’s list includes some awesome libraries. From Google Brain’s AstroNet to an artificial neural network visualizer, we have curated a list of unique repositories that will expand your machine learning horizons.

Are you ready? Let’s look at last month’s top 7 then!

You can check out the top 5 repositories that we picked out in January here and February here.

 

Person Blocker

‘Person Blocker’ is a Python library that automatically blocks out entire people in images using a pre-trained neural network. It uses a Mask R-CNN model pre-trained on the MS COCO dataset. And the cherry on top? No GPU required!

And it is not limited to people: the algorithm can block out other objects as well. It recognizes 80 different types of objects, including vehicles, animals, and electronic gadgets.
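To make the "blocking out" step concrete, here is a minimal sketch of what happens once a segmentation model has produced a per-pixel mask. This is not the repository's own code: the function name `block_out`, the grey fill colour, and the toy image and mask are all assumptions made for illustration, and the mask is written by hand instead of coming from Mask R-CNN.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]
Mask = List[List[bool]]


def block_out(image: Image, mask: Mask, color: Pixel = (128, 128, 128)) -> Image:
    """Return a copy of `image` with every masked pixel replaced by `color`."""
    if len(image) != len(mask) or any(len(r) != len(m) for r, m in zip(image, mask)):
        raise ValueError("image and mask must have the same height and width")
    return [
        [color if hit else pixel for pixel, hit in zip(row, mask_row)]
        for row, mask_row in zip(image, mask)
    ]


# A 2x3 toy image and a hand-written (hypothetical) mask covering the middle column.
image = [[(255, 0, 0), (0, 255, 0), (0, 0, 255)],
         [(10, 10, 10), (20, 20, 20), (30, 30, 30)]]
mask = [[False, True, False],
        [False, True, False]]
blocked = block_out(image, mask)
```

In a real pipeline the mask would come from the model's output for the chosen object class, and the image would typically be an array instead of nested lists.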

You can read more about this library on Analytics Vidhya’s blog here.

 

AstroNet

Source: Yahoo

Back in December 2017, the Google Brain team revealed it had discovered two new planets by applying AstroNet, its deep neural network model for working on astronomical data. It was a monumental discovery that showed the far-reaching impact of machine learning in today’s world.

Now, Google Brain has released the code behind that technology and made it available to everyone. The model is based on a convolutional neural network (CNN).

We covered AstroNet in this AVBytes article.

 

ANN Visualizer

ANN Visualizer is a Python library that lets you visualize an artificial neural network using just a single line of code. It works with Keras and uses Python’s ‘graphviz’ library to create a neat and presentable graph of the neural network you’re building.
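To give a feel for what a Graphviz-based network diagram involves, the sketch below builds the DOT text for a small fully connected network using only the standard library. It does not use ANN Visualizer or Keras and is not how the library is implemented; the function name `layers_to_dot` and the layer sizes are assumptions chosen for the example.

```python
from typing import List


def layers_to_dot(layer_sizes: List[int], title: str = "Neural network") -> str:
    """Build Graphviz DOT text for a fully connected feed-forward network."""
    lines = ["digraph ann {", "  rankdir=LR;", f'  label="{title}";']
    # One node per neuron, named by layer index and position in the layer.
    for layer, size in enumerate(layer_sizes):
        for unit in range(size):
            lines.append(f'  n{layer}_{unit} [shape=circle, label=""];')
    # Connect every neuron to every neuron in the next layer.
    for layer in range(len(layer_sizes) - 1):
        for src in range(layer_sizes[layer]):
            for dst in range(layer_sizes[layer + 1]):
                lines.append(f"  n{layer}_{src} -> n{layer + 1}_{dst};")
    lines.append("}")
    return "\n".join(lines)


# Hypothetical network: 3 inputs, 4 hidden units, 1 output.
dot_source = layers_to_dot([3, 4, 1], title="Toy network")
```

Saving `dot_source` to a file such as `network.gv` and running `dot -Tpng network.gv -o network.png` renders the picture, provided Graphviz is installed.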

Check out Analytics Vidhya’s detailed coverage of this awesome library here.

 

Fast Pandas

Any Python novice will tell you how flexible and powerful the pandas library is. As a data scientist, you need to be equally flexible and think of different ways to approach a problem. The ‘Fast Pandas’ repository aims to benchmark the different methods available in such situations.

This is a very useful repository and one we highly recommend trying out at least once.
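The idea of benchmarking several ways of doing the same thing can be sketched with Python's built-in `timeit` module. The code below is an illustration and is not taken from the Fast Pandas repository: `benchmark` and `pandas_demo` are hypothetical names, and the row-wise versus vectorized column addition is just one assumed example of a comparison you might run. The demo function needs pandas and NumPy installed.

```python
import timeit
from typing import Callable, Dict


def benchmark(candidates: Dict[str, Callable[[], object]],
              repeat: int = 5, number: int = 10) -> Dict[str, float]:
    """Return the best per-call time in seconds for each candidate."""
    results = {}
    for name, func in candidates.items():
        # timeit.repeat returns one total time per repetition of `number` calls.
        timings = timeit.repeat(func, repeat=repeat, number=number)
        results[name] = min(timings) / number
    return results


def pandas_demo(rows: int = 100_000) -> Dict[str, float]:
    """Compare two ways of adding two columns; requires pandas and numpy."""
    import numpy as np
    import pandas as pd

    df = pd.DataFrame({"a": np.arange(rows), "b": np.arange(rows)})
    return benchmark({
        "apply_rowwise": lambda: df.apply(lambda r: r["a"] + r["b"], axis=1),
        "vectorized": lambda: df["a"] + df["b"],
    }, repeat=3, number=1)
```

Calling `pandas_demo()` returns a dictionary mapping each method name to its best time in seconds. No timings are quoted here because they depend on the machine and the pandas version.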

 

TensorFlow.js

TensorFlow.js is an open-source library that you can use to build and train machine learning models in your web browser using JavaScript. If you’re familiar with Keras, the high-level layers API will feel very familiar to you.

It is GPU-accelerated and automatically uses WebGL when it is available. You can import existing pre-trained models and also re-train them within your web browser.

Check out our coverage of this here.

 

Caffe64

Caffe64 is a simple, small, yet incredibly functional neural network library. We all know how onerous it can be to install a neural network library. According to the developers, Caffe64 ditches all the hard work and is the “easiest to compile and most lightweight neural network library, period”.

If you’ve used Caffe before, this will be a piece of cake for you!

 

TensorFlow Hub

TensorFlow Hub is a library to foster the publication, discovery, and consumption of reusable parts of machine learning models. In particular, it provides modules, which are pre-trained pieces of TensorFlow models that can be reused on new tasks. By reusing a module on a related task, you can:

  • train a model with a smaller dataset
  • improve generalization
  • significantly speed up training

 

Have you used any of these libraries before? How was your experience? Let us know in the comments section below!

 

11 Comments

  • Frank francisco says:

    Check out etherscan ml for a solid blockchain machine learning repo built on ethereum. Not related but a big fan.

  • Data Science Training In Pune says:

    In this article shows multiple domains…plz moreover information about data science.

  • Sanil says:

    Analytics Vidhya is doing great job of making this information easily available. Requesting to post more R related stuff too. Thanks.

  • Jacob says:

    How did you select these as the “top 5”? Is this data driven and if so how precisely is it data driven?

    • Pranav Dar says:

      Hi Jacob,

      There are a few factors that go into selecting the top GitHub repositories each month but the primary one is that it should benefit our data science and machine learning community. Then we look at what language was used, what real world cases or uses there are, etc.

  • Rahul says:

    Is there python library available to analyse high-dimensional hyperspectral data? I know about spectral-python, but it is not that good.

    • Pranav Dar says:

      Hi Rahul,

      I am also aware of the ‘spectral-python’ library only. You can try ‘t-SNE’ for python in case that is of any use in this area.

  • Don Carpenter says:

    This is garbage. Sorry but you say it yourself, Person blocker is JUST MaskRCNN with COCO and a filter. It achieves nothing, brings nothing new and is honestly useless as-is.

    • Pranav Dar says:

      Hi Don,

      Thanks for your feedback. This can be used by someone in the image processing industry to blur out images, censor things in a video, etc. Of course it will take fine-tuning to make it industry ready, but this lays the groundwork for it.

      When we curate this list, we look at multiple factors, such as how many stars the repository has and its applications in the field, among other things. The idea is that readers can understand what’s trending and can replicate the code on their own machines. They can improve on the code, understand how it works, and learn from it, regardless of the background they are coming from.