Top 5 Data Science & Machine Learning Repositories on GitHub in Feb 2018


Continuing our theme of collecting and sharing the top machine learning GitHub repositories every month, the February edition is fresh off the shelves ready for you!

Following GitHub repositories is one of the easiest ways for anyone working in data science to stay up to date with the latest developments and projects. GitHub is also an excellent collaboration tool where we can connect with other like-minded data scientists on various projects.

Without any further ado, let’s dive into this month’s list.

This is part of a series from Analytics Vidhya that will run every month. You can check out the top 5 repositories that we picked out in January here.



FastPhotoStyle

FastPhotoStyle is a Python library developed by NVIDIA. The model takes a content photo and a style photo as inputs and transfers the style of the style photo to the content photo.

The developers have provided two examples to show how the algorithm works. The first is very simple: you download a content image and a style image, resize them, and then run the photorealistic image stylization code. In the second example, semantic label maps are used to create the stylized image.

You can read more about this library on Analytics Vidhya’s blog here.


Twitter Scraper

If you’ve ever scraped tweets from Twitter, you have experience working with its API. It has its limitations and is not easy to work with. This Python library was created with that in mind: it needs no authentication, so the API rate limits do not apply, and it is very fast. You can use this library to scrape the tweets of any user with very little code.

The developer has mentioned that the scraped tweets can be used for building Markov chains. Do note that the library works only with Python 3.6+.
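To make the Markov chain idea concrete, here is a minimal sketch in plain Python. It does not call the scraper library at all: `sample_tweets` is a hypothetical list of placeholder strings standing in for whatever tweets you have collected, and `build_chain` and `generate` are illustrative helper names, not part of any library.

```python
import random
from collections import defaultdict


def build_chain(texts, order=1):
    """Map each run of `order` words to the words seen right after it."""
    chain = defaultdict(list)
    for text in texts:
        words = text.split()
        for i in range(len(words) - order):
            state = tuple(words[i:i + order])
            chain[state].append(words[i + order])
    return dict(chain)


def generate(chain, length=10, seed=None):
    """Random-walk the chain to produce up to `length` words."""
    rng = random.Random(seed)
    state = rng.choice(sorted(chain))  # start from a random known state
    output = list(state)
    while len(output) < length:
        followers = chain.get(state)
        if not followers:  # dead end: no word ever followed this state
            break
        output.append(rng.choice(followers))
        state = tuple(output[-len(state):])
    return " ".join(output)


# Placeholder strings standing in for scraped tweets (not real data).
sample_tweets = [
    "machine learning is fun",
    "machine learning is hard",
    "deep learning is fun too",
]
chain = build_chain(sample_tweets)
print(generate(chain, length=8, seed=0))
```

With only a handful of short texts the output will mostly echo the inputs; the generated sentences become more varied as the number of tweets grows.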


Handwriting Synthesis

This is an implementation of the handwriting synthesis experiments presented in the ‘Generating Sequences with Recurrent Neural Networks’ paper by Alex Graves. As the name of the repository suggests, you can generate different styles of handwriting. Sampling is controlled through priming and biasing: priming controls the style of the samples, and biasing controls their neatness.
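As a rough illustration of biasing, the sketch below follows the biased sampling rule as described in the Graves paper: a bias b sharpens the mixture weights (the logits are scaled by 1 + b) and shrinks the standard deviations (b is subtracted from the log standard deviations). The function name `bias_mixture` and the toy inputs are assumptions made for this example; the repository's own code may be organised differently.

```python
import math


def bias_mixture(pi_hat, sigma_hat, bias=0.0):
    """Apply a sampling bias to raw mixture-density outputs.

    pi_hat: unnormalised mixture weights (logits)
    sigma_hat: log standard deviations
    """
    scaled = [p * (1.0 + bias) for p in pi_hat]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    weights = [e / total for e in exps]
    sigmas = [math.exp(s - bias) for s in sigma_hat]
    return weights, sigmas


# Toy network outputs for a two-component mixture (made-up numbers).
for b in (0.0, 2.0):
    weights, sigmas = bias_mixture([1.0, 0.0], [0.0, 0.0], bias=b)
    print(b, weights, sigmas)
```

A higher bias concentrates probability on the most likely component and narrows every component, which is why the generated handwriting looks neater but less varied.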

The samples presented by the author on the GitHub page are truly fascinating in their diversity. He is looking for contributors to enhance the repository so if you’re interested, get in touch with him!


ENAS PyTorch

This is a PyTorch implementation of “Efficient Neural Architecture Search (ENAS) via Parameters Sharing”. What does ENAS do? It reduces the computational requirement of Neural Architecture Search, measured in GPU hours, by an incredible 1000 times. It does this by sharing parameters between candidate models, each of which is a subgraph of one large computational graph.
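The parameter-sharing idea can be shown with a toy sketch in plain Python. This is not the repository's code: the `SharedParams` class, the `sample_architecture` function, and the `OPS` list are hypothetical names, architectures are sampled uniformly at random rather than by a trained controller, and a one-element list stands in for a real weight tensor.

```python
import random

OPS = ["tanh", "relu", "sigmoid", "identity"]  # illustrative candidate operations


class SharedParams:
    """One weight per (source node, target node, op) edge of the large graph."""

    def __init__(self):
        self._weights = {}

    def get(self, edge):
        # Create the weight the first time an edge is used, then reuse it.
        if edge not in self._weights:
            self._weights[edge] = [0.0]  # stand-in for a real weight tensor
        return self._weights[edge]

    def __len__(self):
        return len(self._weights)


def sample_architecture(num_nodes, rng):
    """For each node after the first, pick one earlier node to read from and one op."""
    return [(rng.randrange(node), node, rng.choice(OPS)) for node in range(1, num_nodes)]


rng = random.Random(0)
shared = SharedParams()
arch_a = sample_architecture(4, rng)
arch_b = sample_architecture(4, rng)

# "Train" arch_a by nudging each weight it uses.
for edge in arch_a:
    shared.get(edge)[0] += 1.0

# Any edge arch_b has in common with arch_a starts from the trained value.
common = set(arch_a) & set(arch_b)
print("edges in common:", common)
print("weights arch_b inherits:", {edge: shared.get(edge)[0] for edge in common})
```

Because every sampled architecture reads from the same weight store, evaluating a new candidate does not require training it from scratch, which is where the saving in GPU hours comes from.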

How to use it has been neatly explained on the GitHub page. The prerequisites for using this library are:

  • Python 3.6+
  • PyTorch
  • tqdm, imageio, graphviz, tensorboardX


Sign Language

This is a relatively straightforward, yet utterly fascinating, use of machine learning. Using a convolutional neural network in Python, the developer has built a model that can recognize hand gestures and convert them into text on the machine.

The author of this repository built the CNN model using both TensorFlow and Keras. He has specified, in detail, how he went about creating this project and each step he followed. It’s definitely worth checking out and trying once on your own machine.


Did you find these helpful? Or are you aware of any other GitHub repositories the AV community should know about? Let us know in the comments section below!




  • raymond doctor says:

    I need a simple prediction tool using a CNN / TensorFlow with backpropagation which will allow me to train on my data. My data at present is Sindhi written in Arabic script and mapped to Devanagari script.
    A small sample
    I have around 300,000 samples
    At present I am writing rules to handle this, but am sure that a tool in Python can solve this. Any pointers to such a tool will be most welcome. Thanks in advance.

    • Faizan Shaikh says:

      Hi Raymond,

      You would have to build a machine translation model from scratch for this data, as I don’t think you would find pretrained models for a similar problem.

      You can refer to this article for pointers.

    • VB says:

      I once came across a white paper that had been implemented for Arabic text recognition. You will have to google it, and I think it claims to be powerful. If you are lucky, you might find a GitHub repo for that implementation.