Top 5 Data Science & Machine Learning Repositories on GitHub in Feb 2018

Pranav Dar 05 Jun, 2020

Introduction

Continuing our theme of collecting and sharing the top machine learning GitHub repositories every month, the February edition is fresh off the shelves and ready for you!

GitHub repositories are one of the easiest and best ways for those of us working in data science to keep up with the latest developments and projects. GitHub is also an awesome collaboration tool where we can connect with other like-minded data scientists on various projects.

Without any further ado, let’s dive into this month’s list.

This is part of a series from Analytics Vidhya that will run every month. You can check out the top 5 repositories that we picked out in January here.

FastPhotoStyle

FastPhotoStyle is a Python library developed by NVIDIA. The model takes a content photo and a style photo as inputs, then transfers the style of the style photo to the content photo.

The developers give two examples to show how the algorithm works. The first is very simple: you download a content image and a style image, resize them, and then run the photorealistic image stylization code. In the second example, semantic label maps are used to create the stylized image.

You can read more about this library on Analytics Vidhya’s blog here.

Twitter Scraper

If you’ve ever scraped tweets from Twitter, you have experience working with its API. It has its limitations and is not easy to work with. This Python library was created with that in mind: it does not require authentication, so it is not subject to API rate limits, and it is very quick. You can use this library to scrape the tweets of any user with very little effort.

The developer has mentioned that it can be used for making Markov chains. Do note that it works only with Python 3.6+.
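To make the Markov chain idea concrete, here is a minimal sketch of a word-level chain built from a list of strings. It does not use the scraper library at all: the `tweets` list is made-up placeholder text standing in for scraped tweets, and `build_chain` and `generate` are hypothetical helper names chosen for this illustration.

```python
import random
from collections import defaultdict


def build_chain(texts):
    """Map each word to the list of words observed right after it."""
    chain = defaultdict(list)
    for text in texts:
        words = text.split()
        for current, following in zip(words, words[1:]):
            chain[current].append(following)
    return dict(chain)


def generate(chain, start, max_words=10, seed=None):
    """Walk the chain from `start`, picking a random follower each step."""
    rng = random.Random(seed)
    word = start
    output = [word]
    for _ in range(max_words - 1):
        followers = chain.get(word)
        if not followers:
            # Dead end: this word was never followed by another word.
            break
        word = rng.choice(followers)
        output.append(word)
    return " ".join(output)


# Placeholder text (an assumption), standing in for scraped tweets.
tweets = [
    "deep learning is fun",
    "deep learning is hard",
    "machine learning is fun",
]

chain = build_chain(tweets)
print(generate(chain, "deep", seed=0))
```

Words that appear more often after a given word are stored more often in its follower list, so they are proportionally more likely to be picked during generation.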

Handwriting Synthesis

This is an implementation of the handwriting synthesis experiments presented in the ‘Generating Sequences with Recurrent Neural Networks’ paper by Alex Graves. As the name of the repository suggests, you can generate different styles of handwriting. The model is based on priming and biasing. Priming controls the style of the samples and biasing controls the neatness of the samples.
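As a rough illustration of how biasing can control neatness, the sketch below follows the form given in the Graves paper: a bias value sharpens the mixture weights and shrinks the standard deviations of the output distribution before sampling. It is not taken from the repository's code, and `apply_bias` and its parameter names are hypothetical.

```python
import math


def apply_bias(mixture_logits, log_sigmas, bias):
    """Return (weights, sigmas) after applying a sampling bias.

    bias = 0 leaves the distribution unchanged; larger values concentrate
    probability on the most likely component and reduce the spread.
    """
    # Scale the logits by (1 + bias), then take a numerically stable softmax.
    scaled = [logit * (1.0 + bias) for logit in mixture_logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Subtracting the bias in log space shrinks every standard deviation.
    sigmas = [math.exp(s - bias) for s in log_sigmas]
    return weights, sigmas


# Made-up network outputs for a two-component mixture (an assumption).
logits = [1.0, 0.0]
log_sigmas = [0.0, 0.0]

for bias in (0.0, 1.0, 3.0):
    weights, sigmas = apply_bias(logits, log_sigmas, bias)
    print(bias, [round(w, 3) for w in weights], [round(s, 3) for s in sigmas])
```

With a higher bias, samples are drawn closer to the most probable pen movement, which is why the handwriting looks neater and less varied.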

The samples presented by the author on the GitHub page are truly fascinating in their diversity. He is looking for contributors to enhance the repository, so if you’re interested, get in touch with him!

ENAS PyTorch

This is a PyTorch implementation of “Efficient Neural Architecture Search (ENAS) via Parameters Sharing”. What does ENAS do? It reduces the computational requirement (that is, the GPU hours) of Neural Architecture Search by an incredible 1000 times. It does this via parameter sharing between child models that are subgraphs within a large computational graph.
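The parameter-sharing idea can be sketched with a toy example. This is only an illustration under heavy simplifying assumptions, not the repository's implementation: each "weight" is a single float, architectures are sampled uniformly at random rather than by a trained controller, and `SharedPool`, `sample_architecture`, and `child_parameters` are hypothetical names.

```python
import random

# Candidate operations per node (an illustrative choice).
OPS = ["tanh", "relu", "identity"]


class SharedPool:
    """One weight per (node, operation), created lazily and then reused."""

    def __init__(self, seed=0):
        self._rng = random.Random(seed)
        self.weights = {}

    def get(self, node, op):
        key = (node, op)
        if key not in self.weights:
            # Stand-in for initializing a real weight tensor.
            self.weights[key] = self._rng.uniform(-1.0, 1.0)
        return self.weights[key]


def sample_architecture(num_nodes, rng):
    """Pick one operation per node; a real controller would learn this choice."""
    return [rng.choice(OPS) for _ in range(num_nodes)]


def child_parameters(pool, architecture):
    """Collect the shared weights used by one child model (one subgraph)."""
    return [pool.get(node, op) for node, op in enumerate(architecture)]


rng = random.Random(42)
pool = SharedPool()
for _ in range(20):
    arch = sample_architecture(4, rng)
    child_parameters(pool, arch)

# 20 child models were evaluated, but the pool never holds more than
# num_nodes * len(OPS) weights in total.
print(len(pool.weights))
```

Because every child model reads from and writes to the same pool, a newly sampled architecture does not have to be trained from scratch, which is where the savings come from.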

How to use it is neatly explained on the GitHub page. The prerequisites for using this library are:

Python 3.6+
PyTorch
tqdm, imageio, graphviz, tensorboardX
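A quick way to check these prerequisites before running anything is sketched below. It uses only the Python standard library; the helper names are hypothetical, and the module list assumes the usual import names for each package (`torch` for PyTorch).

```python
import importlib.util
import sys

# Import names assumed for the prerequisites listed above.
REQUIRED_MODULES = ["torch", "tqdm", "imageio", "graphviz", "tensorboardX"]


def python_is_new_enough(minimum=(3, 6)):
    """True if the running interpreter is at least the given version."""
    return sys.version_info >= minimum


def missing_modules(names):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


print("Python 3.6+:", python_is_new_enough())
print("Missing modules:", missing_modules(REQUIRED_MODULES))
```

An empty list on the second line means every listed package can be imported in the current environment.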

Sign Language

This is a relatively straightforward, yet utterly fascinating, use of machine learning. Using a convolutional neural network in Python, the developer has built a model that can recognize hand gestures and convert them into text on the machine.

The author of this repository built the CNN model using both TensorFlow and Keras. He has specified, in detail, how he went about creating this project and each step he followed. It’s definitely worth checking out and trying once on your own machine.

Did you find these helpful? Or are you aware of any other GitHub repositories the AV community should know about? Let us know in the comments section below!

Senior Editor at Analytics Vidhya. Data visualization practitioner who loves reading and delving deeper into the data science and machine learning arts. Always looking for new ways to improve processes using ML and AI.

Responses From Readers

raymond doctor 15 Mar, 2018

Hi, I need a simple prediction tool using CNN/TensorFlow + backpropagation which will allow me to train on my data. My data at present is Sindhi written in Arabic script and mapped to Devanagari script. A small sample:
شرِڙاٽُ=शरिड़ाटु
شرڌانجليِ=श्रद्धांजली
شرڙاٽُ=शरड़ाटु
شرڻارٿيِ=शरणार्थी
شسترشالا=शस्त्रशाला
شسترهيڻُ=शस्त्रहीणु
شسترُ=शस्त्रु
ششماهيِ=शशमाही
ششُ=शिषु
I have around 300,000 samples. At present I am writing rules to handle this, but I am sure that a tool in Python can solve this. Any pointers to such a tool will be most welcome. Thanks in advance.

Faizan Shaikh 19 Mar, 2018

Hi Raymond, you would have to build a machine translation model from scratch for this data, as I don't think you would find pretrained models for a similar problem. You can refer to this article for pointers.

VB 21 Mar, 2018

I once came across a white paper that was implemented for Arabic text recognition. You will have to google it, and I think it claims to be powerful. If you are lucky, you might find a GitHub repo for that implementation.