Coding is one of the best things about being a data scientist. There are days when I find myself immersed in programming something from scratch. And that feeling when you see your hard work culminate in a successful model? Exhilarating and unparalleled!
But as a data scientist (or a programmer), it's equally important to create checkpoints of your code at regular intervals. It's incredibly helpful to know where you left off last time, so that if you have to roll back your code or simply branch out in a different direction, there's always a fallback option. And that's why GitHub is such an excellent platform.
The previous posts in this monthly series have expounded on why every data scientist should have an active GitHub account. Whether it’s for collaboration, resume/portfolio, or educational purposes, it’s simply the best place to enhance your coding skills and knowledge.
And now let’s get to the core of our article – machine learning code! I have picked out some really interesting repositories which I feel every data scientist should try out on their own.
Apart from coding, there are tons of aspects associated with being a data scientist. We need to be aware of the latest developments in the community, what other machine learning professionals and thought leaders are talking about, the ethical implications of working on a controversial project, and so on. That is what I aim to bring out in the Reddit discussion threads I showcase every month.
To make things easier for you, here’s the entire collection so far of the top GitHub repositories and Reddit discussions (from April onwards) we have covered each month:
Continuing our run of reinforcement learning resources in this series, here's one of the best so far – OpenAI's Spinning Up! This is an educational resource open-sourced with the aim of making deep RL easier to learn. Given how complex the field can appear to most folks, this is quite a welcome repository.
The repo contains a few handy resources:
- An introduction to RL terminology, kinds of algorithms, and basic theory
- An essay about how to grow into an RL research role
- A curated list of important papers organized by topic
- A code repo of short, standalone implementations of key algorithms
- A few exercises to get your hands dirty
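To ground the basic terminology the introduction covers (actions, rewards, value estimates, exploration vs. exploitation), here's a toy epsilon-greedy bandit in plain NumPy. This is my own illustrative sketch, not code from Spinning Up, and every name and parameter in it is made up for the example:

```python
import numpy as np

rng = np.random.default_rng(0)

true_means = np.array([0.1, 0.5, 0.8])   # hidden reward probability per arm
estimates = np.zeros(3)                  # running value estimate per arm
counts = np.zeros(3)                     # how often each arm was pulled
epsilon = 0.1                            # exploration rate

for _ in range(2000):
    if rng.random() < epsilon:
        action = int(rng.integers(3))        # explore: pick a random arm
    else:
        action = int(np.argmax(estimates))   # exploit: best estimate so far
    reward = float(rng.random() < true_means[action])  # Bernoulli reward
    counts[action] += 1
    # incremental mean update of the action-value estimate
    estimates[action] += (reward - estimates[action]) / counts[action]

best_arm = int(np.argmax(estimates))
print(best_arm)  # with enough pulls, this should home in on the best arm
```

It's a far cry from deep RL, but the explore/exploit trade-off and the value-estimate update are the same ideas the repo's theory section builds on.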
This one is for all the audio/speech processing people out there. WaveGlow is a flow-based generative network for speech synthesis. In other words, it's a network (yes, a single network!) that can generate impressive, high-quality speech from mel-spectrograms.
This repo contains the PyTorch implementation of WaveGlow and a pre-trained model to get you started. It’s a really cool framework, and you can check out the below links as well if you wish to delve deeper:
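Since WaveGlow conditions on mel-spectrograms, here's a quick refresher on what those involve: a mel filterbank of triangular filters that pool linear FFT bins onto the perceptual mel scale. This is a minimal NumPy sketch of my own; the sample rate, FFT size, and mel count are illustrative defaults, not WaveGlow's actual settings:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr=22050, n_fft=1024, n_mels=8):
    # filter edges, evenly spaced in mel space
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for b in range(left, center):          # rising slope of the triangle
            fb[i - 1, b] = (b - left) / max(center - left, 1)
        for b in range(center, right):         # falling slope of the triangle
            fb[i - 1, b] = (right - b) / max(right - center, 1)
    return fb

fb = mel_filterbank()
print(fb.shape)  # (8, 513): one triangular filter per mel band
```

Multiplying this matrix with a power spectrogram gives you a (tiny) mel-spectrogram – the kind of representation WaveGlow turns back into audio.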
We covered the PyTorch implementation of BERT in last month's article, and here's a different take on it. For those who are new to BERT, it stands for Bidirectional Encoder Representations from Transformers. It's basically a method for pre-training language representations.
BERT has set the NLP world ablaze with its results, and the folks at Google have been kind enough to release quite a few pre-trained models to get you on your way.
This repository “uses BERT as the sentence encoder and hosts it as a service via ZeroMQ, allowing you to map sentences into fixed-length representations in just two lines of code”. It’s easy to use, extremely quick, and scales smoothly. Try it out!
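Those "two lines of code" look like the snippet below, per the bert-as-service client API. Note the assumption that you've already started the server separately (the repo's README covers the `bert-serving-start` command and the pre-trained model it needs), so this is a usage sketch rather than something you can run standalone:

```python
# Assumes a bert-as-service server is already running, e.g.:
#   bert-serving-start -model_dir /path/to/bert_model -num_worker=2
from bert_serving.client import BertClient

bc = BertClient()  # connects to the server (localhost by default)
vectors = bc.encode(['First do it', 'then do it right'])
# `vectors` is a 2D numpy array: one fixed-length vector per input sentence.
```

The fixed-length vectors can then be fed straight into any downstream classifier or similarity search.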
Quick, Draw is a popular online game developed by Google where a neural network tries to guess what you're drawing. The neural network learns from each drawing, hence increasing its already impressive ability to correctly guess the doodle. The developers have built up a HUGE dataset from the sheer number of drawings users have made previously. It's an open-source dataset which you can check out here.
And now you can build your own Quick, Draw game in Python with this repository. There is a step-by-step explanation of how to do this. Using this code, you can run an app to either draw in front of the computer’s webcam, or on a canvas.
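If you grab the dataset's simplified bitmap files, each class file stores one flattened 28x28 grayscale doodle per row. Here's a small sketch of loading and reshaping them – the helper name is mine, and I use a synthetic in-memory array as a stand-in for a real downloaded class file:

```python
import io
import numpy as np

def load_doodles(source):
    """Load a Quick, Draw bitmap-format class file into 28x28 images."""
    data = np.load(source)            # shape: (num_drawings, 784)
    return data.reshape(-1, 28, 28)   # one 28x28 grayscale image per drawing

# Demo with a synthetic array standing in for a real class file (e.g. cat.npy):
fake = np.random.randint(0, 256, size=(5, 784), dtype=np.uint8)
buf = io.BytesIO()
np.save(buf, fake)
buf.seek(0)

images = load_doodles(buf)
print(images.shape)  # (5, 28, 28)
```

From there it's a short hop to training a small CNN classifier on the doodles, which is essentially what the game's guessing network does at scale.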
GAN Dissection, pioneered by researchers at MIT’s Computer Science & Artificial Intelligence Laboratory, is a unique way of visualizing and understanding the neurons of Generative Adversarial Networks (GANs). But it isn’t just limited to that – the researchers have also created GANPaint to showcase how GAN Dissection works.
This helps you explore what a particular GAN model has learned by inspecting and manipulating its internal neurons. Check out the research paper here and the below video demo, and then head straight to the GitHub repository to dive straight into the code!
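The core intervention in GAN Dissection is turning individual convolutional units on or off and watching how the generated image changes. Here's a minimal NumPy sketch of that ablation step on a fake feature map – the shapes and unit indices are illustrative, and the real method of course operates inside a trained GAN's layers:

```python
import numpy as np

rng = np.random.default_rng(1)
features = rng.standard_normal((512, 8, 8))  # (units, height, width) at one layer

def ablate(feature_map, units):
    """Zero out the given units, as in a dissection-style intervention."""
    out = feature_map.copy()
    out[list(units)] = 0.0
    return out

edited = ablate(features, units=[3, 41, 200])
print(edited[3].max())  # the ablated unit's activations are all zero
```

In the actual repo this edit happens mid-forward-pass, so you can see which visual concepts (trees, doors, clouds) disappear from the output when their units are silenced.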
Has this question ever crossed your mind while learning basic machine learning concepts? This is one of the fundamental algorithms we come across in our initial learning days and has proven to be quite effective in ML competitions as well. But once you start going through this thread, prepare to seriously question what you’ve studied previously.
What started off as a straightforward question turned into a full-blown discussion among the top minds on Reddit. I thoroughly enjoyed browsing through the comments, and I'm sure anyone with an interest in this field (and mathematical rigour) will find it useful.
What do you do when the developer of a complex and massive neural network vanishes without leaving behind the documentation needed to understand it? This isn’t a fictional plot, but a rather common situation the original poster of the thread found himself in.
It’s a situation that happens regularly with developers but takes on a whole new level of intrigue when it comes to deep learning. This thread explores the different ways a data scientist can go about examining how a deep neural network model was initially designed. The responses range from practical to absurd, but each adds a layer of perspective which could help you one day if you ever face this predicament.
My attention was drawn to this thread by the sheer number of comments (110 at the time of writing) – what in the world could be so controversial about this topic? But once you start scrolling down, the stark difference in opinions among the debaters is mind-boggling. Apart from TensorFlow being derided as "not the best framework", there's a lot of love being shown to PyTorch (which isn't all that surprising if you've used PyTorch).
It all started when Francois Chollet posted his thoughts on GitHub and lit a (metaphorical) fire under the machine learning community.
Another OpenAI entry in this post – and yet another huge breakthrough by them. The title might not leap off the page as anything special, but it's important to understand what the OpenAI team have conjured up here. As one of the Redditors pointed out, this takes us one step closer to machines mimicking human behavior.
It took the agent around a year of total gameplay experience to beat Montezuma's Revenge at a superhuman level – pretty impressive!
This one is for all the aspiring data scientists reading this article. The author of the thread expounds on how he landed the coveted job, his background, where he studied data science, etc. After answering these standard questions, he has actually written a very nice post on what others in a similar position can do to further their ambitions.
There are some helpful comments as well if you scroll down a little bit. And of course, you can post your own question(s) to the author there.
Quite a collection this month. I found the GAN Dissection repository quite absorbing. I’m currently in the process of trying to replicate it on my own machine – should be quite the ride. I’m also keeping an eye on the ‘Reverse Engineering a Massive Neural Network’ thread as the ideas spawning there could be really helpful in case I ever find myself in that situation.
Which GitHub repository and Reddit thread stood out for you? Which one will you tackle first? Let me know in the comments section below!