Top 5 GitHub Repositories and Reddit Discussions for Data Science & Machine Learning (April 2018)
GitHub and Reddit are two of the most popular platforms when it comes to data science and machine learning. The former is an awesome tool for sharing and collaborating on code and projects, while the latter is the best platform out there for engaging with data science enthusiasts from around the world.
This year, we have covered the top GitHub repositories each month. From this month onwards, we will also include the Reddit threads that generated the most interesting and intriguing discussions in the machine learning space.
April saw some amazing libraries being open sourced. From Deep Painterly Harmonization, a library that makes manipulated images look ultra realistic, to Swift for TensorFlow, this article covers the best from last month.
Let’s look at April’s top repositories and most interesting Reddit discussions.
The task of manipulating images and still making them look realistic has been around for ages. But with deep learning, this is becoming far more efficient and remarkably life-like. A developer has come up with an algorithm that takes a painting, adds an external element to it, and harmonizes it to make it look almost indistinguishable from the original painting.
Just look at the above image – the third frame is the final output and if we didn’t have the preceding two images, we would never be able to tell the balloon is an external object! This algorithm produces far more precise results than photo compositing or global stylization techniques and it achieves levels of edits that have so far been very difficult to achieve.
You can read more about this library on AVBytes here.
Swift for TensorFlow was demoed at the TensorFlow Developer Summit last month and the team behind the technology has now open sourced the code on GitHub for the entire community. Their aim is to provide a new interface to TensorFlow that will build on its already awesome capabilities, while taking its usability to a whole new level.
This is still in its very nascent stages, so it isn’t ready to be used for writing deep learning models yet. The team admits that the goals it had in mind when launching this are still a while away from being achieved. But there is a lot of potential here that is as yet untapped.
We have covered Swift for TensorFlow here for your reference.
A team of researchers from Cornell University has proposed a Multimodal Unsupervised Image-to-image Translation (MUNIT) framework for translating images from one domain to another. The aim is to take an image and generate a new image from it that belongs to a different category (for instance, transforming an image of a dog into an image of a cat).
Previous approaches can only perform a one-to-one mapping of a given image and thus fail to produce diverse outputs from the same input. MUNIT, on the other hand, is able to provide more than one output. Exciting times!
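To make the one-to-one versus one-to-many distinction concrete, here is a toy sketch in Python. It is not the MUNIT code: the function names and the string and tuple "images" are hypothetical stand-ins. It assumes, as the MUNIT paper describes, that an image is split into a content code and a style code, and that different outputs come from pairing the same content with different randomly sampled styles.

```python
import random


def one_to_one_translate(content_code):
    """Deterministic mapping: the same input always yields the same output."""
    return ("cat", content_code)


def multimodal_translate(content_code, style_code):
    """Output depends on the input's content AND a separately chosen style."""
    return ("cat", content_code, style_code)


def sample_translations(content_code, n_outputs, seed=0):
    """Draw several style codes at random to get several outputs for one input."""
    rng = random.Random(seed)
    return [
        multimodal_translate(content_code, rng.gauss(0.0, 1.0))
        for _ in range(n_outputs)
    ]


dog = "dog_pose_and_layout"  # stand-in for the content extracted from a dog photo
print(one_to_one_translate(dog) == one_to_one_translate(dog))  # same output every time
for output in sample_translations(dog, n_outputs=3):
    print(output)  # same content, a different style code in each output
```

In the real framework, the content and style codes are produced by neural network encoders and the output is an image produced by a decoder; the sketch only mirrors the overall shape of the idea.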
We covered this on AVBytes and you can read about how it works here.
Deep learning in the field of Natural Language Processing has taken off in a big way recently. There is a plethora of text available on the internet, some of it dating back centuries! GluonNLP is a toolkit that aims to make NLP tasks easier for a data scientist. It simplifies text preprocessing, loading datasets, and building deep learning models. This enables you to do your NLP research faster and more efficiently.
This repository has nice documentation, along with a detailed example of how to use the library. They even have a nicely packaged 60-minute crash course for folks who are new to Gluon.
This repository is a goldmine. It’s a collection of PyTorch implementations of GANs (Generative Adversarial Networks) that have been presented in research papers. Currently the repository lists 24 different implementations, each adding to your knowledge in its unique way. The list includes Adversarial Autoencoders, CycleGAN, Least Squares GAN, and Pix2Pix, among others.
If you’re having trouble trying to understand any research paper, the Reddit machine learning community is willing to help you out. This is an awesome idea that has already helped a bunch of people extract valuable information from papers where previously they would have given up and moved on.
But when you post there, ensure you provide as much detail as you can, like a summary of the paper, where you are stuck, and what research you have already done on your own. This line sums it up well: “Think of each paper as an invite to an open study group for that paper, not just a queue for an expert to come along and answer it.”
The debate about whether research should be open sourced or closed has been raging on for decades. Recently, the popular Nature magazine announced it’ll be publishing a closed-access journal. This has led to a major campaign against them, with a lot of big names (Jeff Dean, Ian Goodfellow, among others) adding their signatures to a petition stating they will not write for such a publication.
This discussion thread has diverse and knowledgeable opinions about whether research should have open or closed access. It’s a fascinating read and I highly recommend going through the entire thread to see what the ML community thinks about this topic.
Michael Jordan is a celebrated professor from Berkeley and in a recent talk he spoke at length about how we are miles away from reaching true intelligence in machines. It’s a sobering presentation and really makes one think about the topic.
This thread has generated more than 100 comments, with users weighing in with their opinions about where they perceive AI to be. What makes this a fascinating read is the depth some users have gone into in their comments. Go ahead, read it and participate in the still active discussion.
This looks like a reasonably straightforward topic, right? Wait till you dive into the thread. Data scientists and machine learning researchers from all over Europe and the USA are involved in an intense discussion about how the structure of ML is shaping up in both regions, and what the salary figures look like. You will gain a lot of perspective about the architecture of ML projects and prospective salaries.
This thread was launched from Uber’s video on developing intrinsic dimension as a fundamental property of neural networks. If you have any doubts regarding the content presented in the video, the community has answered those questions in detail. The biggest positive seems to be that people love that a research paper was turned into a video, which makes the research easier to understand.
Have you used any of the GitHub libraries before? And what’s your take on the Reddit discussions? If you have any feedback or suggestions, or need clarification on anything, get involved in the comments section below!