The Best Machine Learning GitHub Repositories & Reddit Threads from July 2018
Did you ever imagine you could become an artist without knowing how to paint or even hold a paintbrush? This is what you can do now, thanks to computer vision techniques. And what’s even better, the ML community is so awesome that the code to do this has been open sourced! This is the power of GitHub and why I encourage all data scientists, aspiring or established, to use it regularly.
GitHub has been at the heart of open source data science and machine learning. Whether you are contributing to an existing repository or building one of your own, you are sure to gain a ton of knowledge.
There are some really cool repositories below: deep learning and GAN projects, text matching for natural language processing (NLP), and computer vision (as mentioned above) to extend and re-imagine existing images. There’s something here for everyone!
Coming to Reddit, we have selected a mix of deep learning and artificial intelligence related discussions. These will help you assess and understand the current state of certain technologies in the industry and where we might be headed in the near future.
You can check out the top GitHub repositories and top Reddit discussions (from April onwards) for the first 6 months of the year below:
This is one of the coolest repositories I have covered in this series. ‘Inpainting’ has been a trending concept recently but this technique, designed by a couple of researchers from Stanford, does the opposite. ‘Outpainting’ extends the use of GANs for inpainting to estimate and imagine what the existing image might look like beyond what can be seen. The algorithm then expands the image beyond its existing boundaries. The results, as you can see in the image above, are outstanding.
This repository is an open source implementation using Keras in Python. You can either build a model from scratch or use the one provided by this repository’s author. Either way, try it out!
Be sure to check out Analytics Vidhya’s article on this approach here.
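To make the idea of extending an image beyond its boundaries a little more concrete, here is a minimal sketch of how the input to an outpainting model could be prepared. This is my own illustration and not the repository’s code: the function name `pad_for_outpainting`, the uniform border, and the list-of-lists image format are all assumptions made to keep the example dependency-free. The mask simply marks which pixels a generator would have to invent.

```python
def pad_for_outpainting(image, border, fill=0.0):
    """Place `image` (a list of rows of floats) at the centre of a larger canvas.

    Returns (canvas, mask). The mask holds 1 for pixels that a model would
    have to generate and 0 for pixels copied from the original image.
    Hypothetical helper for illustration only.
    """
    if border < 0:
        raise ValueError("border must be non-negative")
    height = len(image)
    width = len(image[0]) if height else 0
    new_h, new_w = height + 2 * border, width + 2 * border

    # Start with an empty canvas and a mask that marks everything as unknown.
    canvas = [[fill] * new_w for _ in range(new_h)]
    mask = [[1] * new_w for _ in range(new_h)]

    # Copy the known pixels into the centre and mark them as known.
    for r in range(height):
        for c in range(width):
            canvas[r + border][c + border] = image[r][c]
            mask[r + border][c + border] = 0
    return canvas, mask


tiny = [[0.2, 0.4], [0.6, 0.8]]
canvas, mask = pad_for_outpainting(tiny, border=1)
for row in mask:
    print(row)
```

In a real pipeline the canvas and mask would be fed to a trained generator, which fills in the region where the mask is 1. The actual repository may mask and crop its training images differently.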
This repository does what it says – it’s a TensorFlow implementation of various text classification models. What I liked about this repository is that it contains links to each model that has been discussed. This provides an understanding of what you are doing, which is extremely helpful. The models implemented here are:
- Word-level CNN
- Character-level CNN
- Very Deep CNN
- Word-level Bidirectional RNN
- Attention-Based Bidirectional RNN
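The "word-level" and "character-level" labels in the list above refer to how the text is split before it reaches the network. The sketch below shows that difference with plain Python. It is not taken from the repository; the helper names `build_vocab` and `encode`, and the choice of 0 for padding and unknown tokens, are assumptions for illustration.

```python
def build_vocab(tokens):
    """Map each distinct token to an integer id, starting at 1.

    Id 0 is reserved for padding and unknown tokens (an assumption of
    this sketch, not a rule of any particular library).
    """
    vocab = {}
    for token in tokens:
        if token not in vocab:
            vocab[token] = len(vocab) + 1
    return vocab


def encode(tokens, vocab, max_len):
    """Turn tokens into a fixed-length list of ids, truncating or padding with 0."""
    ids = [vocab.get(token, 0) for token in tokens][:max_len]
    return ids + [0] * (max_len - len(ids))


text = "good movie"

# Word-level models see one id per word.
word_tokens = text.split()
word_ids = encode(word_tokens, build_vocab(word_tokens), max_len=4)

# Character-level models see one id per character, spaces included.
char_tokens = list(text)
char_ids = encode(char_tokens, build_vocab(char_tokens), max_len=12)

print(word_ids)  # [1, 2, 0, 0]
print(char_ids)  # [1, 2, 2, 3, 4, 5, 2, 6, 7, 8, 0, 0]
```

Either sequence of ids would then go through an embedding layer before the CNN or RNN. The character-level version produces much longer sequences but never runs into out-of-vocabulary words.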
While not strictly a library created last month, this repository got a big update recently. MatchZoo is basically a toolkit for text matching. It was created to make it easier to design, compare, and share deep text matching models. Tasks MatchZoo can be used for include document retrieval, conversational response ranking, question answering, and paraphrase identification, among others.
Some of the deep matching methods out there are DRMM, MatchPyramid, MV-LSTM, aNMM, DUET, etc. Check out the repository to get details on how to install and take advantage of this extremely useful library.
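If text matching is new to you, it helps to see the task in its simplest form before diving into the deep models. The sketch below ranks candidate texts against a query using bag-of-words cosine similarity. To be clear, this is a classic baseline that I have swapped in for illustration; it does not use MatchZoo’s API or any of the deep methods named above, and the function names `bow`, `cosine`, and `rank` are my own.

```python
import math
from collections import Counter


def bow(text):
    """Lowercase, whitespace-split bag of words."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query, candidates):
    """Return (score, candidate) pairs, best match first."""
    q = bow(query)
    scored = [(cosine(q, bow(c)), c) for c in candidates]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


results = rank(
    "how to install python",
    ["best pizza in town", "install python on windows"],
)
for score, text in results:
    print(f"{score:.2f}  {text}")
```

Deep matching models aim to replace the hand-built similarity function with a learned one, so that pairs with little word overlap but similar meaning can still score highly.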
Does the above ensemble of faces get you excited about this repository? The image inside the green border is the original one; the rest of the images use GANimation to anatomically change the facial expressions of the subject(s). This is a slightly complex approach but is something you must explore if you are interested in deep learning.
The authors have provided everything you need to get started – a beginner’s guide, prerequisites, data preparation resources, and of course, the Python code. What are you waiting for? Dig in!
This excellent repository contains the Python code for various experiments conducted as part of the ‘here‘ paper. This was presented at the International Conference on Machine Learning 2018 last month. It’s a fascinating case study for anybody interested in deep learning and especially GANs.
I have included this repository because it gives you a really good idea of the level of research and thinking that goes into papers that are accepted and presented at top-class ML conferences. You can also view the best papers from ICML 2018 here.
If you are a newcomer to deep learning, this instantly becomes a must-read thread for you. Plenty of DL experts have provided their views (and a plethora of links) on recently published papers that you should read and implement. This reinforces what you’ve learned and has the additional advantage of keeping you up-to-date with a breakthrough technique.
If you are a deep learning veteran, this will either refresh your concepts or teach you about all that’s happening in this diverse field. You can never get enough knowledge so I encourage you to check out all the resources provided. You should also read through all the opinions provided by other data scientists which will add to your own perspective.
The title of this thread is enough to grab a data scientist’s attention. This discussion spawned from a Twitter debate on how science is being used by the big technology organizations. While the debate started from a pessimistic viewpoint, it jumped to more positive or assertive views from people who have worked with these companies.
You will not only learn how science is defined and used at Google Brain, et al., but also what fellow data science people think about the current state of science in the industry.
If you want to get into the research side of machine learning, you need to know the theory behind how things work. This includes topics like core mathematics, probability, etc. This thread lists some of the more advanced books on various machine learning concepts.
There are tons and tons of suggestions (almost 100 comments!) in there along with links, so you cannot complain about a lack of resources. From advanced ML to an introduction to reinforcement learning, this thread is a goldmine of top-notch resources.
This has been an ongoing discussion for decades, and has gained even more prominence with the recent interest in ML and AI. The concern is real despite experts doing their best to allay fears. Go through this thread end-to-end – it contains opinions from AI enthusiasts and experts about how they see AI impacting jobs in different countries.
There are also plenty of statistics and links shared which help in gauging where AI is headed. Make sure you contribute with your valuable opinion to the overall discussion as well. The more you put yourself out there, the more confident you will be in your data science skin.
Data visualization is a critical aspect of any machine learning project. But it has its own standalone applications as well, like dashboards, reports, etc. Business intelligence is a thriving field these days and as more folks get into it, they need to be aware of some of the most common mistakes people make. The given image is a great illustration of this.
One of the more fun but important threads you will come across in your data science journey. You don’t need to religiously adhere to each point that has been showcased, but it’s good to have an overall idea of how leaders in the field think.
This month’s article was geared more towards deep learning but I have tried to maintain balance by sharing some beginner-friendly Reddit discussions. I repeat: please try to contribute to both GitHub repositories and Reddit discussions because these will help you immensely in your career. The more you read and share, the better your own knowledge becomes.
If you know of any other links that the community should know about, go ahead and share them with us in the comments section below.