Top Highlights from the Amazing Machine Learning Tutorials Presented at NeurIPS (NIPS) 2018

Aishwarya Singh 19 Jul, 2020

10 min read

Introduction

NeurIPS (formerly called NIPS – Neural Information Processing Systems) is one of the premier machine learning conferences in the world. Researchers from across the globe present their latest projects in this field, but getting past the review screening? Not so easy. Thousands of papers are submitted every year out of which only a handful make the final conference.

The audience tickets for NeurIPS 2018 sold out within 12 minutes of the portal being opened! That might give you an inkling of how popular this annual conference is. For those who couldn’t be there – we are thrilled to present a quick summary of the best tutorials from NeurIPS 2018!

This year’s edition was held in Montreal, Canada between 2nd to 8th December. There were a variety of topics being showcased – from fairness and transparency in AI to visualizing deep learning models. You can check out the full schedule here.

There are hours and hours of videos, so our team went through all of them to bring you the best in the form of this article.

Note: We have embedded the videos for most sessions as well. A couple of videos are not being embedded due to some technical issue with FB’s video platform, and we have provided their direct links. While the summary is a good starting point, we encourage everyone to watch the videos as well – this is a great chance to learn from the top minds in this field.

Automatic Machine Learning
Common Pitfalls for Studying the Human Side of Machine Learning
Statistical Learning Theory: a Hitchhiker’s Guide
Unsupervised Deep Learning
Adversarial Robustness, Theory and Practice
Visualization for Machine Learning
Counterfactual Inference
Scalable Bayesian Inference
Negative Dependence, Stable Polynomials and All That

Automatic Machine Learning

Speakers: Frank Hutter and Joaquin Vanschoren

Tutorial summary

Building an end-to-end machine learning model involves a number of steps, such as preprocessing data, creating features, selecting model, and tuning the hyperparameters. Automatic Machine Learning, or AutoML, aims to automate these processes – this tutorial covers methods underlying the state-of-the-art in AutoML. Quite a relevant topic in today’s environment.

Frank Hutter kicked off the tutorial by discussing the various applications of deep learning and an expert’s role in building a successful model. This can potentially be replaced by an AutoML service that tries to learn the features, architecture and parameters to use based on the raw data that we provide. Followed by this basic introduction to AutoML, Frank spoke about the types of hyperparameters and modern approaches to Hyperparameter Optimisation. This is broadly divided into three sub-topics:

AutoML as hyperparameter optimization
Blackbox optimization: Discusses approaches for blackbox optimization like grid search, random search and Bayesian optimization.
Beyond Blackbox optimization: Covers three main approaches – hyperparameter gradient descent, extrapolation of learning curves and multi-fidelity optimization. Meta-learning is also a part of this aspect

The next topic Frank covered was about Neural Architecture, which is again divided into three parts – Search Space Design, Blackbox optimization and Beyond Blackbox optimization.

Search Space Design: Includes basic neural architecture search spaces such as chain structured search spaces and cell structured search spaces.
Blackbox optimization for neural architecture search (NAS) method: Frank covers NAS with reinforcement learning and Bayesian optimization as a part of this topic
Beyond Blackbox optimization: Discussion on the four main approaches weight inheritance and network morphisms, weight sharing and one-shot model, multi-fidelity optimization and meta-learning.

After a short Q&A session with Frank, Joaquin Vanschoren took over for the second half of the tutorial. His focus was mainly on Meta-Learning. He spoke about various approaches, configuration space design, surrogate model transfer and warm-started multi-task learning. Joaquin further discussed the Learning Pipeline followed by transfer learning and transfer features. He also spent some time discussing topics like gradient descent and LSTM meta-learner.

Common Pitfalls for Studying the Human Side of Machine Learning

Speakers: Deirdre Mulligan, Nitin Kohli, Joshua A. Kroll

Tutorial summary:

Machine learning is being used in almost every domain in the industry and researchers all over the world are examining how it can affect people and society. Ethics, essentially, should be at the heart of every ML project. The main idea behind this tutorial was to put forward some common misconceptions machine learning researchers and practitioners hold when thinking about certain topics.

This is a video we implore everyone to watch!

Some of the terms, like fairness, accountability, transparency and interpretability, are often reused to represent different meanings which may cause an unnecessary misunderstanding. This tutorial examined how the same words can be used to refer to different ideas. The presenters also showcased a few case studies where these learnings are being applied to ML problems.

The session started with focusing on the necessity of having certain definitions for the terms we use and how these terms carry different meanings for different people. We were introduced to the term ‘sociotechnical’. To explain the concept better, Nitin Kohli took the term ‘Fairness’ and showcased how it can come across differently for statisticians, computer scientists, or lawyers.

The next example, quite naturally, was of the word ‘Transparency’. The presenters picked up some very common examples to differentiate the meaning of the word for someone working in the government sector, to a machine learning engineer. Post this, Nitin described the word ‘Explanation’ and its various types with suitable instances.

They also spoke about the terms ‘accountability’ and ‘interpretability’ during the session and the Q&A that followed was pretty informative as well.

Statistical Learning Theory – a Hitchhiker’s Guide

Speakers: John Shawe-Taylor, Omar Rivasplata

Watch the video for this tutorial here.

Tutorial summary:

John Shawe-Taylor initiated the tutorial by giving an introduction to statistical learning theory (SLT) followed by some basic definitions and notations for terms used quite frequently, such as generalization gap, upper bound, etc.

Both speakers provided a broad outline of the session where they listed down some important topics:

First generation of SLT
- Worst-case uniform bounds
- Vapnik Chervonenkis characterization
Second generation SLT
- Hypothesis dependent complexity
- SRM Margin PAC-Bayes framework
Next generation SLT?

After familiarizing the audience with important terminologies, Omar Rivasplata took over the baton by discussing about First Generation SLT. He started with talking about the building blocks of a single function and then explains the finite function class and infinite function class. In the next few slides, Omar discussed the VM bounds along with the limitations to the VM framework.

Once the audience had been familiarized with the first generation of SLT in the first half of the talk, John gave an overview of what comes after that – the second generation of SLT. He elaborated on the different ways to make the bound function dependent and the techniques that can be used for detecting a benign distribution.

We also saw the Three Proof Techniques – Covering numbers, Rademacher Complexity, and PAC Bayes Analysis. He compared the PAC-Bayes bounds with Bayesian Learning. This part of the talk is really interesting – do watch the video to gain a deeper insight into it.

The last section of the tutorial is a discussion over the Next Generation SLT, where the speakers talks about Performance of neural networks and stability. Throughout the tutorial, the speakers explain all the concepts using plots and mathematical equations which makes the topics crystal clear.

Unsupervised Deep Learning

Speakers: Alex Graves and Marc’Aurelio Ranzato

Tutorial summary

A topic most of you will be curious to explore more! This tutorial is divided into two parts:

Part 1 is taught by Alex Graves
Part 2 by Marc’Aurelio Ranzato.

In Part 1, Alex explained why we need unsupervised learning in the first place. Why can’t we just provide the true labels for training the model? There are mainly three reasons for that:

Targets can be difficult to obtain
Unsupervised learning feels more human
We want rapid generalisation on new tasks and situations

Targets in supervised learning contain very less information as compared to the input data. Using supervised learning, we are bounding the model to learn only a few bits of information. Unsupervised learning on the other hand, gives us an essentially unlimited supply of information to learn. So instead of learning the data points, the model learns the dataset.

Unsupervised learning gives us more of a signal to learn from, but the learning objective is not entirely clear. Autoregressive neural networks can be used for density modelling which help to learn information from the data. Methods such as auto-encoding and predictive coding can yield useful latent representations.

In Part 2, Marc discussed various applications of unsupervised learning which are based on other frameworks and principles. He explained how to learn representations and samples and how to map between two domains. Some of the tips for learning representations are:

Always look at the data before designing the model
PCA and k-means are very often a strong baseline

He also mentioned how to extract features in NLP using unsupervised learning. Some of the applications of learning how to map between two domains are:

Making analogies in vision
Leverage lots of unlabeled data in machine translation
An AI agent has to be able to perform analogies to quickly adapt to a new environment so learning this mapping is helpful

Unsupervised learning has tons of sub-areas like feature learning, learning to align domains, learning to generate samples, etc. The biggest challenges with unsupervised learning are:

Which metric to choose and what should be the defined task?
Generality and efficiency of current algorithms
Integrating unsupervised learning with other learning components

Adversarial Robustness, Theory and Practice

Speakers: Zico Kolter and Aleksander Mądry

Tutorial summary

In the tutorial, both researchers spoke about how machine learning predictions are mostly accurate, but at the same time, brittle as well. Intrigued? Adding just a little noise to the data can change the predictions drastically, resulting in a drop in performance. Trying data augmentation also does not help much in improving the performance. Some of the problems that the brittleness of machine learning can cause are:

Security
Safety
ML alignment

Zico and Aleksander proposed three commandments in order to make our machine learning model more secure:

Don’t train on the data which you don’t trust as it may lead to data poisoning
Never let anyone use your model or observe its output unless you completely trust them
Don’t fully trust the predictions of your model because of adversarial examples

They talked about adversarial examples and verification, and how to train adversarially robust models. Zico further propounded on whether robust deep networks overfit or not. Even for training adversarial robust models, more data is required – this is a known fact. Data Augmentation can be used to make the model robust. Adversarial training is also an ultimate version of data augmentation as we train on the most confusing version of the given training set.

Some of the keypoints to make a model robust are:

In standard training, all correlation is a good correlation
But, if we want robustness, weakly correlated features must be avoided

Finally, to summarize adversarial robustness:

Optimization during training is more difficult and the model needs to be larger
More training data is required
The standard accuracy also might decrease while using adversarially robust models

Apart from this, the advantages of using adversarial robust models are massive. The model becomes more semantically meaningful. We will be able to rely on it far more. And it leads to machine learning that is not only safe and secure, but also better. Sounds like a good bet to us!

Visualization for Machine Learning

Speakers: Fernanda Viégas and Martin Wattenberg

Watch the full video for the session here

Tutorial summary

Visualization is a topic all of us can relate to at some level. Who among us hasn’t done a thorough EDA before?

Fernanda Viégas and Martin Wattenburg covered one of the most interesting and fundamental topics of machine learning – visualization. They first spoke about what data visualization is, how it works and what are some of the best practices for it. The talk then focused on how visualization has been applied to machine learning till date. A special case of high dimensional data has also been covered in this tutorial.

Data visualization is good for almost every field and some of its applications include:

Data exploration
Scientific insights
Communication
Education

Using colors for visualization makes it more interpretable and even faster. Visualization makes calculations easier and less tedious (and who doesn’t appreciate that?!). Some of the examples where it helps in calculation are:

Average calculation and comparison
Weighted average

The tutorial also dove into the interpretability and model inspection facets of a ML project. Visualizing different layers of convolutional neural networks (CNNs) helps us to understand how it classifies images (this also helps in case the model is not performing well).

We can interpret the model layer by layer and finally conclude where it is going wrong. They recommend using Jupyter notebooks for visualization which have libraries like matplotlib and plotly which have pre-built codes for most visualizations.

Scalable Bayesian Inference

Speaker: David Dunson

Tutorial summary

Bayesian learning as a topic has fascinated us for a long time.

The objective of this session was to motivate people to work more on Bayesian methods as these methods offer an attractive general approach for modeling complex data. David Dunson gave an overview of the state-of-the-art approaches for analyzing huge datasets using Bayesian statistical methods.

David explained how the Markov Chain Monte Carlo (MCMC) algorithm is becoming more and more scalable and faster thanks to the emerging rich and practical literature on the subject. Apart from that, he put quite an emphasis on tweaking the Bayesian paradigm to be more robust with respect to Big Data and scaling of Bayes to high-dimensional data (no. of features > no. of samples), which in itself is quite a hot topic.

If you are interested in Bayesian statistics, then this is a must-watch video, and has the following key takeaways:

Bayes is scalable
In big data and high-dimensionality problems, we have to tweak and modify Bayesian algorithms
We have to carefully explore how to exploit parallel processing for Bayes methods & accurate approximations to reduce bottlenecks
These Bayes methods can have improved computational performance & robustness

Negative Dependence, Stable Polynomials and All That

Speakers: Survit Sra and Stefanie Jegalka

Tutorial summary

This tutorial gives an introduction to the topic: the theory of negative dependence. This can impact all aspects of machine learning, including both supervised and unsupervised learning. It is a rich mathematical toolbox which aids in tasks like anomaly detection, information maximization, experimental design, validation of black-box systems, architecture learning, fast MCMC sampling, and much more.

The speakers have highlighted the rich variety of mathematical ideas behind the theory of negative dependence. In addition, the following topics have been covered:

Strongly Rayleigh (SR) Measures: Determinental Point Processes, Volume Sampling, Dual Volume Sampling
Applications in ML that benefit from Negative Dependence: Active learning, Interactive learning, Recommender Systems, Adversarial models, etc.

End Notes

That was quite an intense collection! This year’s conference was bigger than ever before, with more papers, a bigger venue and a far more widespread audience from around the world. We ended up learning quite a lot while reviewing these videos – so we recommend you do the same. 🙂

Which was your favorite session from NeurIPS 2018? Connect with us in the comments section below and feel free to ask any questions you might have on the topics that were covered.

Aishwarya Singh 19 Jul, 2020

An avid reader and blogger who loves exploring the endless world of data science and artificial intelligence. Fascinated by the limitless applications of ML and AI; eager to learn and discover the depths of data science.

Artificial Intelligence Deep Learning