Python is steadily gaining popularity in machine learning and data science communities across the world, and for the right reasons. It arguably has the most developed ecosystem for deep learning, a collection of excellent libraries like pandas and scikit-learn, and a welcoming community.
PyData is a community for developers and users of open source data tools. It also organizes several conferences, and I recently came across some amazing talks from PyData Amsterdam 2017. Even though I wanted to be part of the conference, it was difficult for me to travel. Thankfully, PyData released all the videos on their YouTube channel.
The range of talks is impressive. Whether you are a novice, intermediate or expert Python user, PyData had something for everyone. To help the community, I have picked the best talks from a data science perspective and added a short summary of each video in this article. The videos are grouped into 4 categories: Deep Learning, Big Data, Data Science and Natural Language Processing.
Consume as you want, learn, like and share!
Deep Learning talks
1) Title : Deep Learning at Booking.com
Speaker : Emrah Tasli, Stas Girkin
Duration : 00:32:38 hrs
This talk intrigued me as soon as I read the title. I have always been a Booking.com user, so seeing how they use deep learning to enhance the user experience was a treat.
Watch this video to get a practical overview of how deep learning is used in the industry. It focuses mainly on the applications of deep learning at Booking.com, such as analyzing image content, analyzing text, understanding speech and building recommendation systems.
The speakers then discuss how these techniques are applied at scale, and the tools Booking.com uses to handle this scale.
2) Title : Using deep learning in natural language processing
Speaker : Rob Romijnders
Duration : 00:25:42 hrs
Understanding the nuances of language is a difficult problem to solve, and deep learning is currently our best hope. This video is a must watch for people who want to use deep learning in natural language processing. It explains the motivation for using deep learning in NLP applications such as machine translation. It then explains how RNNs work and how they are implemented.
Lastly, Rob presents tips for increasing performance of these systems.
3) Title: Creativity and AI: Deep Neural Nets “Going Wild”
Speaker: Roelof Pieters
Duration: 00:33:45 hrs
Roelof covers the basics of deep learning and the explosion of research and experiments that deal with creativity and artificial intelligence.
He also tours the wonderful, trippy world of neural nets “going wild”, with examples such as dance moves, freestyle raps and impressionist paintings. Along the way he shows some of the exciting possibilities new technologies offer for creative use and for exploring human-machine interaction, where the main idea is “augmentation, not automation”.
He focuses particularly on “generative” models, shows Python fans how to get started with one particular form of deep neural net, and finishes with an “experiment”.
4) Title : Neural Networks for Recommender Systems
Speaker : Maciej Kula
Duration : 00:32:55 hrs
Neural networks are increasingly replacing other machine learning algorithms in real-life systems, and recommendation systems are no exception.
In this tutorial, the speaker starts with the advantages of neural networks in recommender systems and goes through various machine learning models used in them, including factorization models, bilinear neural networks and sampled loss functions. If you want to build an efficient recommender system, this video is worth watching.
5) Title : Training a TensorFlow model to detect lung nodules on CT scans
Speaker : Mark Jan Harte, Gerben van Veenendaal
Duration : 00:25:53 hrs
If you’re a philanthropist, this video is a must watch for you. It shows one of the many breakthrough applications of deep learning: automating the detection of abnormalities in medical imaging.
The speakers describe the pipeline they devised for automating the process. They explain in detail the challenges they faced while approaching the problem and the kind of hardware they use, and then walk through the technical details of their pipeline end-to-end. It’s inspiring to see what kind of advancements deep learning can achieve.
6) Title : Siamese LSTM in Keras: Learning Character-Based Phrase
Speaker : Carsten van Weelden, Beata Nyari
Duration : 00:29:42 hrs
In this talk, the speakers explain how they solved the problem of classifying job titles into a job ontology with more than 5000 different classes. They do this by learning a character-based representation of job titles with a B-LSTM encoder trained as a Siamese network. You will learn about the methods in theory and how they can be implemented with the Keras deep learning library.
7) Title : Deep learning for time series made easy
Speaker : Dafne van Kuppevelt
Duration : 00:22:47 hrs
Deep learning is a state-of-the-art method for many tasks, such as image classification and object detection. For researchers who have time series data but are not experts in deep learning, the barrier to getting started can be high.
In this talk, the speaker explores how machine learning novices can use deep learning for time series classification. The speaker then introduces mcfly, an open source Python library that helps machine learning novices explore the value of deep learning for time series data.
8) Title : Deep Reinforcement Learning: theory, intuition, code
Speaker : Maxim Lapan
Duration : 00:28:27 hrs
In this talk, the speaker gives a practical introduction to deep reinforcement learning methods, which are used to solve complex problems such as robotic control, playing Atari games and self-driving car control. Deep reinforcement learning is a very hot topic and has been applied successfully in many areas that require planning actions in complex, noisy and partially observed environments. Concrete examples range from playing arcade games and navigating websites to helicopter, quadrocopter and car control and protein folding.
Big Data
9) Title: Different Strategies of Scaling H2O Machine Learning on Apache Spark
Speaker: Jakub Hava
Duration: 00:32:12 hrs
H2O is becoming increasingly popular for handling big data. In this video, Jakub gives a basic overview of machine learning on top of H2O and Spark. He explains different ways to scale your tasks on top of these technologies, such as data munging in Spark and model building in H2O, or using a mix of both for data munging and model building.
Sparkling Water integrates H2O with the capabilities of Apache Spark. It also lets us use H2O’s machine learning algorithms from Apache Spark applications via Scala, Python, R or H2O’s Flow GUI, which makes Sparkling Water a great enterprise solution.
This video introduces the basic architecture of Sparkling Water, goes over different scaling strategies and explains the pros and cons of each solution. It finishes with a live demo of the approaches, which should give you a real-life feel for configuring and running Sparkling Water for your use case(s).
10) Title: A billion stars in the Jupyter Notebook
Speaker: Maarten Breddels
Duration: 00:30:58 hrs
Ever tried to visualise high dimensional data and didn’t get good results? Well, this is the right place for you. In this video, Maarten talks about two Python packages: “Vaex” and “ipyvolume”.
“Vaex” can calculate statistics for a billion samples per second, and “ipyvolume” lets you interactively visualise and explore these billion-sample tables in high dimensional spaces. He shows methods to visualise and explore large datasets (>1 billion samples) without resorting to cluttered scatter plots. “ipyvolume” can render 3d volumes and up to a million glyphs (scatter plots and quiver) as a widget in the Jupyter notebook.
“Vaex” and “ipyvolume” can be used together to explore and visualize any large tabular data set, or separately to calculate statistics, and render 3d plots in the notebook and outside.
11) Title: Finding Needles in a Growing Haystack
Speaker: Stephen Helms
Duration: 00:31:02 hrs
In this video, Stephen Helms discusses architectural designs for big data. As machines get more and more advanced, we collect more and more data. With large amounts of data, it becomes a challenge to summarise the data efficiently and present the relevant parts to users.
Stephen addresses this challenge and discusses architectural designs and implementations that can be scaled to large amounts of data. He uses Bayesian statistics to build an automated reporting system. If you’re interested in learning more about scaling your analysis to production, you will find this video very interesting.
Data Science
12) Title: Survival analysis for conversion rates
Speaker: Tristan Boudreault
Duration: 00:22:01 hrs
Do you buy a product after the free trial ends? As a product manager, your job might be on the line depending on how many users subscribe to your product after their free trial ends.
In this video, Tristan Boudreault tries to estimate how many customers will be ready to pay after the trial expires. In a business context, he analyses how successful a website is at converting its trial users into paid ones. When we actually look at the data, we realise that people are not as impulsive as we think they are. They spend money once they are comfortable with the product.
He also points out that it can be really tough to estimate conversion just by looking at the raw numbers, especially when the company is growing exponentially. He uses some really interesting examples, and it’s a great video if you’re looking to apply analytics to your offering on the web.
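To make the idea concrete, here is a minimal sketch of a Kaplan-Meier estimator, the standard starting point in survival analysis, applied to trial conversions. It is not taken from the talk: the function name `kaplan_meier` and the toy data are hypothetical, and I am assuming that "not yet converted" plays the role of "survival". Users who have not converted yet are treated as censored (they may still convert later) instead of being counted as failures.

```python
def kaplan_meier(durations, converted):
    """Return [(day, estimated probability of NOT having converted by that day)].

    durations: days each user has been observed since sign-up.
    converted: True if the user converted on that day, False if we simply
               stopped observing them (censored, they may convert later).
    """
    events = sorted(zip(durations, converted))
    at_risk = len(events)   # users still being observed and not yet converted
    survival = 1.0
    curve = []
    i = 0
    while i < len(events):
        day = events[i][0]
        conversions = 0
        leaving = 0
        # Group all users whose observation ends on the same day.
        while i < len(events) and events[i][0] == day:
            conversions += events[i][1]   # True counts as 1
            leaving += 1
            i += 1
        if conversions:
            survival *= 1 - conversions / at_risk
            curve.append((day, survival))
        at_risk -= leaving
    return curve


# Hypothetical data: five users, two of whom have not converted (yet).
days = [5, 10, 10, 20, 30]
paid = [True, True, False, True, False]
curve = kaplan_meier(days, paid)
estimated_conversion = 1 - curve[-1][1]
naive_conversion = sum(paid) / len(paid)
```

On this toy data the naive rate is 0.6, while the Kaplan-Meier estimate of conversion by day 20 is about 0.7, because users observed only briefly are not counted as lost. This gap tends to grow when many users signed up recently, which is the fast-growth situation described above.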
13) Title : Risk Analysis
Speaker : Rogier van der Geer
Duration : 00:31:20 hrs
Ever thought that data science can be used to win a game? Well, here is a video illustrating how to play Risk using Python. In this video, Rogier van der Geer explains how a Python-based simulation is used to train a genetic algorithm to play the game.
The video also focuses on designing and implementing these algorithms in a simplified way that can be optimised for winning the game. A must watch for data science enthusiasts, as it shows how data science can be used to win a game!
14) Title: Python vs Orangutan
Speaker: Dirk Gorissen
Duration: 00:35:35 hrs
This keynote session by Dirk Gorissen is probably the most interesting talk of the lot. He addresses the problem of locating orang-utans in the jungle. Orang-utans are rare apes that need to be located and protected in the wild. To locate them, his team uses radio signals and identifies an orang-utan when the received signal is unique or anomalous.
The video discusses a drone-based tracking system built for this problem. He shows beautifully how it can be solved by analysing the data received from each signal.
15) Title : Diagnosing Machine Learning Models
Speaker : Lucas Javier Bernardi
Duration : 00:39:00 hrs
A machine learning model is never perfect. If it completely fails, it must be fixed. If it performs well, we want to improve it. In this talk, Lucas Javier Bernardi discusses various techniques and tools needed to diagnose machine learning algorithms and models.
The video explains how simple techniques and statistics can be used to improve a model and is a must watch for an aspiring data scientist.
16) Title : Data Science in Internet of Things using Python and Spark
Speaker : Rafael Schultze Kraft
Duration : 00:32:01 hrs
Time series forecasting is one of the most interesting applications of data analysis. In this video, Rafael Schultze Kraft discusses time series forecasting using Python and Spark.
The video explains how to build machine learning models on sensor data using AWS and Python. After suitable preprocessing, these models can be used to predict significant information from the time series data.
17) Title : Bayesian optimization with Scikit-Optimize
Speaker : Gilles Louppe
Duration : 00:28:53 hrs
Optimization has always been an integral part of problem solving. Bayesian optimization is a principled approach to optimizing an expensive function. In this tutorial, Gilles Louppe demonstrates Bayesian optimization using Scikit-Optimize, a newly built package that provides an easy-to-use set of tools for the purpose. You’ll learn the steps involved in Bayesian optimization and how to implement it in Python, with an interesting analogy to brewing good quality coffee.
18) Title : Applied Data Science
Speaker : Giovanni Lanzani
Duration : 00:35:13 hrs
With the data science and machine learning industry growing at a fast pace and companies everywhere incorporating these self-learning tools into their businesses, we always strive to develop the best models with the highest achievable accuracy. But this is not always in the best interest of the business, where a combination of practicality and accuracy will deliver a more acceptable end product. In this talk, Giovanni Lanzani discusses this trade-off, citing real-life examples from big companies like Amazon and Netflix. As a data science aspirant, you can use these details to better optimize the product you deliver.
19) Title : Successfully applying Bayesian statistics to A/B testing in your business
Speaker : Ruben Mak
Duration : 00:38:51 hrs
A/B testing is a very good way to find out which variant of your product performs best and, in turn, to improve business outcomes. In this tutorial, Ruben Mak discusses applying Bayesian statistics to improve A/B testing in your business. After briefly discussing the frequentist calculations of an A/B test and their common problems, he uses them to explain Bayesian statistics, and more specifically hierarchical Bayes, which further reduces the probability of making errors in multiple comparisons. The video also focuses on one of the most important aspects from a business perspective: when to stop an insignificant test.
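As a rough illustration of the Bayesian view of an A/B test, here is a minimal sketch that puts a uniform Beta(1, 1) prior on each variant’s conversion rate and estimates the probability that variant B is better than variant A by sampling from the two posteriors. This is the plain Beta-Binomial model, not the hierarchical Bayes approach covered in the talk, and the function name `prob_b_beats_a` and the visitor counts are hypothetical.

```python
import random


def prob_b_beats_a(conv_a, n_a, conv_b, n_b, draws=20000, seed=0):
    """Monte Carlo estimate of P(rate_B > rate_A) under uniform priors."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        # With a Beta(1, 1) prior, the posterior of a conversion rate is
        # Beta(1 + conversions, 1 + non-conversions).
        rate_a = rng.betavariate(1 + conv_a, 1 + n_a - conv_a)
        rate_b = rng.betavariate(1 + conv_b, 1 + n_b - conv_b)
        if rate_b > rate_a:
            wins += 1
    return wins / draws


# Hypothetical test: 1000 visitors per variant, 100 vs 150 conversions.
p_clear_winner = prob_b_beats_a(100, 1000, 150, 1000)
# Hypothetical test with identical results for both variants.
p_no_difference = prob_b_beats_a(50, 500, 50, 500)
```

The result reads directly as "the probability that B is the better variant", which is often easier to discuss with business stakeholders than a p-value. When both variants perform identically, the estimate hovers around 0.5.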
20) Title: Deploying Python Models to Production
Speaker : Niels Zeilemaker
Duration : 00:31:45 hrs
Developing a model is only half the battle; you still need to put it into production. This tutorial is all about doing so. Starting from GitLab, the speaker covers the tools needed to deploy a machine learning model, such as Jenkins, Docker, Kubernetes, json logger and DTAP, and explains the why and how of each tool, with code wherever needed. I suggest you take your time and go through every slide of the talk to become a better data science practitioner.
Natural Language Processing
21) Title: Pythonic Metal
Speaker: Iain Barr
Duration : 00:26:55 hrs
The basics of NLP are always a challenge to conquer. This tutorial discusses the basic concepts of Natural Language Processing, like vectorization of words, bag of words and word count as binomial frequency, and shows how to derive intelligence from them with the help of an example data set of 200,000 songs. Go ahead and take a look at it if you aspire to learn Natural Language Processing. Keep in mind that this video is a bit demanding, and you should have prior knowledge of the basics of data science.
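If bag of words is new to you, the following minimal sketch shows the idea using only the Python standard library. It is not code from the talk: the helper names `bag_of_words` and `vectorize` and the two lyric lines are made up for illustration. Each document becomes a vector of word counts over a shared vocabulary, and word order is ignored.

```python
import re
from collections import Counter


def bag_of_words(text):
    """Count how often each lowercase word occurs in the text."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def vectorize(documents):
    """Turn documents into count vectors over a shared, sorted vocabulary."""
    bags = [bag_of_words(doc) for doc in documents]
    vocabulary = sorted(set().union(*bags))
    vectors = [[bag[word] for word in vocabulary] for bag in bags]
    return vocabulary, vectors


# Hypothetical lyric lines, invented for this example.
lyrics = ["Fire in the night", "The night of fire, the night of doom"]
vocabulary, vectors = vectorize(lyrics)
```

Here `vocabulary` holds the distinct words in alphabetical order and each row of `vectors` holds one document’s counts for those words. Once the documents are vectors, they can be compared or fed to a model.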
22) Title: Simulate your language
Speaker: John Paton
Duration: 00:27:36 hrs
I lived in another state for almost 6 years and didn’t know the native language of the place. I always used to wonder whether my words sounded to them the way theirs sounded to me. John Paton has answered my question here. He demonstrates how our language looks to people who don’t speak it. He builds simple Markov models in Python for simulating any language. He shows various visualisations to help understand the similarities and differences between languages. There are simple yet interesting insights about different languages, such as the most commonly used letters or whether a language uses long or short words to express feelings. After this video, you should understand how Markov models work and be able to analyse languages using your own models.
Just watching these videos won’t make you a better analyst. You need to practice too. For best results, take notes while watching. This will help you refer back to a topic quickly at a later point in time.
While watching these videos, there were several moments when I felt that there are many things in Python which I am yet to explore. Once again, I would like to thank the Python community for being so generous and always helpful in times of need. If you would like to see more such videos from PyData, you can check out their YouTube channel.
Did you find this list of tutorials helpful? Which tutorial or talk did you like the most? Share your experiences and suggestions in the comments below.
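To give a feel for the technique, here is a minimal sketch of a character-level Markov model. It is my own illustration, not the speaker’s code: the function names `build_model` and `simulate`, the `order` parameter and the sample sentence are all assumptions. The model records which character follows each short context in a training text, then generates new text by repeatedly sampling a follower.

```python
import random
from collections import defaultdict


def build_model(text, order=2):
    """Map every `order`-character context to the characters seen after it."""
    model = defaultdict(list)
    for i in range(len(text) - order):
        model[text[i:i + order]].append(text[i + order])
    return model


def simulate(model, length=60, seed=None):
    """Generate `length` characters by sampling one follower at a time."""
    rng = random.Random(seed)
    contexts = sorted(model)
    order = len(contexts[0])
    context = rng.choice(contexts)
    output = context
    while len(output) < length:
        followers = model.get(context)
        if not followers:
            # Dead end: this context only occurred at the very end of the
            # training text, so jump to a random known context.
            context = rng.choice(contexts)
            output += context
            continue
        output += rng.choice(followers)
        context = output[-order:]
    return output[:length]


# Hypothetical training text; a real experiment would use a large corpus.
sample = "the quick brown fox jumps over the lazy dog and the other dog"
fake_text = simulate(build_model(sample, order=2), length=40, seed=1)
```

Trained on a large text in a given language, the output is gibberish that still has the letter patterns of that language, which is a nice way to see a language the way a non-speaker does. A higher `order` makes the output look more like real words.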