22 must watch talks on Python for Deep Learning, Machine Learning & Data Science (from PyData 2017, Amsterdam)

Sunil Ray Last Updated : 25 Jun, 2019

10 min read

Introduction

Python is steadily gaining popularity in machine learning and data science communities across the world, and for the right reasons. It probably has the most developed ecosystem for deep learning, a collection of awesome libraries like pandas and scikit-learn, and an awesome community.

PyData is a community for developers and users of open source data tools. It also organises several conferences, and I recently came across some amazing talks from PyData Amsterdam 2017. Even though I wanted to be part of the conference, it was difficult for me to travel. Thankfully, PyData released all the videos on its YouTube channel.

The spread of the talks is amazing. Whether you are a novice, intermediate or expert Python user, PyData had something for everyone. To help the community, I have picked the best talks from a data science perspective for this article and added a short summary of each video. The videos are grouped into 4 categories: Deep Learning, Big Data, Data Science and Natural Language Processing.

Consume as you want, learn, like and share!

Deep Learning talks

1) Title : Deep Learning at Booking.com

Speaker : Emrah Tasli, Stas Girkin

Duration : 00:32:38 hrs

This talk intrigued me as soon as I read the title. I have always been a booking.com user. To see how they use deep learning to enhance user experience was a treat.

Watch this video to get a practical overview of how deep learning is used in industry. It focuses mainly on the applications of deep learning at booking.com, such as analyzing image content, analyzing text, understanding speech and building recommendation systems.

The speakers then discuss how these techniques are applied at scale, and the tools used by booking.com to handle this scale.

2) Title : Using deep learning in natural language processing

Speaker : Rob Romijnders

Duration : 00:25:42 hrs

Understanding the nuances of language is a difficult problem to solve, but deep learning offers hope. This video is a must watch for people who want to use deep learning in natural language processing. It explains the motivation for using deep learning for NLP applications such as machine translation. It further explains how RNNs work and how they are implemented.

Lastly, Rob presents tips for increasing performance of these systems.

3) Title: Creativity and AI: Deep Neural Nets “Going Wild”

Speaker: Roelof Pieters

Duration: 00:33:45 hrs

Roelof covers the basics of deep learning against the backdrop of the explosion of research and experiments that deal with creativity and artificial intelligence.

He also talks about the wonderful, trippy world of neural nets “going wild”, with examples such as dance moves, freestyle raps and impressionist paintings. These show some of the exciting possibilities new technologies offer for creative use and for exploring human-machine interaction, where the guiding principle is “augmentation, not automation”.

He particularly focuses on “generative” models, shows Python fanatics how to get started with a particular form of deep neural nets, and finishes with an “experiment”.

4) Title : Neural Networks for Recommender Systems

Speaker : Maciej Kula

Duration : 00:32:55 hrs

Neural networks are increasingly replacing other machine learning algorithms in real-life systems, and recommendation systems are no exception.

In this tutorial, the speaker starts with the advantages of neural networks in recommender systems and goes through various machine learning models used in recommender systems, including factorization models, bilinear neural networks and sampled loss functions. If you aspire to build an efficient recommender system, this video is worth watching.

5) Title : Training a TensorFlow model to detect lung nodules on CT scans

Speaker : Mark Jan Harte, Gerben van Veenendaal

Duration : 00:25:53 hrs

If you’re a philanthropist, this video is a must watch for you. It shows one of the numerous breakthrough applications of deep learning – to automate the detection of abnormality in medical imaging.

The speakers describe the pipeline they devised for automating the process. They explain in detail the challenges they faced while approaching the problem and the hardware they used, and then walk through their pipeline end to end. It’s inspiring to see the kind of advancements deep learning can achieve.

6) Title : Siamese LSTM in Keras: Learning Character-Based Phrase

Speaker : Carsten van Weelden, Beata Nyari

Duration : 00:29:42 hrs

In this talk, the speakers explain how they solved the problem of classifying job titles into a job ontology with more than 5000 different classes. They do this by learning a character-based representation of job titles with a B-LSTM encoder trained as a Siamese network. You will learn about the methods in theory and how they can be implemented with the Keras deep learning library.

7) Title : Deep learning for time series made easy

Speaker : Dafne van Kuppevelt

Duration : 00:22:47 hrs

Deep learning is a state-of-the-art method for many tasks, such as image classification and object detection. For researchers who have time series data but are not experts in deep learning, the barrier to getting started can be high.

In this talk, the speaker explores how machine learning novices can use deep learning for time series classification. The speaker then introduces mcfly, an open source Python library that helps machine learning novices explore the value of deep learning for time series data.

8) Title : Deep Reinforcement Learning: theory, intuition, code

Speaker : Maxim Lapan

Duration : 00:28:27 hrs

In this talk the speaker gives a practical introduction to deep reinforcement learning methods. Deep reinforcement learning is a very hot topic, successfully applied in many areas that require planning of actions in complex, noisy and partially observed environments. Concrete examples range from playing Atari and other arcade games and navigating websites to robotics control problems, helicopter, quadrocopter and self-driving car control, and protein folding.

Big Data

9) Title: Different Strategies of Scaling H2O Machine Learning on Apache Spark

Speaker: Jakub Hava

Duration: 00:32:12 hrs

H2O is becoming increasingly popular for handling big data. In this video, Jakub gives a basic overview of machine learning on top of H2O and Spark. He explains different ways to scale your tasks on top of these technologies: data munging in Spark and model building in H2O, or a mix of both for data munging and model building.

Sparkling Water integrates H2O with the capabilities of Apache Spark. It also allows us to leverage H2O’s machine learning algorithms in Apache Spark applications via Scala, Python, R or H2O’s Flow GUI, which makes Sparkling Water a great enterprise solution.

This video introduces the basic architecture of Sparkling Water, goes over different scaling strategies and explains the pros and cons of each solution. It finishes with a live demo of the approaches, which should give you a real-life sense of configuring and running Sparkling Water for your use case(s).

10) Title: A billion stars in the Jupyter Notebook

Speaker: Maarten Breddels

Duration: 00:30:58 hrs

Ever tried to visualise high dimensional data and didn’t get good results? Well, this is the right place for you. In this video, Maarten talks about two Python packages: “Vaex” and “ipyvolume”.

“Vaex” can calculate statistics at a rate of a billion samples per second, and “ipyvolume” lets you interactively visualise and explore these billion-sample tables in high dimensional spaces. Maarten shows methods to visualize and explore large datasets (>1 billion samples) instead of using cluttered scatter plots. “ipyvolume” can render 3d volumes and up to a million glyphs (scatter plots and quiver) as an interactive widget in the Jupyter notebook.

“Vaex” and “ipyvolume” can be used together to explore and visualize any large tabular data set, or separately to calculate statistics, and render 3d plots in the notebook and outside.

11) Title: Finding Needles in a Growing Haystack

Speaker: Stephen Helms

Duration: 00:31:02 hrs

In this video, Stephen Helms discusses architectural designs for big data. As machines get more and more advanced, we’ll collect more and more data. With large amounts of data, it becomes a challenge to summarise the data efficiently and present the relevant parts to users.

Stephen addresses this challenge and discusses architectural designs and implementations that can be scaled to large amounts of data. He uses Bayesian statistics to build an automated reporting system. If you’re interested in scaling your analysis to production, you will find this video very interesting.

Data Science

12) Title: Survival analysis for conversion rates

Speaker: Tristan Boudreault

Duration: 00:22:01 hrs

Do you buy a product after the free trial ends? As a product manager, your job might be on the line depending on how many users subscribe to your product after their free trial ends.

In this video, Tristan Boudreault estimates how many customers would be ready to pay after the trial expires. In a business context, he analyses how successful a website is at converting its trial users into paid ones. When we actually look at the data, we realise that people are not as impulsive as we think they are. They spend money once they are comfortable with the product.

He also discusses how tough it can be to estimate conversion just by looking at the raw numbers, especially when the company is growing exponentially. He uses really interesting examples, and it’s a great video if you’re looking to apply analytics to your offering on the web.
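
To see why raw numbers mislead when a company is growing, here is a minimal sketch of a Kaplan-Meier style estimate of conversion over time. This is an assumption about the kind of technique the talk covers, not the speaker's own code; the function name and the data are hypothetical. Users who have not converted yet are treated as censored rather than as failures.

```python
from collections import Counter


def kaplan_meier_conversion(observations):
    """Estimate the cumulative fraction of users converted over time.

    observations: list of (days_observed, converted) pairs. converted=False
    means the user has not converted *yet* (censored at days_observed).
    Returns a list of (day, estimated fraction converted by that day).
    """
    conversions = Counter(day for day, converted in observations if converted)
    exits = Counter(day for day, _ in observations)  # users leaving the risk set
    at_risk = len(observations)
    not_converted = 1.0
    curve = []
    for day in sorted(exits):
        if conversions[day]:
            # Share of still-at-risk users who convert on this day.
            not_converted *= 1 - conversions[day] / at_risk
            curve.append((day, 1 - not_converted))
        at_risk -= exits[day]
    return curve


# Hypothetical data: (days since trial start, converted?)
observations = [
    (5, True), (10, True), (30, True),
    (3, False), (3, False), (7, False),
    (40, False), (40, False),
]
naive_rate = sum(converted for _, converted in observations) / len(observations)
curve = kaplan_meier_conversion(observations)
print(f"naive conversion rate: {naive_rate:.3f}")
for day, fraction in curve:
    print(f"day {day:>2}: {fraction:.3f} estimated converted")
```

In this made-up sample, the naive rate counts recent sign-ups as non-converters, while the survival estimate only compares each conversion against users who had been observed at least that long.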

13) Title : Risk Analysis

Speaker : Rogier van der Geer

Duration : 00:31:20 hrs

Ever thought that data science can be used to win a game? Well, here is a video illustrating how to play Risk using Python. In this video, Rogier van der Geer explains how a Python-based simulation is used to train a genetic algorithm to play the game.

The video also focuses on designing and implementing these algorithms in a simplified way that can be optimised for winning the game. A must watch for data science enthusiasts, as it shows how data science can be used to win a game!
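
As a rough illustration of the idea, here is a minimal genetic algorithm sketch. It does not simulate Risk: the fitness function below is a hypothetical stand-in for "how often does this player win in simulation", and all names and parameter values are assumptions made for this example.

```python
import random


def evolve(fitness, genome_length, population_size=30, generations=40,
           mutation_rate=0.1, seed=0):
    """Evolve a list of floats in [0, 1] that maximises `fitness`."""
    rng = random.Random(seed)
    population = [[rng.random() for _ in range(genome_length)]
                  for _ in range(population_size)]
    for _ in range(generations):
        # Selection: keep the fitter half unchanged (elitism).
        population.sort(key=fitness, reverse=True)
        parents = population[: population_size // 2]
        children = []
        while len(parents) + len(children) < population_size:
            mother, father = rng.sample(parents, 2)
            cut = rng.randrange(1, genome_length)  # single-point crossover
            child = mother[:cut] + father[cut:]
            # Mutation: occasionally nudge a gene, clamped to [0, 1].
            child = [min(1.0, max(0.0, gene + rng.gauss(0, 0.1)))
                     if rng.random() < mutation_rate else gene
                     for gene in child]
            children.append(child)
        population = parents + children
    return max(population, key=fitness)


# Hypothetical stand-in for a game simulation: the closer the genome is to
# some unknown "ideal strategy", the higher the score.
IDEAL = [0.2, 0.8, 0.5, 0.1]


def fitness(genome):
    return -sum((g - t) ** 2 for g, t in zip(genome, IDEAL))


best = evolve(fitness, genome_length=len(IDEAL))
print([round(g, 2) for g in best], round(fitness(best), 4))
```

In a game setting, the genome would encode strategy parameters and the fitness would come from playing many simulated games, which is far more expensive than the toy function used here.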

14) Title: Python vs Orangutan

Speaker: Dirk Gorissen

Duration: 00:35:35 hrs

This is probably the most interesting talk, and a keynote session, by Dirk Gorissen. He addresses the problem of locating orang-utans in the jungle. Orang-utans are rare apes that need to be located and protected. To locate them, the team uses radio signals and identifies an orang-utan when the received result is unique/anomalous.

The video discusses a drone-based tracking system built for this problem. He shows beautifully how we can solve it by analysing the data we receive from each signal.

15) Title : Diagnosing Machine Learning Models

Speaker : Lucas Javier Bernardi

Duration : 00:39:00 hrs

A machine learning model is never perfect. If it completely fails, it must be fixed. If it performs well, we want to improve it. In this talk, Lucas Javier Bernardi discusses various techniques and tools needed to diagnose machine learning algorithms and models.

The video explains how simple techniques and statistics can be used to improve a model and is a must watch for an aspiring data scientist.

16) Title : Data Science in Internet of Things using Python and Spark

Speaker : Rafael Schultze Kraft

Duration : 00:32:01 hrs

Time series forecasting is one of the most interesting applications of data analysis. In this video, Rafael Schultze Kraft discusses time series forecasting using Python and Spark.

The video explains how to build machine learning models using AWS and Python on suitably preprocessed sensor data, which can then be used to predict significant information from the time series.

17) Title : Bayesian optimization with Scikit-Optimize

Speaker : Gilles Louppe

Duration : 00:28:53 hrs

Optimization has always been an integral part of problem solving. Bayesian optimization is a principled approach to optimizing an expensive function. In this tutorial, Gilles Louppe demonstrates Bayesian optimization using a newly built package, Scikit-Optimize, which provides an easy-to-use set of tools for the purpose. You’ll learn the steps involved in Bayesian optimization and how to implement it in Python, with an interesting analogy to brewing good quality coffee.

18) Title : Applied Data Science

Speaker : Giovanni Lanzani

Duration : 00:35:13 hrs

With the data science and machine learning industry growing at a fast pace and companies incorporating these self-learning tools in their businesses, we always strive to develop the best models with the highest achievable accuracy. But this is not always in the best interest of the business, where combining practicality with accuracy delivers a more acceptable end product. In this talk, Giovanni Lanzani discusses this trade-off, citing real-life examples from big companies like Amazon and Netflix. As a data science aspirant, you can use these details to better optimize the product you deliver.

19) Title : Successfully applying Bayesian statistics to A/B testing in your business

Speaker : Ruben Mak

Duration : 00:38:51 hrs

A/B testing is a very good way to find out which variant of your product performs best and, in turn, to improve business outcomes. In this tutorial, Ruben Mak discusses applying Bayesian statistics to improve A/B testing in your business. After briefly covering the frequentist calculations behind an A/B test and their common problems, he uses them to explain Bayesian statistics, and more specifically hierarchical Bayes, to further reduce the probability of making errors in multiple comparisons. The video also focuses on one of the most important questions from a business perspective: when to stop an insignificant test.
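
For a feel of the Bayesian view of an A/B test, here is a minimal sketch using a plain Beta-Binomial model with a uniform prior. Note that this is the simple, non-hierarchical version, not the hierarchical Bayes approach the talk goes on to describe; the function name and the visitor counts are hypothetical.

```python
import random


def probability_b_beats_a(conv_a, n_a, conv_b, n_b, draws=20000, seed=0):
    """Monte Carlo estimate of P(rate_B > rate_A).

    Each variant's conversion rate gets a Beta(1 + conversions,
    1 + non-conversions) posterior, i.e. a uniform Beta(1, 1) prior
    updated with the observed counts.
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        rate_a = rng.betavariate(1 + conv_a, 1 + n_a - conv_a)
        rate_b = rng.betavariate(1 + conv_b, 1 + n_b - conv_b)
        wins += rate_b > rate_a
    return wins / draws


# Hypothetical test: 1000 visitors per variant.
p = probability_b_beats_a(conv_a=100, n_a=1000, conv_b=130, n_b=1000)
print(f"P(B converts better than A) is roughly {p:.2f}")
```

The output is a direct probability statement about the variants, which is often easier to communicate to a business audience than a p-value.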

20) Title: Deploying Python Models to Production

Speaker : Niels Zeilemaker

Duration : 00:31:45 hrs

Developing a model is only half the battle; you still need to put it into production. This tutorial is all about doing so. Starting from GitLab, the speaker covers the tools needed to deploy a machine learning model, such as Jenkins, Docker, Kubernetes, a JSON logger and DTAP, and goes through the why and how of each tool, with code wherever needed. I would suggest taking your time to go through every slide of the talk to become a better data science practitioner.

Natural Language Processing

21) Title: Pythonic Metal

Speaker: Iain Barr

Duration : 00:26:55 hrs

The basics of NLP are always a challenge to conquer. This tutorial discusses basic concepts of natural language processing, such as vectorization of words, bag of words and word count as binomial frequency, and shows how to derive insight from them using an example data set of 200,000 songs. Take a look if you aspire to learn natural language processing. Keep in mind that this video is a bit demanding, and you should have prior knowledge of the basics of data science.
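
If bag of words is new to you, the following minimal sketch shows the idea using only the standard library. The two "lyrics" are made up for illustration and the helper names are hypothetical; the talk's own data set and code are not reproduced here.

```python
import re
from collections import Counter


def bag_of_words(text):
    """Count how often each lowercase word occurs, ignoring word order."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def vectorize(documents):
    """Turn documents into count vectors over a shared, sorted vocabulary."""
    bags = [bag_of_words(doc) for doc in documents]
    vocabulary = sorted(set().union(*bags))
    vectors = [[bag[word] for word in vocabulary] for bag in bags]
    return vocabulary, vectors


# Made-up lyrics, purely for illustration.
lyrics = ["Burn the night, burn the sky", "The night is cold"]
vocabulary, vectors = vectorize(lyrics)
print(vocabulary)
for vector in vectors:
    print(vector)
```

Each song becomes a row of word counts, which is the kind of numeric representation that statistics and machine learning models can then work with.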

22) Title: Simulate your language

Speaker: John Paton

Duration: 00:27:36 hrs

I lived in another state for almost 6 years without knowing the local language. I always used to wonder whether my words sounded to them the way theirs sounded to me. John Paton answers that question here. He demonstrates how a language looks to people who don’t speak it by building simple Markov models in Python that can simulate any language. He shows various visualisations that bring out the similarities and differences between languages. There are simple yet interesting insights about different languages, such as the most commonly used letters or whether a language tends to use long or short words. After this video you should understand how Markov models work and be able to analyse languages using your own models.
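
A character-level Markov model of the kind described can be sketched in a few lines. This is an illustrative assumption about the approach, not the speaker's code: the function names, the order of 2 and the sample text are all hypothetical choices.

```python
import random
from collections import defaultdict


def train(text, order=2):
    """Map each `order`-character state to the characters that follow it."""
    model = defaultdict(list)
    for i in range(len(text) - order):
        model[text[i:i + order]].append(text[i + order])
    return model


def generate(model, length, order=2, seed=0):
    """Generate `length` characters by repeatedly sampling a follower."""
    rng = random.Random(seed)
    states = sorted(model)
    output = rng.choice(states)
    while len(output) < length:
        followers = model.get(output[-order:])
        if followers:
            output += rng.choice(followers)
        else:
            # Dead end (state only seen at the very end of the text):
            # restart from a random known state.
            output += rng.choice(states)
    return output[:length]


sample = ("the quick brown fox jumps over the lazy dog and then "
          "the dog chases the fox through the quiet brown field")
model = train(sample)
print(generate(model, length=60))
```

Trained on a large text in a given language, the generated string has no meaning but keeps the letter patterns of that language, which is roughly how a language "looks" to someone who cannot read it.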

End Notes

Just watching these videos won’t make you a better analyst. You need to practice too. For best results, take notes as you watch. This will help you quickly refer back to a topic later.

While watching these videos, there were several moments when I felt there is a lot in Python that I am yet to explore. Once again, I would like to thank the Python community for being so generous and always helpful in times of need. If you would like to see more such videos from PyData, you can check out their YouTube channel.

Did you find this list of tutorials helpful? Which tutorial or talk did you like the most? Share your experience and suggestions in the comments below.

Sunil Ray

Sunil Ray is Chief Content Officer at Analytics Vidhya, India's largest Analytics community. I am deeply passionate about understanding and explaining concepts from first principles. In my current role, I am responsible for creating top notch content for Analytics Vidhya including its courses, conferences, blogs and Competitions.

I thrive in fast-paced environments and love building and scaling products that unleash huge value for customers using data and technology. Over the last 6 years, I have built the content team and created multiple data products at Analytics Vidhya.

Prior to Analytics Vidhya, I have 7+ years of experience working with several insurance companies like Max Life, Max Bupa, Birla Sun Life & Aviva Life Insurance in different data roles.

Industry exposure: Insurance, and EdTech

Major capabilities: Content Development, Product Management, Analytics, Growth Strategy.

Jingmiao

Many thank!. Such a nice one for us Python new beginners.

Thanks Jingmiao Regards, Sunil

Vishal

Hi sir, I want to start my carer as an analyst i completed my graduation frm Eco (Hns) DU. and working in amex frm past 6 month as MIS. Can you help me what should i learn to become a good analyst and frm where should i learn it. And please tell me diffrence between acturial science and Business analyst as m little confused between both of them and what should i do please guide

Hi Vishal, I would suggest you ask your career related queries in right thread. Below is the link for career related discussion. https://discuss.analyticsvidhya.com/c/career Regards, Sunil

Mamun Mahdeeb

Hello Sunil, Your research are so meaning full and it need to reach to all class of people, because nowadays almost people don't learn deeply. so you can take more step to spread your powerful research.

Thanks, Mamun!

Data Science Training in Hyderabad

Your post to explain everything in detail and it was very interesting to read and seeing. Thank you. it is useful to python, data science users.
