Must Watch Data Science Videos from SciPy Conference 2015

Kunal Jain 16 Sep, 2015 • 10 min read

Introduction

This was in the first year of my engineering degree. A hungry student who missed home food (me) was treated by a college senior to a lavish buffet at one of the best five-star hotels in Mumbai! You get served so many dishes that you struggle to decide where to start, what to taste and what to eat!

Why is this relevant here? Well, I had a similar feeling when I looked at the videos from the recent SciPy conference.

All the Pythonistas in the data science community are in for a treat today!

Scientific Computing with Python 2015, popularly known as the SciPy 2015 conference, was held recently in Austin, Texas, USA, and it offered a similar experience. This 6-day conference showcased the best research work happening in Python, including some fantastic tutorials on data science. The pain point: the conference's YouTube channel hosts 115 videos covering topics ranging from bioscience, geoscience and astronomy to data science. We had to watch every video to come up with this recommended list.

 

A few things to note:

  1. The videos listed below are strictly about Python for data science. If you are looking for anything else from SciPy 2015, check the official conference page. For your convenience, we have grouped the videos by data science topic, which should help you learn in a structured manner.
  2. This list is by no means a judgement on the usefulness of the work shown at the conference. We are in no position to judge that. We created this recommendation to help our audience cover the best of data science from the conference in less time. If you have the time and motivation, you should definitely watch all the awesome work on show.
  3. If you do the math, it is almost humanly impossible, given my current commitments, to watch all these videos in less than a week. I could not have compiled this list without awesome help from Manish and Aditya (our summer intern). Thanks to them for helping me put it together.

 

Let’s warm up with the keynotes:

1. Keynote: State of Tools in Python

Duration- 45:06 mins

Summary: In this video, Jake VanderPlas gives a high-level view of the SciPy ecosystem and a history of how its tools developed over time. He covers visualization tools such as Matplotlib, Seaborn and Bokeh (which plots in HTML5), data structure and array tools such as xray, and recent developments including Numba, Anaconda, IPython and Jupyter.

 

2. Keynote: My Data Journey with Python

Duration- 47:45 mins

Summary: Wes McKinney, the creator of pandas, returns to SciPy and talks about his own journey and experience from 2007 to the present. He mentions how his work as a mathematician exposed him to Python. He also talks about what he is personally focused on right now and the opportunities he sees for the community to continue to grow and flourish.

 

3. Keynote: Data Science at the New York Times

Duration- 38:37 mins

Summary: In this video, Chris Wiggins talks about his path into data science as a biologist, how he ended up at the New York Times and how the Times has used data science for its growth. He describes the work of the New York Times data science group and the significant role machine learning plays at the company.

 

Machine Learning in Python

1. Tutorial on Machine Learning with Scikit Learn – Part 1

Duration- 3:22:05 hrs

Summary: If you have longed to learn machine learning in Python, here is the cake you always wanted. This tutorial covers various aspects of machine learning using scikit-learn. It comes in two parts, each roughly 3 hours long, which leaves plenty of room for a thorough treatment. It covers everything you need to know about the fundamentals of machine learning in Python.

 

2. Tutorial on Machine Learning with Scikit Learn – Part 2

Duration- 3:16:12 hrs

Summary: This video picks up where the previous one ends. In this part, more complex problems such as model selection and cross-validation are solved step by step. The best part is the strong emphasis on model building: it covers parameter selection, validation and testing, followed by applying the algorithms learnt in the previous video.
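The model-selection workflow described above can be sketched with scikit-learn's `GridSearchCV`, which scores every parameter combination with k-fold cross-validation on the training data and then evaluates the refitted best model on a held-out test set. This is a minimal sketch, assuming scikit-learn 0.18 or later (for `sklearn.model_selection`); the iris dataset, the SVC estimator and the parameter grid are illustrative choices, not the tutorial's own code.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Load a small built-in dataset (illustrative choice).
X, y = load_iris(return_X_y=True)

# Hold out a test set that the parameter search never sees.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Candidate hyperparameters for an RBF-kernel support vector classifier.
param_grid = {"C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]}

# 5-fold cross-validation on the training set for every combination;
# the best combination is then refit on the whole training set.
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
search.fit(X_train, y_train)

# Final, unbiased check on the untouched test set.
test_accuracy = search.score(X_test, y_test)
print("best params:", search.best_params_)
print("mean CV accuracy:", round(search.best_score_, 3))
print("test accuracy:", round(test_accuracy, 3))
```

Keeping the test set outside the search is the design choice that matters here: the cross-validation score guides parameter selection, while the test score estimates how the chosen model generalizes.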

 

3. Deep Learning Tips from the Road

Duration- 19:02 mins

Summary: This video is a crash course on deep learning. The speaker covers the major concepts associated with deep learning but skips convolutional networks. Even so, it explains concepts such as conditioning, backpropagation and recurrence well. If you are a beginner keen to learn about deep learning, don't miss this one.
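Backpropagation, one of the concepts the talk explains, is the chain rule applied layer by layer from the loss back to each parameter. The sketch below trains a tiny one-hidden-layer network on XOR with plain NumPy so every gradient is visible. The network size, learning rate, loss function and number of steps are arbitrary illustrative choices; real projects would use a deep learning library instead.

```python
import numpy as np

rng = np.random.default_rng(0)

# XOR inputs and targets: not linearly separable, so a hidden layer is needed.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

# Parameters of a 2-8-1 network.
W1 = rng.normal(scale=1.0, size=(2, 8))
b1 = np.zeros(8)
W2 = rng.normal(scale=1.0, size=(8, 1))
b2 = np.zeros(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 1.0
losses = []
for step in range(5000):
    # Forward pass.
    h = np.tanh(X @ W1 + b1)        # hidden activations, shape (4, 8)
    p = sigmoid(h @ W2 + b2)        # predicted probabilities, shape (4, 1)
    loss = np.mean((p - y) ** 2)    # mean squared error
    losses.append(loss)

    # Backward pass: chain rule from the loss back to each parameter.
    dp = 2.0 * (p - y) / len(X)     # dL/dp
    dz2 = dp * p * (1.0 - p)        # through the sigmoid
    dW2 = h.T @ dz2
    db2 = dz2.sum(axis=0)
    dh = dz2 @ W2.T                 # gradient flowing into the hidden layer
    dz1 = dh * (1.0 - h ** 2)       # through tanh
    dW1 = X.T @ dz1
    db1 = dz1.sum(axis=0)

    # Plain gradient descent update.
    W1 -= lr * dW1
    b1 -= lr * db1
    W2 -= lr * dW2
    b2 -= lr * db2

print("first loss:", round(losses[0], 4), "final loss:", round(losses[-1], 4))
print("predictions:", p.ravel().round(2))
```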

 

4. Deploying Python Machine Learning Models in Production

Duration- 19:06 mins

Summary: The speaker explains that putting machine learning models into ‘production’ is the sum of deployment, evaluation, management and monitoring. He describes the use of machine learning in production with a focus on optimizing model performance.
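As a minimal illustration of the first of those steps, deployment, a trained scikit-learn model can be serialized so that a separate serving process can load it and make predictions. This sketch uses Python's standard `pickle` module and a logistic regression on the iris data as placeholders; it does not reproduce the speaker's deployment stack, and pickles should only ever be loaded from trusted sources.

```python
import pickle

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# Training side: fit a model and serialize it to bytes (or to a file).
X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000).fit(X, y)
payload = pickle.dumps(model)

# Serving side: restore the model and predict on new data.
restored = pickle.loads(payload)
new_samples = X[:3]
predictions = restored.predict(new_samples)

# The restored model must behave exactly like the original one.
assert (predictions == model.predict(new_samples)).all()
print(predictions)
```

Evaluation, management and monitoring build on this: in practice the serialized artifact would also be versioned and its live predictions tracked over time.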

 

5. Examining Malware (Kaggle Problem) with Python

Duration- 21:45 mins

Summary: Python tools for text classification can easily be adapted to malware classification. Phil Roth, a data scientist, shares his solution to the Microsoft Malware Classification Challenge (2015) hosted by Kaggle. He discusses his approach, his code and the algorithms he applied to this dataset. He used scikit-learn for model building and finished 29th in the competition.

 

6. Efficient Python for High Performance Parallel Computing

Duration- 3:23:50 hrs

Summary: This video covers the essentials of code optimization, using Monte Carlo games as the running example. Over 3 hours, the speaker walks through each piece of code that matters for optimization while debugging it along the way. The video suits intermediate and advanced Python practitioners; beginners might skip it.
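To give a flavour of this kind of optimization, here is a sketch (not the tutorial's code, which uses Monte Carlo games) comparing a pure-Python Monte Carlo loop with a vectorized NumPy version. Both estimate π by sampling random points in the unit square and counting how many fall inside the quarter circle; the vectorized version moves the loop into compiled NumPy code, which is typically much faster. The sample size and seeds are arbitrary.

```python
import random
import time

import numpy as np

def estimate_pi_loop(n, seed=0):
    """Pure-Python Monte Carlo: count points that fall inside the quarter circle."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(n):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / n

def estimate_pi_numpy(n, seed=0):
    """Vectorized Monte Carlo: draw all points at once and count with array operations."""
    rng = np.random.default_rng(seed)
    xy = rng.random((n, 2))
    inside = np.count_nonzero((xy ** 2).sum(axis=1) <= 1.0)
    return 4.0 * inside / n

n = 1_000_000
for fn in (estimate_pi_loop, estimate_pi_numpy):
    start = time.perf_counter()
    estimate = fn(n)
    elapsed = time.perf_counter() - start
    print(f"{fn.__name__}: pi ~ {estimate:.4f} in {elapsed:.3f} s")
```

Profile before optimizing: timing each version, as above, is the simplest way to confirm that a rewrite actually pays off.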

 

Data Visualization in Python

1. A Better Default ColorMap for Matplotlib

Duration- 19:09 min

Summary: Presenting data is as important as producing it. This video introduces colormaps and how to use them in matplotlib. It gives a comprehensive overview of the feature and leaves you with enough information to try it yourself.
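A quick way to see why the choice of colormap matters is to look at how lightness changes along the map. The sketch below samples matplotlib's `viridis` (which became matplotlib's default colormap in version 2.0) and the older default `jet`, and computes an approximate luminance per sample by applying Rec. 709 weights directly to the sRGB values. That weighting is a rough proxy chosen for this sketch, not the perceptual model used to design viridis. It assumes matplotlib 3.5 or later for the `matplotlib.colormaps` registry.

```python
import matplotlib
import numpy as np

def approximate_luminance(cmap_name, n=256):
    """Sample a colormap and return a rough luminance value per sample."""
    cmap = matplotlib.colormaps[cmap_name]
    rgba = cmap(np.linspace(0.0, 1.0, n))   # array of shape (n, 4)
    r, g, b = rgba[:, 0], rgba[:, 1], rgba[:, 2]
    # Rec. 709 luma weights applied to sRGB values: a rough proxy only.
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

viridis = approximate_luminance("viridis")
jet = approximate_luminance("jet")

# viridis runs from dark to light; jet peaks in the middle and darkens again.
print("viridis: start %.2f, end %.2f" % (viridis[0], viridis[-1]))
print("jet: start %.2f, peak %.2f, end %.2f" % (jet[0], jet.max(), jet[-1]))
```

A lightness curve that rises and then falls, as jet's does, can make different data values look equally bright, which is one reason a better default was sought.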

 

2. VisPy Harnessing The GPU For Fast, High Level Visualization

Duration- 18:25 min

Summary: In this video, Luke Campagnola introduces VisPy, a Python library for high-level visualization. VisPy uses OpenGL to offload as much computation as possible to the GPU, so it can handle more data with faster updates and a lower CPU load. If visualization excites you, watch this and thank me later!

 

3. Building Python Data Apps with Blaze and Bokeh

Duration- 3:20:19 hrs

Summary: I came across this tutorial by Christine on social media even before the videos were out on the official channel, and I knew I had to watch it. This is a SciPy tutorial on building data applications using Blaze and Bokeh. In these 3 hours, you will learn how to use Bokeh to build interactive data visualizations for the browser, and how to use Blaze to run interactive analytics queries against a variety of backends, using datasets from Berkeley Earth and Lahman.

 

4. Anatomy of Matplotlib

Duration- 3:18:29 hrs

Summary: Benjamin Root, one of the developers of matplotlib, explains the concepts behind matplotlib. He begins with an introduction, covers plotting functions and then moves on to the more complex parts of the library, doing justice to the title ‘Anatomy of Matplotlib’. This is a good starting point for beginners keen to use Python for data visualization.
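At the core of matplotlib's anatomy is its object hierarchy: a `Figure` contains one or more `Axes`, and each `Axes` holds its own plotted artists, labels and title. The sketch below uses the object-oriented interface with the non-interactive Agg backend so it runs without a display, and renders the figure to an in-memory PNG instead of a file. The data and titles are illustrative.

```python
import io

import matplotlib
matplotlib.use("Agg")  # render off-screen so no display is needed
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 200)

# One Figure containing two Axes side by side.
fig, (ax_left, ax_right) = plt.subplots(1, 2, figsize=(8, 3))

# Each Axes object owns its own lines, labels and title.
ax_left.plot(x, np.sin(x), label="sin(x)")
ax_left.set_title("Line plot")
ax_left.set_xlabel("x")
ax_left.legend()

ax_right.scatter(x[::10], np.cos(x[::10]), color="tab:orange")
ax_right.set_title("Scatter plot")

fig.tight_layout()
buffer = io.BytesIO()
fig.savefig(buffer, format="png")  # write the rendered figure to memory
plt.close(fig)
```

Calling methods on `fig` and `ax` objects, rather than relying on pyplot's implicit "current axes", keeps multi-panel figures unambiguous.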

 

5. TrendVis, An Elegant Interface for Dense Sparkline Like Quantitative Visualizations

Duration- 18:41 mins

Summary: After Anatomy of Matplotlib, this should be your next watch. Melissa Cross explains TrendVis, an interface for quantitative visualization built on matplotlib, by demonstrating it step by step. In 18 minutes, she describes what the interface does and how to use it in Python.

 

Data Mining in Python

1. Image Analysis in Python with SciPy and Scikit Image

Duration- 3:03:03 hrs

Summary: In this video, Stéfan van der Walt and his team introduce how scikit-image represents images as NumPy arrays and which dimensions and data types are involved. They cover segmentation through interactive examples, image warping, feature detection and how to merge images together.
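The central idea, that an image is just a NumPy array, can be shown without any image files. This sketch builds a synthetic grayscale image as a 2-D array and a color image as a 3-D `(rows, columns, channels)` array, then performs the simplest possible segmentation, a global threshold. The synthetic data and the threshold of 0.5 are assumptions for illustration; scikit-image provides proper methods such as Otsu thresholding (`skimage.filters.threshold_otsu`).

```python
import numpy as np

# A synthetic 100x100 grayscale image: a bright disk on a dark, noisy background.
rng = np.random.default_rng(0)
rows, cols = np.mgrid[0:100, 0:100]
disk = (rows - 50) ** 2 + (cols - 50) ** 2 < 20 ** 2
gray = np.where(disk, 0.8, 0.2) + rng.normal(scale=0.05, size=(100, 100))
print(gray.shape, gray.dtype)          # 2-D: (rows, columns)

# A color image adds a trailing channel axis: (rows, columns, 3).
rgb = np.stack([gray, gray * 0.5, np.zeros_like(gray)], axis=-1)
print(rgb.shape)

# Segmentation by global threshold: a boolean mask of "foreground" pixels.
mask = gray > 0.5
print("foreground pixels:", mask.sum(), "true disk pixels:", disk.sum())

# Pixels are addressed as [row, column], so slicing crops a region.
crop = gray[30:70, 30:70]
print(crop.shape)
```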

 

2. Analyzing and Manipulating Data with Pandas

Duration- 3:43:15 hrs

Summary: In this video, Jonathan Rocher gives a tutorial on data analysis and munging with pandas. He shows how pandas makes handling tabular data easier and more efficient, and how it can perform both easy and not-so-easy tasks that we usually do in Excel or SQL.
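As a small taste of those Excel- and SQL-style tasks, this sketch performs a join (SQL `JOIN`), a grouped aggregation (SQL `GROUP BY`) and a pivot table (Excel's PivotTable) on two tiny tables. The data are invented for illustration.

```python
import pandas as pd

# Two small, made-up tables.
sales = pd.DataFrame({
    "store_id": [1, 1, 2, 2, 3, 3],
    "month": ["Jan", "Feb", "Jan", "Feb", "Jan", "Feb"],
    "revenue": [100, 120, 80, 95, 150, 130],
})
stores = pd.DataFrame({
    "store_id": [1, 2, 3],
    "region": ["North", "South", "North"],
})

# SQL-style JOIN on a shared key.
merged = sales.merge(stores, on="store_id", how="left")

# SQL-style GROUP BY with an aggregate.
revenue_by_region = merged.groupby("region")["revenue"].sum()
print(revenue_by_region)

# Excel-style pivot table: regions as rows, months as columns.
pivot = merged.pivot_table(index="region", columns="month",
                           values="revenue", aggfunc="sum")
print(pivot)
```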

 

3. Introduction to NumPy

Duration- 2:42:52 hrs

Summary: In this video, Eric Jones gives a hands-on introduction to NumPy, with some detail on how it works. He covers topics such as NumPy arrays and slicing, and shows how to manipulate and visualize data with the NumPy library.
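A few of the array basics such a tutorial covers, sketched on a small made-up array: basic slices return views that share memory with the original array, boolean (fancy) indexing returns a copy, and vectorized arithmetic with broadcasting replaces explicit loops.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)   # 3 rows, 4 columns: values 0..11

# Basic slicing returns a view: writing to it changes the original array.
row = a[1, :]
row[0] = 100
print(a[1, 0])                    # the original array sees the change

# Boolean (fancy) indexing returns a copy: writing to it leaves `a` untouched.
evens = a[a % 2 == 0]
evens[:] = -1
print(a.min())

# Vectorized arithmetic and broadcasting: subtract each column's mean.
centered = a - a.mean(axis=0)
print(centered.mean(axis=0))      # column means are now (numerically) zero
```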

 

4. Modern Optimization Methods in Python

Duration- 3:04:43 hrs

Summary: Mike McKerns talks about optimization methods in Python and how they have been modernized. He goes into the details of these techniques and shows how to implement them in Python.
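To make the topic concrete, here is a sketch using SciPy's general-purpose `scipy.optimize.minimize` on the Rosenbrock test function, comparing a derivative-free method (Nelder-Mead) with a gradient-based one (BFGS). It is a stand-in rather than the tutorial's material, which may use other tools; the starting point and methods are illustrative choices.

```python
import numpy as np
from scipy.optimize import minimize, rosen, rosen_der

x0 = np.array([1.3, 0.7, 0.8, 1.9, 1.2])   # arbitrary starting point

# Derivative-free simplex search.
nm = minimize(rosen, x0, method="Nelder-Mead", options={"xatol": 1e-8})

# Quasi-Newton method using the analytic gradient.
bfgs = minimize(rosen, x0, method="BFGS", jac=rosen_der)

# The global minimum of the Rosenbrock function is at x = (1, 1, ..., 1).
for name, res in [("Nelder-Mead", nm), ("BFGS", bfgs)]:
    print(f"{name}: x = {np.round(res.x, 4)}, f = {res.fun:.2e}, "
          f"function evaluations = {res.nfev}")
```

Comparing `nfev` across methods shows the usual trade-off: supplying a gradient generally lets the optimizer reach the minimum with far fewer function evaluations.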

 

5. Geospatial Data with Open Source Tools in Python

Duration- 3:03:30 hrs

Summary: In this video, Kelsey Jordahl gives a tutorial on working with geospatial data using open source tools in Python, with hands-on examples and exercises for practice. He emphasizes Pythonic ways of working with geospatial data by interfacing with the NumPy and SciPy stack.

 

6. Eigenvector Spatial Filtering using NumPy and ArcGIS

Duration- 18:53 mins

Summary: In this video, Bryan Chastain talks about spatial statistics and eigenvector spatial filtering with NumPy. He discusses techniques for handling spatial autocorrelation, such as geographically weighted regression and spatial weights matrices, and compares processing times with R.
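Eigenvector spatial filtering (ESF) draws its candidate spatial filters from the eigenvectors of the doubly centered connectivity matrix M C M, where C is the spatial connectivity matrix and M = I - 11ᵀ/n. Each eigenvector is a map pattern whose Moran's I equals (n / 1ᵀC1) times its eigenvalue. The sketch below computes this with NumPy for a small grid with rook contiguity; the grid size and contiguity rule are assumptions for illustration, and this is not the speaker's ArcGIS workflow.

```python
import numpy as np

def rook_contiguity(n_rows, n_cols):
    """Binary connectivity matrix C for a grid: cells sharing an edge are neighbours."""
    n = n_rows * n_cols
    C = np.zeros((n, n))
    for r in range(n_rows):
        for c in range(n_cols):
            i = r * n_cols + c
            if r + 1 < n_rows:              # neighbour below
                j = i + n_cols
                C[i, j] = C[j, i] = 1.0
            if c + 1 < n_cols:              # neighbour to the right
                j = i + 1
                C[i, j] = C[j, i] = 1.0
    return C

def morans_i(z, C):
    """Moran's I of the values z under connectivity matrix C."""
    z = z - z.mean()
    return len(z) / C.sum() * (z @ C @ z) / (z @ z)

C = rook_contiguity(5, 5)
n = C.shape[0]
M = np.eye(n) - np.ones((n, n)) / n         # centering projector

# Eigen-decomposition of the symmetric, doubly centered matrix.
eigenvalues, eigenvectors = np.linalg.eigh(M @ C @ M)
order = np.argsort(eigenvalues)[::-1]        # largest eigenvalue first
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

# The leading eigenvector is the pattern with the strongest positive autocorrelation.
e1 = eigenvectors[:, 0]
print("Moran's I of e1:", round(morans_i(e1, C), 4))
print("n / 1'C1 * lambda_1:", round(n / C.sum() * eigenvalues[0], 4))
```

In a regression setting, a subset of these eigenvectors (typically those with large positive Moran's I) would be added as covariates to absorb spatial autocorrelation in the residuals.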

 

7. DistArray Distributed Array Computing for Python

Duration- 17:28 mins

Summary: Robert Grant talks about distributed array computing in Python and how he and his team built DistArray on top of existing Python libraries such as NumPy, IPython Parallel and mpi4py. He shows how DistArray works and how to manipulate data with it.

 

Exploring Statistics in Python

1. Computational Statistics in Python – Part I

Duration- 1:43:07 hrs

Summary: In this video, Allen Downey gives a tutorial on computational statistics and statistical inference. He explains the ideas in three parts using Python: estimating and describing effect size, quantifying precision, and hypothesis testing. He uses SciPy, NumPy and pandas in his tutorial.
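The three parts map naturally onto three short computations. This sketch, on simulated data (an assumption; the tutorial uses its own datasets), estimates an effect size (difference in means and Cohen's d), quantifies its precision with a bootstrap confidence interval, and tests it with a permutation test. It follows the resampling spirit of computational statistics but is an illustration, not Downey's own code.

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulated measurements for two groups (illustrative data only).
group_a = rng.normal(loc=10.0, scale=2.0, size=200)
group_b = rng.normal(loc=10.5, scale=2.0, size=200)

def cohens_d(a, b):
    """Standardized difference in means using the pooled standard deviation."""
    n_a, n_b = len(a), len(b)
    pooled_var = ((n_a - 1) * a.var(ddof=1) + (n_b - 1) * b.var(ddof=1)) / (n_a + n_b - 2)
    return (b.mean() - a.mean()) / np.sqrt(pooled_var)

# 1. Effect size.
diff = group_b.mean() - group_a.mean()
d = cohens_d(group_a, group_b)

# 2. Precision: bootstrap the difference in means by resampling each group.
boot = np.empty(5000)
for i in range(len(boot)):
    a = rng.choice(group_a, size=len(group_a), replace=True)
    b = rng.choice(group_b, size=len(group_b), replace=True)
    boot[i] = b.mean() - a.mean()
ci_low, ci_high = np.percentile(boot, [2.5, 97.5])

# 3. Hypothesis test: permutation test of "no difference between groups".
pooled = np.concatenate([group_a, group_b])
n_a = len(group_a)
perm = np.empty(5000)
for i in range(len(perm)):
    shuffled = rng.permutation(pooled)
    perm[i] = shuffled[n_a:].mean() - shuffled[:n_a].mean()
p_value = np.mean(np.abs(perm) >= abs(diff))

print(f"difference in means: {diff:.3f}, Cohen's d: {d:.3f}")
print(f"95% bootstrap CI: ({ci_low:.3f}, {ci_high:.3f})")
print(f"two-sided permutation p-value: {p_value:.4f}")
```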

 

2. Computational Statistics in Python – Part II

Duration- 3:01:25 hrs

Summary: This video follows up on the previous video, “Computational Statistics I”. In it, Chris Fonnesbeck extends the material with a different objective: ‘some useful computational tools that would allow us to build statistical models competently’. It is a useful survey of statistical tools, motivated by how little practical use much of what we learnt in undergraduate and graduate statistics programs turns out to have.

 

3. Statistical Thinking for Data Science

Duration- 23:50 mins

Summary: In this video, Chris Fonnesbeck talks about the issues that, as a statistician, he considers important for the success of data science as an emerging field. He shows some examples from the history of statistics and walks us through other statistical concepts and the data around us.

 

Miscellaneous

1. Jupyter Advanced Topics Tutorial

Duration- 2:48:53 hrs

Summary: This video covers advanced tools and topics around Jupyter. Jonathan Frederic presents advanced Jupyter topics with hands-on tutorials, using tools such as jQuery, CSS and SciPy. The video also includes a talk by Matthias Bussonnier, who elaborates further on the topic.

 

2. Accelerating Python with Numba JIT Compiler

Duration- 20:22 mins

Summary: In this video, Stanley Seibert talks about Numba, an open source, NumPy-aware optimizing compiler for Python. He covers the basics of Numba and how to use it, walks through its features and how it works, and compares it with other compilers.
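The basic Numba workflow is to decorate a numeric, loop-heavy Python function with `numba.njit`, which compiles it to machine code on the first call. The sketch below assumes Numba is installed; if it is not, a hypothetical do-nothing fallback decorator (not part of Numba) leaves the function as plain Python, so the example still runs, just without the speed-up. The point count and the nearest-pair problem are illustrative choices.

```python
import time

import numpy as np

try:
    from numba import njit
except ImportError:
    # Hypothetical fallback for environments without Numba: run uncompiled.
    def njit(func):
        return func

@njit
def pairwise_min_distance(points):
    """Smallest Euclidean distance between any two rows of `points` (O(n^2) loops)."""
    n = points.shape[0]
    best = np.inf
    for i in range(n):
        for j in range(i + 1, n):
            dx = points[i, 0] - points[j, 0]
            dy = points[i, 1] - points[j, 1]
            dist = np.sqrt(dx * dx + dy * dy)
            if dist < best:
                best = dist
    return best

points = np.random.default_rng(0).random((500, 2))

start = time.perf_counter()
pairwise_min_distance(points)            # first call includes compilation time
first = time.perf_counter() - start

start = time.perf_counter()
result = pairwise_min_distance(points)   # later calls reuse the compiled code
second = time.perf_counter() - start

print(f"min distance {result:.6f}; first call {first:.3f} s, second call {second:.3f} s")
```

Explicit loops like these are exactly what is slow in plain Python and cheap once compiled, which is why timing the first and second calls separately shows the compilation overhead apart from the run time.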

End Notes

In this article, we have listed the data science videos from the SciPy 2015 conference that we recommend. We found these videos enriching in their respective subjects and think they can help you as well. If we have missed any useful video from the SciPy 2015 playlist, feel free to list it in the comments section below.

If you like what you just read and want to continue your analytics learning, subscribe to our emails, follow us on Twitter or like our Facebook page.


Kunal is a postgraduate from IIT Bombay in Aerospace Engineering. He has spent more than 10 years in the field of data science. His work experience ranges from mature markets like the UK to a developing market like India. During this period he has led teams of various sizes and has worked with tools such as SAS, SPSS, Qlikview, R, Python and Matlab.


Responses From Readers

siddharth 17 Jul, 2015

Hey Kunal, many thanks for this list. I plan on gorging on the videos on statistical thinking in Python... and then later, if time permits, machine learning in Python using scikit-learn. Keep up the good work... Sincerely,

Anurag GV 17 Jul, 2015

Awesome!

Anon 18 Jul, 2015

Now, if you could just tell us how to have 48-hour days, I'll start off immediately with the first video. ;) Thanks for the links, as usual. I need to set up some sort of regimen for watching these.