This happened in the first year of my engineering degree. A hungry student (me), missing home food, was treated by a college senior to a lavish buffet at one of the best five-star hotels in Mumbai! You are served so many dishes that you struggle to decide where to start, what to taste and what to eat!
Why is this relevant here? Well, I had a similar feeling when I looked at the videos from the recent SciPy conference.
All the Pythonistas in the data science community are in for a treat today!
The Scientific Computing with Python 2015 conference, popularly known as SciPy 2015, held recently in Austin, Texas, USA, offered a similar experience. This six-day conference showcased the best research work being done in Python, including some fantastic tutorials on data science. The catch: the conference YouTube channel has 115 videos, covering topics ranging from bioscience, geoscience and astronomy to data science. We had to watch every video to come up with this recommended list.
A few things to note:
- The videos listed below are strictly related to Python in data science. If you are looking for anything else from SciPy 2015, check the official conference page. For your convenience, we have grouped the videos by data science topic to help you learn in a structured manner.
- This list is by no means a judgement on the usefulness of the work presented at the conference; we are in no position to judge that. We created this list to help our audience cover the best data science content from the conference in less time. If you have the time and motivation, you should definitely watch the rest of the awesome work on display.
- If you do the math, watching all these videos in less than a week is almost humanly impossible given my current commitments. I could not have compiled this list without the awesome help of Manish and Aditya (our summer intern). Thanks to them for helping me put this together.
Let’s warm up with the keynotes:
Duration- 45:06 mins
Summary: In this video, Jake VanderPlas gives a high-level view of the SciPy ecosystem and a history of how its tools developed over time. He covers visualization tools such as Matplotlib, Seaborn and Bokeh (which plots in HTML5), data structure and array tools such as xray, and recent developments including Numba, Anaconda, IPython and Jupyter.
Duration- 47:45 mins
Summary: Wes McKinney, creator of pandas, returns to SciPy and talks about his journey and experience from 2007 to the present. He mentions how working as a mathematician exposed him to Python. He also describes what he is personally focused on now and the opportunities he sees for the community to continue to grow and flourish.
Duration- 38:37 mins
Summary: In this video, Chris Wiggins talks about coming to data science as a biologist, how he ended up at The New York Times, and how the Times has used data science for its growth. He describes the work of the New York Times data science group and the significant role machine learning plays in the company.
Machine Learning in Python
Duration- 3:22:05 hrs
Summary: If you have been longing to learn machine learning in Python, this is the cake you have always wanted. This tutorial covers various aspects of machine learning using scikit-learn. It comes in two parts, each about 3 hours long, which leaves plenty of time to cover the material thoroughly. It has everything you need to know about the fundamentals of machine learning in Python.
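If you want a taste of the scikit-learn workflow before committing to six hours of video, here is a minimal sketch of the fit/predict/score pattern that the library's estimators share. It is not taken from the tutorial: the iris dataset, the k-nearest-neighbours classifier and `n_neighbors=5` are our own choices, and it assumes scikit-learn 0.18 or later (where `train_test_split` lives in `sklearn.model_selection`; the 2015 releases used different module names).

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# 150 samples, 4 features, 3 classes.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

model = KNeighborsClassifier(n_neighbors=5)
model.fit(X_train, y_train)             # learn from the training split
predictions = model.predict(X_test)     # one predicted label per test sample
accuracy = model.score(X_test, y_test)  # mean accuracy on held-out data
print(f"test accuracy: {accuracy:.2f}")
```

Every scikit-learn estimator follows this same `fit`/`predict`/`score` interface, so swapping in a different model is usually a one-line change.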
Duration- 3:16:12 hrs
Summary: This video picks up where the previous one ends. In this part, more complex problems such as model selection and cross-validation are tackled step by step. The best part is the strong emphasis on model building: parameter selection, validation and testing are covered in depth before the algorithms from the previous video are applied.
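To make "model selection and cross-validation" concrete, here is a minimal sketch using scikit-learn's `cross_val_score` and `GridSearchCV`. It is our own example, not the tutorial's code; the SVM, the parameter grid and the 5 folds are arbitrary choices, and it assumes scikit-learn 0.18 or later.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
# Hold out a test set that the model-selection process never sees.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# 5-fold cross-validation on the training data: each training sample
# is used for validation exactly once.
scores = cross_val_score(SVC(kernel="rbf", C=1.0), X_train, y_train, cv=5)
print("fold accuracies:", scores.round(3), "mean:", round(scores.mean(), 3))

# Grid search: cross-validate every (C, gamma) combination and keep the best.
param_grid = {"C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
search.fit(X_train, y_train)  # refits the best model on all of X_train

print("best parameters:", search.best_params_)
print("best CV accuracy:", round(search.best_score_, 3))
print("held-out test accuracy:", round(search.score(X_test, y_test), 3))
```

Keeping the test split out of the grid search is the point of the design: the final test score is then an honest estimate rather than one the parameter search was tuned on.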
Duration- 19:02 mins
Summary: This video is a crash course on deep learning. The speaker covers the major concepts associated with deep learning but skips convolutional networks. Still, it clearly explains concepts such as conditioning, backpropagation and recurrence. If you are a beginner keen to learn about deep learning, you can't miss this one.
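Backpropagation is easier to grasp once you see that it is just the chain rule applied layer by layer. The NumPy sketch below is our own illustration, not code from the talk: a tiny one-hidden-layer network (random data and weights, a tanh hidden layer and a mean-squared-error loss, all assumptions) whose hand-derived gradients are checked against finite differences.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))          # 8 samples, 3 features
y = rng.normal(size=(8, 1))          # regression targets
W1 = rng.normal(size=(3, 4)) * 0.5   # input -> hidden weights
W2 = rng.normal(size=(4, 1)) * 0.5   # hidden -> output weights

def forward(W1, W2):
    h = np.tanh(X @ W1)              # hidden activations
    out = h @ W2                     # linear output layer
    loss = 0.5 * np.mean((out - y) ** 2)
    return h, out, loss

def backward(W1, W2):
    """Backpropagation: apply the chain rule from the loss back to each weight."""
    h, out, _ = forward(W1, W2)
    d_out = (out - y) / len(X)       # dLoss/d_out
    grad_W2 = h.T @ d_out            # dLoss/dW2
    d_h = d_out @ W2.T               # dLoss/dh
    d_pre = d_h * (1 - h ** 2)       # through tanh: d tanh(z)/dz = 1 - tanh(z)^2
    grad_W1 = X.T @ d_pre            # dLoss/dW1
    return grad_W1, grad_W2

def numerical_grad_W1(W1, W2, eps=1e-6):
    """Central finite differences on W1, used only to check backprop."""
    grad = np.zeros_like(W1)
    for idx in np.ndindex(W1.shape):
        Wp, Wm = W1.copy(), W1.copy()
        Wp[idx] += eps
        Wm[idx] -= eps
        grad[idx] = (forward(Wp, W2)[2] - forward(Wm, W2)[2]) / (2 * eps)
    return grad

grad_W1, grad_W2 = backward(W1, W2)
max_diff = np.max(np.abs(grad_W1 - numerical_grad_W1(W1, W2)))
print(f"max |backprop - numerical| for W1: {max_diff:.2e}")
```

A tiny difference confirms that the backward pass computes the same gradient as brute-force perturbation, only far more cheaply.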
Duration- 19:06 mins
Summary: The speaker explains that taking machine learning models into ‘production’ is the sum of deployment, evaluation, management and monitoring. He describes the use of machine learning in production with a focus on optimizing model performance.
Duration- 21:45 mins
Summary: Python tools for text classification can easily be applied to malware classification. Data scientist Phil Roth shares his solution to the Microsoft Malware Classification Challenge (2015) hosted by Kaggle. He discusses his approach, his code and the algorithms he used on this dataset. He used scikit-learn for model building and finished 29th in the competition.
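To see why text classification tools carry over, treat each binary as a "document" of tokens. This toy sketch is not Phil Roth's solution: the opcode strings, the family labels and the bag-of-n-grams plus naive Bayes pipeline are hypothetical choices made only to show the idea with scikit-learn.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy data: each "document" is a sequence of disassembled opcodes.
train_docs = [
    "mov push call ret",
    "mov push push call",
    "xor jmp jmp xor",
    "jmp xor xor jmp",
]
train_labels = [0, 0, 1, 1]  # hypothetical malware family IDs

# ngram_range=(1, 2) counts single opcodes and adjacent opcode pairs,
# exactly as it would count words and word pairs in ordinary text.
clf = make_pipeline(CountVectorizer(ngram_range=(1, 2)), MultinomialNB())
clf.fit(train_docs, train_labels)

predictions = clf.predict(["push call mov", "xor xor jmp"])
print(predictions)
```

Real solutions work with far larger feature sets (and often hashed features), but the pipeline shape stays the same.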
Duration- 3:23:50 hrs
Summary: This video covers the essentials of code optimization, using Monte Carlo games as the running example. Over 3 hours, the speaker walks through the code that matters for optimization while debugging it along the way. It is well suited to intermediate and advanced Python practitioners; beginners might skip it.
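A typical first optimization is to replace a Python-level loop with vectorized NumPy code and time both. The sketch below does this for a Monte Carlo estimate of pi; it is our own example rather than the speaker's Monte Carlo game, the sample size is arbitrary, and timings will vary by machine.

```python
import random
import time

import numpy as np

def pi_loop(n, seed=0):
    """Pure-Python Monte Carlo estimate of pi: one random point per iteration."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(n):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / n

def pi_vectorized(n, seed=0):
    """Same estimate, but NumPy draws and tests all points in bulk."""
    rng = np.random.default_rng(seed)
    xy = rng.random((n, 2))
    inside = np.count_nonzero((xy ** 2).sum(axis=1) <= 1.0)
    return 4.0 * inside / n

n = 200_000
for fn in (pi_loop, pi_vectorized):
    start = time.perf_counter()
    estimate = fn(n)
    elapsed = time.perf_counter() - start
    print(f"{fn.__name__}: pi ~ {estimate:.4f} in {elapsed:.3f} s")
```

Measuring before and after each change, as here, is what tells you whether an optimization actually paid off.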
Data Visualization in Python
Duration- 19:09 mins
Summary: Presenting data is as important as producing it. This video introduces colormaps and how to use them in matplotlib. It gives a comprehensive overview of the feature and leaves you with enough information to try it yourself.
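To try colormaps at your end, here is a minimal matplotlib sketch (our own, not from the talk) that renders the same field with a sequential colormap, viridis, which became matplotlib's default in version 2.0, and a diverging one, RdBu_r, which suits data centred on zero. The test function and the output filename are arbitrary choices.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; no display needed
import matplotlib.pyplot as plt
import numpy as np

# A smooth 2-D field with values between -1 and 1.
x = np.linspace(-3, 3, 200)
X, Y = np.meshgrid(x, x)
Z = np.sin(X) * np.cos(Y)

fig, axes = plt.subplots(1, 2, figsize=(8, 3.5))
for ax, cmap in zip(axes, ["viridis", "RdBu_r"]):
    im = ax.imshow(Z, cmap=cmap, origin="lower", extent=(-3, 3, -3, 3))
    ax.set_title(cmap)
    fig.colorbar(im, ax=ax)  # the colorbar shows how values map to colors
fig.savefig("colormaps.png", dpi=100)
```

Passing the colormap by name through `cmap=` is all it takes to switch between them.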
Duration- 18:25 mins
Summary: In this video, Luke Campagnola introduces VisPy, a Python library for high-performance visualization. VisPy uses OpenGL to offload as much computation as possible to the GPU, so it can handle more data with faster updates and a lower CPU load. If visualization excites you, watch this and thank me later!
Duration- 3:20:19 hrs
Summary: I came across this tutorial by Christine on social media even before the videos were out on the official channel, and I knew I had to watch it. This SciPy tutorial is about building data applications with Blaze and Bokeh. In these 3 hours, you will learn how to use Bokeh to build interactive data visualizations for the browser. You will also learn Blaze, which is used to run interactive analytics queries against a variety of backends, using datasets from Berkeley Earth and Lahman.
Duration- 3:18:29 hrs
Summary: Benjamin Root, one of the developers of matplotlib, explains the concepts behind matplotlib. He begins with an introduction, covers the plotting functions and then moves on to the more complex parts of the library, doing justice to the title ‘Anatomy of Matplotlib’. This is a good starting point for beginners keen to use Python for data visualization.
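The core of the "anatomy" is that a Figure holds one or more Axes, and each Axes holds artists such as lines and markers. A minimal sketch of that object-oriented style follows; the data, titles and filename are our own choices, not material from the tutorial.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; no display needed
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 100)

# A Figure is the whole canvas; each Axes is one plotting area inside it.
fig, (ax_left, ax_right) = plt.subplots(1, 2, figsize=(8, 3), sharey=True)

ax_left.plot(x, np.sin(x), label="sin(x)")        # adds a Line2D artist
ax_left.plot(x, np.cos(x), "--", label="cos(x)")  # a second Line2D artist
ax_left.set(title="Line plot", xlabel="x", ylabel="value")
ax_left.legend()

ax_right.scatter(x[::10], np.sin(x[::10]), color="tab:red")  # adds a PathCollection
ax_right.set(title="Scatter plot", xlabel="x")

fig.suptitle("One Figure, two Axes")
fig.tight_layout()
fig.savefig("anatomy.png")
```

Calling methods on explicit `fig` and `ax` objects, rather than on `plt`, keeps multi-panel figures unambiguous.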
Duration- 18:41 mins
Summary: After the Anatomy of Matplotlib tutorial, this should be your next watch. Melissa Cross explains TrendVis (an interface for quantitative visualization built on matplotlib) by demonstrating it step by step. In 18 minutes, she shows how to use this interface in Python.
Data Mining in Python
Duration- 3:03:03 hrs
Summary: In this video, Stéfan van der Walt and his team introduce how scikit-image represents images as NumPy arrays, and which dimensions and data types are involved. They cover segmentation through interactive examples, image warping, feature detection and how to merge images together.
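Because scikit-image images are plain NumPy arrays, you can explore the conventions without the library itself. This sketch, written in plain NumPy as our own illustration, builds a synthetic RGB image in scikit-image's (row, column, channel) layout, converts it from `uint8` to floats in [0, 1] (a conversion `skimage.util.img_as_float` also performs), and segments it with a simple threshold. The image, the Rec. 709 luma weights and the 0.5 threshold are assumptions; scikit-image's `rgb2gray` uses a similar weighted sum.

```python
import numpy as np

# Grayscale images are (rows, cols); color images are (rows, cols, channels).
rows, cols = 60, 80
rgb = np.zeros((rows, cols, 3), dtype=np.uint8)  # uint8 pixels range 0..255
rgb[20:40, 30:60] = [255, 200, 0]                # paint a yellow-orange rectangle

# Convert to floating point in [0, 1], the other common convention.
img = rgb.astype(np.float64) / 255.0

# Weighted grayscale conversion using Rec. 709 luma coefficients.
gray = img @ np.array([0.2126, 0.7152, 0.0722])

# Simplest possible segmentation: threshold the grayscale image.
mask = gray > 0.5
print(rgb.shape, gray.shape, mask.sum())
```

Indexing is always row first, then column, which is why the rectangle is written as `rgb[20:40, 30:60]`.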
Duration- 3:43:15 hrs
Summary: In this video, Jonathan Rocher gives a tutorial on data analysis and munging with pandas. He shows how pandas makes handling tabular data easier and more efficient, and how it can be used for both easy and not-so-easy tasks that we usually do in Excel or SQL.
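Here is a small sketch of how everyday Excel and SQL tasks map onto pandas. The sales table is hypothetical and the example is ours, not from the tutorial.

```python
import pandas as pd

# Hypothetical sales table.
sales = pd.DataFrame({
    "region":  ["North", "South", "North", "East", "South", "North"],
    "product": ["A", "A", "B", "B", "B", "A"],
    "units":   [10, 4, 7, 3, 5, 6],
    "price":   [2.5, 2.5, 4.0, 4.0, 4.0, 2.5],
})

# Derived column, like a formula column in Excel.
sales["revenue"] = sales["units"] * sales["price"]

# SQL: SELECT region, SUM(revenue) FROM sales GROUP BY region
by_region = sales.groupby("region")["revenue"].sum()

# Excel-style pivot table: regions as rows, products as columns.
pivot = sales.pivot_table(index="region", columns="product",
                          values="units", aggfunc="sum", fill_value=0)

# SQL: SELECT * FROM sales WHERE units > 4 ORDER BY units DESC
big_orders = sales[sales["units"] > 4].sort_values("units", ascending=False)

print(by_region, pivot, big_orders, sep="\n\n")
```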
Duration- 2:42:52 hrs
Summary: In this video, Eric Jones gives a hands-on introduction to NumPy, with some details on how it works. He covers topics including NumPy arrays and slicing, and shows how to manipulate and visualize data using the NumPy library.
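A quick refresher on the slicing behaviour such a tutorial covers: the sketch below (our own example) shows row, column and block slices, the fact that basic slices are views, boolean masking, and broadcasting.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)   # 3x4 array: [[0..3], [4..7], [8..11]]

row = a[1]          # second row -> [4, 5, 6, 7]
col = a[:, 2]       # third column -> [2, 6, 10]
block = a[:2, 1:3]  # rows 0-1, columns 1-2 -> [[1, 2], [5, 6]]

# Basic slices are views: writing through them changes the original array.
block[0, 0] = 100
print(a[0, 1])      # 100

# Boolean masks select elements and always return a copy.
evens = a[a % 2 == 0]

# Vectorized arithmetic with broadcasting: subtract each column's mean.
centered = a - a.mean(axis=0)
print(centered.mean(axis=0))  # approximately 0 for every column
```

The view behaviour is the most common source of surprises: call `.copy()` on a slice when you need an independent array.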
Duration- 3:04:43 hrs
Summary: Mike McKerns talks about optimization methods in Python and how they have been modernized. He goes into the details of these techniques and shows how to implement them in Python.
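For a flavour of optimization in Python, the sketch below minimizes the Rosenbrock test function with SciPy's `scipy.optimize.minimize`, comparing a gradient-free method with a gradient-based one. SciPy is our choice for illustration; the tutorial may rely on other tools, and the starting point is simply the conventional one for this test problem.

```python
import numpy as np
from scipy.optimize import minimize, rosen, rosen_der

# Rosenbrock function: a classic test problem with its minimum at (1, 1).
x0 = np.array([-1.2, 1.0])

# Gradient-free simplex method.
res_nm = minimize(rosen, x0, method="Nelder-Mead")

# Quasi-Newton method that uses the analytic gradient.
res_bfgs = minimize(rosen, x0, method="BFGS", jac=rosen_der)

for name, res in [("Nelder-Mead", res_nm), ("BFGS", res_bfgs)]:
    print(f"{name}: x = {res.x.round(4)}, f = {res.fun:.2e}, evaluations = {res.nfev}")
```

Comparing `nfev` shows the usual trade-off: supplying a gradient generally lets the optimizer reach the minimum in fewer function evaluations.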
Duration- 3:03:30 hrs
Summary: In this video, Kelsey Jordahl gives a tutorial on working with geospatial data using open source Python tools. The tutorial includes hands-on examples and exercises for practice. He emphasizes Pythonic ways of working with geospatial data by interfacing with the NumPy and SciPy stack.
Duration- 18:53 mins
Summary: In this video, Bryan Chastain talks about spatial statistics and eigenvector spatial filtering using NumPy. He discusses techniques for controlling spatial autocorrelation, such as geographically weighted regression and spatial weights matrices, and compares processing times with R.
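Spatial autocorrelation is usually measured before it is controlled for. As an illustration (not code from the talk), here is a NumPy sketch of global Moran's I, I = (n / W) · Σᵢⱼ wᵢⱼ zᵢ zⱼ / Σᵢ zᵢ², built on a spatial weights matrix. The simple line-of-neighbours weights are an assumption chosen to keep the example small.

```python
import numpy as np

def morans_i(values, weights):
    """Global Moran's I for spatial autocorrelation.

    values:  1-D array of observations, one per location.
    weights: (n, n) spatial weights matrix; weights[i, j] > 0 if i and j are neighbours.
    """
    z = values - values.mean()      # deviations from the mean
    n = len(values)
    w_sum = weights.sum()           # total weight W
    numerator = z @ weights @ z     # sum_ij w_ij * z_i * z_j
    denominator = z @ z             # sum_i z_i^2
    return (n / w_sum) * numerator / denominator

def line_weights(n):
    """Binary contiguity weights for n locations on a line (neighbours are i-1 and i+1)."""
    w = np.zeros((n, n))
    idx = np.arange(n - 1)
    w[idx, idx + 1] = 1
    w[idx + 1, idx] = 1
    return w

w = line_weights(4)
trend = morans_i(np.array([1.0, 2.0, 3.0, 4.0]), w)        # neighbours alike
alternating = morans_i(np.array([1.0, 0.0, 1.0, 0.0]), w)  # neighbours differ
print(trend, alternating)
```

Positive values indicate that neighbouring locations have similar values and negative values indicate that they differ, which is the structure that techniques such as eigenvector spatial filtering aim to account for.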
Duration- 17:28 mins
Summary: Robert Grant talks about distributed array computation in Python and how he and his team built DistArray on top of existing Python libraries such as NumPy, IPython Parallel and mpi4py. He shows how DistArray works and how to manipulate data with it.
Exploring Statistics in Python
Duration- 1:43:07 hrs
Summary: In this video, Allen Downey gives a tutorial on computational statistics and statistical inference. He explains the material in three parts: estimating and describing effect size, quantifying precision, and hypothesis testing, all in Python using SciPy, NumPy and pandas.
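The three parts of the tutorial map neatly onto a few lines of NumPy. This sketch is our own and uses simulated data rather than the tutorial's dataset: it computes an effect size (difference in means and Cohen's d), quantifies precision with a bootstrap confidence interval, and runs a permutation test. The group means, standard deviations, sample sizes and iteration counts are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)
# Hypothetical samples from two groups (simulated, not real data).
group_a = rng.normal(loc=178, scale=7, size=200)
group_b = rng.normal(loc=163, scale=7, size=200)

# 1. Effect size: difference in means and Cohen's d
#    (pooled SD formula below assumes equal group sizes).
diff = group_a.mean() - group_b.mean()
pooled_sd = np.sqrt((group_a.var(ddof=1) + group_b.var(ddof=1)) / 2)
cohens_d = diff / pooled_sd

# 2. Precision: bootstrap the difference to get a 90% confidence interval.
boot = [rng.choice(group_a, group_a.size).mean() - rng.choice(group_b, group_b.size).mean()
        for _ in range(1000)]
ci_low, ci_high = np.percentile(boot, [5, 95])

# 3. Hypothesis test: permutation test of "no difference in means".
pooled = np.concatenate([group_a, group_b])
n_iter = 1000
count = 0
for _ in range(n_iter):
    rng.shuffle(pooled)  # randomly reassign observations to the two groups
    perm_diff = pooled[:group_a.size].mean() - pooled[group_a.size:].mean()
    if abs(perm_diff) >= abs(diff):
        count += 1
p_value = count / n_iter

print(f"diff={diff:.2f}, d={cohens_d:.2f}, "
      f"90% CI=({ci_low:.2f}, {ci_high:.2f}), p={p_value:.3f}")
```

With such a large simulated difference, no permuted difference reaches the observed one, so a reported p-value of 0 here only means p < 1/1000.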
Duration- 3:01:25 hrs
Summary: This video is a follow-up to “Computational Statistics I”. In it, Chris Fonnesbeck extends the material with a different objective: ‘some useful computational tools that would allow us to build statistical models competently’. It is a useful survey of statistical tools, motivated by the limited practical utility of what is taught in undergraduate and graduate statistics programs.
Duration- 23:50 mins
Summary: In this video, Chris Fonnesbeck talks about the issues that he, as a statistician, thinks are important for the success of data science as an emerging field. He shows some statistics from history and walks us through other statistical concepts and the data around us.
Duration- 2:48:53 hrs
Summary: This video covers some advanced tools and topics related to Python. Jonathan Frederic presents advanced topics in Jupyter with hands-on tutorials, using tools such as jQuery, CSS and SciPy. The video also includes a talk by Matthias Bussonnier, who elaborates further on the topic.
Duration- 20:22 mins
Summary: In this video, Stanley Seibert talks about Numba, an open source, NumPy-aware optimizing compiler for Python. He compares Numba with other compilers, walks through its features and explains how it works, along with the basics of using it.
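Numba's basic usage is a decorator on a loop-heavy function. The sketch below is our own example: the pairwise-distance function is hypothetical, and the code falls back to plain Python when Numba is not installed so that it still runs. With Numba present, the first call compiles the function and later calls run the compiled machine code.

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Numba not installed: a no-op stand-in for the bare @njit decorator,
    # so the function below still runs as ordinary (slower) Python.
    def njit(func):
        return func

@njit
def pairwise_distance_sum(points):
    """Sum of Euclidean distances between all pairs of points, written as explicit loops.

    Nested loops like these are slow in CPython; they are the kind of code Numba compiles.
    """
    n = points.shape[0]
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            d = 0.0
            for k in range(points.shape[1]):
                diff = points[i, k] - points[j, k]
                d += diff * diff
            total += np.sqrt(d)
    return total

pts = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
print(pairwise_distance_sum(pts))  # 5 + 10 + 5 = 20.0
```

Writing plain loops over NumPy arrays, rather than vectorizing, is the style Numba is designed to speed up.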
In this article, we have listed data science videos from the SciPy 2015 conference. We found these videos enriching in their respective subjects and believe they can help you as well. If we have missed any useful video from the SciPy playlist, feel free to list it in the comments section below.