Automating the Machine Learning Pipeline with AutoML

Google released several papers on neural architecture search that created a huge buzz in the machine learning practitioner community. Neural architecture search is a variant of automated machine learning in which a neural network creates another neural network model for a given problem. Interestingly, this is done without any manual intervention from a data scientist. The task is accomplished by a thorough search of the space of neural network components, selecting and configuring the components that yield a full network with optimal performance measures.
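As a rough illustration of what "searching the component space" means, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the search space, the function names, and especially `proxy_score`, which is an invented stand-in for training each candidate network and measuring its validation performance. Practical systems typically use smarter strategies than the exhaustive enumeration shown here.

```python
import itertools

# Hypothetical search space: each key is one component choice for a small network.
SEARCH_SPACE = {
    "num_layers": [1, 2, 3],
    "units": [16, 32, 64],
    "activation": ["relu", "tanh"],
}

def candidate_architectures(space):
    """Yield every combination of component choices as a dict."""
    keys = list(space)
    for values in itertools.product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))

def proxy_score(arch):
    """Stand-in for 'train the candidate and return its validation score'.

    The formula is invented purely so the example runs; a real search
    would train and validate each candidate network here.
    """
    capacity = arch["num_layers"] * arch["units"]
    bonus = 0.02 if arch["activation"] == "relu" else 0.0
    # Reward capacity close to an arbitrary target, penalise anything else.
    return 0.9 - abs(capacity - 64) / 1000 + bonus

def search(space, score_fn):
    """Score every candidate exhaustively and return the best one."""
    return max(candidate_architectures(space), key=score_fn)

best = search(SEARCH_SPACE, proxy_score)
print(best)
```

The structure is the point rather than the scores: a declared space of components, a way to enumerate candidates, and a scoring function that decides which configuration wins.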

While neural architecture search seems new to the ML community, automated machine learning (AutoML) itself is not. The area was widely discussed as early as 2016 but was seldom used by practitioners. Blog posts and open discussion of the topic by Google in recent meetings (2018) have drawn a lot of attention to the tools and techniques available in this area.

AutoML enables rapid development of machine learning models for the problem at hand. Typically, a data scientist working on an ML problem preprocesses the data, creates model features based on domain expertise, selects an appropriate model family, performs hyperparameter tuning, does post-processing and analyzes the model results. These tasks are often tedious and can span several weeks or months. AutoML and the suite of tools available in this area attempt to automate all of these tasks, enabling almost one-click ML pipeline development.
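Two of the steps listed above, choosing a model family and tuning hyperparameters, can be sketched in plain Python. This is a toy under stated assumptions, not how any particular AutoML tool works: the dataset is made up, the two model families (k-nearest neighbours and nearest centroid) are chosen only because they fit in a few lines, and all function names are hypothetical.

```python
import math
from collections import Counter

# Tiny made-up dataset: (feature vector, label).
DATA = [
    ((1.0, 1.2), "a"), ((0.8, 1.0), "a"), ((1.2, 0.9), "a"), ((0.9, 1.1), "a"),
    ((3.0, 3.1), "b"), ((3.2, 2.9), "b"), ((2.9, 3.0), "b"), ((3.1, 3.3), "b"),
]

def knn_predict(train, x, k):
    """Majority vote among the k nearest training points."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def centroid_predict(train, x):
    """Predict the class whose mean feature vector is closest to x."""
    groups = {}
    for features, label in train:
        groups.setdefault(label, []).append(features)
    centroids = {
        label: tuple(sum(col) / len(col) for col in zip(*rows))
        for label, rows in groups.items()
    }
    return min(centroids, key=lambda label: math.dist(centroids[label], x))

def loo_accuracy(predict, data):
    """Leave-one-out cross-validation accuracy for a predict(train, x) function."""
    hits = 0
    for i, (x, y) in enumerate(data):
        train = data[:i] + data[i + 1:]
        hits += predict(train, x) == y
    return hits / len(data)

# Candidate configurations: one model family without hyperparameters,
# and one family (k-NN) tried with several values of k.
candidates = [("centroid", centroid_predict)]
for k in (1, 3, 7):
    candidates.append((f"knn(k={k})", lambda train, x, k=k: knn_predict(train, x, k)))

scores = {name: loo_accuracy(fn, DATA) for name, fn in candidates}
best_name = max(scores, key=scores.get)
print(scores)
print("selected:", best_name)
```

The loop evaluates every candidate with the same validation procedure and keeps the best. Automated tools apply the same idea to far larger spaces of preprocessing steps, models and hyperparameters.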

This conference talk introduces the concept of AutoML to the participants.

Structure of the Hack Session

We start by picking two Analytics Vidhya hackathon problems and their associated datasets, and discuss the datasets and the problems at hand. (We provide clean datasets, as data pre-processing is not the focus of the talk.)
We perform deep feature synthesis using the featuretools Python package and examine the automatically generated features
We create ML pipelines with the TPOT and auto-sklearn frameworks
We compare the performance measures obtained with the leaderboard scores
We discuss combining the AutoML-generated features with manually curated, domain-knowledge-based features to build effective models
We compare various AutoML tools by their performance measures on several problems
Q & A
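To give a feel for the feature-synthesis step in the agenda above, here is a minimal sketch in plain Python of the underlying idea: applying aggregation primitives to a child table and attaching the results to each parent row. It deliberately does not use the featuretools API. The tables, the `synthesize` helper and the feature-name format are all hypothetical, and it covers only a single parent-child relationship, whereas deep feature synthesis stacks such steps across several related tables.

```python
from statistics import mean

# Hypothetical parent table (one row per customer) and child table
# (many transactions per customer).
customers = [{"customer_id": 1}, {"customer_id": 2}]
transactions = [
    {"customer_id": 1, "amount": 20.0},
    {"customer_id": 1, "amount": 35.0},
    {"customer_id": 2, "amount": 5.0},
    {"customer_id": 2, "amount": 15.0},
    {"customer_id": 2, "amount": 10.0},
]

# Aggregation primitives: each reduces a list of child values to one number.
PRIMITIVES = {"SUM": sum, "MEAN": mean, "MAX": max, "COUNT": len}

def synthesize(parents, children, key, column, child_name):
    """Return a copy of each parent row with one new feature per primitive."""
    grouped = {}
    for row in children:
        grouped.setdefault(row[key], []).append(row[column])
    features = []
    for parent in parents:
        values = grouped.get(parent[key], [])
        new_row = dict(parent)
        for name, fn in PRIMITIVES.items():
            # Parents with no child rows get None rather than a made-up value.
            new_row[f"{name}({child_name}.{column})"] = fn(values) if values else None
        features.append(new_row)
    return features

feature_matrix = synthesize(
    customers, transactions, "customer_id", "amount", "transactions"
)
for row in feature_matrix:
    print(row)
```

Each customer row gains four generated columns without anyone hand-writing a feature, which is the effect the session examines on the hackathon datasets.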

Platforms / Tools to be used: Linux, Anaconda Python

Hackers

Dr. Sunil Chinnamgari

Dr. Sunil Chinnamgari is an AI researcher with about 15 years of industry experience. Currently, he works as a Sr. Lead Data Scientist with Fidelity Investments, Bangalore. He has published several research papers in Scopus-indexed and IEEE journals and is a frequent speaker at various reputed colleges in and around Bangalore. Sunil has a PhD in Computer Science (NLP and ML specialization) from Bharathiar University, Coimbatore. He is an avid coder and has won multiple hackathons. In his spare time, Sunil likes to teach, travel and keep up with new advancements in AI.

Duration of Hack-Session: 1 hour
