Perform Automated Feature Engineering in Python with Featuretools

Pranav Dar 12 Jul, 2018 • 2 min read


  • Feature Labs has launched a set of tools to make machine learning algorithms train quicker
  • Automated feature engineering is at the heart of it
  • Tested in three competitions, where it performed as well as top human competitors in only 1/10th of the time
  • Designed to work with common frameworks like Pandas for data preparation and scikit-learn for ML
  • Works with both Python 2 and 3



Feature engineering has been at the core of nearly every hackathon-winning solution. It has become the de facto option when you’re looking to differentiate your solution from the competition. But it’s often difficult to engineer new features from the dataset you’ve been given, and the process consumes a lot of time and energy.

This is where the tool set from Feature Labs comes into play. Their ‘Featuretools’ is an open-source framework for automating feature engineering.

The company built it on a process called Deep Feature Synthesis (DFS). According to Feature Labs CEO Max Kanter, DFS takes raw relational and transactional datasets, such as visits to a website or abandoned cart items, and automatically converts them into features that carry predictive signal. The above image gives you a general idea of how the tool works.
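To make the idea concrete, here is a minimal sketch of what "creating features from relational data" can look like. It is not the Featuretools API and not Feature Labs' implementation of DFS. It is plain Python, with hypothetical table names (customers, visits), hypothetical columns (cart_value, abandoned) and a hypothetical helper (synthesize_features), and it applies a few aggregations across a single parent-child relationship. DFS, as described above, automates this kind of step; the sketch only shows one level of it.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy data: a parent "customers" table and a child "visits" table.
customers = [
    {"customer_id": 1},
    {"customer_id": 2},
]
visits = [
    {"visit_id": 10, "customer_id": 1, "cart_value": 20.0, "abandoned": True},
    {"visit_id": 11, "customer_id": 1, "cart_value": 50.0, "abandoned": False},
    {"visit_id": 12, "customer_id": 2, "cart_value": 30.0, "abandoned": True},
]

# Aggregations applied to the child rows of each parent (names are illustrative).
PRIMITIVES = {"count": len, "mean": mean, "max": max, "sum": sum}


def synthesize_features(parents, children, key, columns):
    """Return one feature row per parent by aggregating its child rows."""
    # Group child rows by the key they share with the parent table.
    grouped = defaultdict(list)
    for row in children:
        grouped[row[key]].append(row)

    feature_rows = []
    for parent in parents:
        rows = grouped.get(parent[key], [])
        features = dict(parent)
        for column in columns:
            # Booleans become 1.0 / 0.0 so they can be averaged and summed.
            values = [float(r[column]) for r in rows]
            for name, func in PRIMITIVES.items():
                if values or name == "count":
                    features[f"{name}(visits.{column})"] = func(values)
                else:
                    # A parent with no child rows has no mean/max/sum.
                    features[f"{name}(visits.{column})"] = None
        feature_rows.append(features)
    return feature_rows


feature_matrix = synthesize_features(
    customers, visits, key="customer_id", columns=["cart_value", "abandoned"]
)
for row in feature_matrix:
    print(row)
```

Each customer ends up with one row of generated features, such as the mean cart value or the share of abandoned visits, which could then be passed to a model. Writing these aggregations by hand for every table and column is the repetitive work the tool aims to remove.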

Featuretools can be used with both Python 2 and 3. It has been designed to work with common frameworks like Pandas for data preparation and scikit-learn for machine learning.

According to their official website, the tool was “tested against 1000 data scientists in three world wide competitions. On average, Feature Labs performed as well as top human competitors and only required 1/10th of the time”.

Early customers of the company include Spanish bank BBVA and developers at MIT. In fact, they’ve published a case study on how BBVA used Featuretools to create a credit card fraud detection system. You can view it here.


Our take on this

Feature engineering is one of the most important steps in any machine learning pipeline. Whether it’s differentiating your ML algorithm in a hackathon, or creating features to get the most out of your data as an organization, it’s a critical technique.

This release will not only save the user (or company) a lot of time, it will also let them shift their focus to other areas of the data science life cycle. The fact that it’s available for Python and can be used with common frameworks is a huge plus.




Pranav Dar 12 Jul 2018

Senior Editor at Analytics Vidhya. Data visualization practitioner who loves reading and delving deeper into the data science and machine learning arts. Always looking for new ways to improve processes using ML and AI.


Responses From Readers


Fawad Mahdi 26 Feb, 2018

This is truly awesome. Will save a whole lot of time, but will be interesting to see its practical implementation. Has it been released already?

Nick Bernini 18 Mar, 2018

Can you guys write a demo post on this? I’ve gone through their examples on git but am looking for more information.
