Perform Automated Feature Engineering in Python with Featuretools
- Feature Labs has launched a set of tools to make machine learning algorithms train quicker
- Automated feature engineering is at the heart of it
- Tested in three competitions, where it took only 1/10th of the time required by human competitors
- Designed to work with common frameworks like pandas for data preparation and scikit-learn for ML
- Works with both Python 2 and 3
Feature engineering has been at the core of nearly every hackathon-winning solution. It has become the de facto go-to option when you’re looking to differentiate your solution from the competition. But it’s often difficult to engineer new features from the dataset you’ve been given. It’s a time (and energy) consuming process.
This is where Featuretools comes into play. Developed by the folks at Feature Labs, Featuretools is an open-source framework for automating feature engineering.
The company built the framework around a process called Deep Feature Synthesis (DFS). According to Feature Labs CEO Max Kanter, DFS creates features from raw relational and transactional datasets, like visits to a website or abandoned cart items, and automatically converts them into predictive signals. The above image gives you a general idea of how the tool works.
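To get a feel for what "creating features from relational and transactional data" can look like, here is a minimal hand-rolled sketch in plain Python. It is not the Featuretools API and not the DFS algorithm itself; it only illustrates the general idea of rolling up child records (website visits) into one feature row per parent (customer) by applying a set of aggregation functions. The toy data, the `aggregate_features` helper and the feature naming scheme are all assumptions made up for this illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy data: a parent table (customers) and a child table (visits).
customers = [{"customer_id": 1}, {"customer_id": 2}, {"customer_id": 3}]
visits = [
    {"customer_id": 1, "cart_value": 20.0, "abandoned": True},
    {"customer_id": 1, "cart_value": 35.0, "abandoned": False},
    {"customer_id": 2, "cart_value": 10.0, "abandoned": True},
]

# Aggregation "primitives" (illustrative): name -> function over a list of numbers.
PRIMITIVES = {"count": len, "sum": sum, "mean": mean, "max": max}


def aggregate_features(parents, children, key, columns, child_name,
                       primitives=PRIMITIVES):
    """Return one feature row per parent by applying every primitive
    to every listed child column."""
    # Group child rows by the key they share with the parent table.
    grouped = defaultdict(list)
    for row in children:
        grouped[row[key]].append(row)

    feature_rows = []
    for parent in parents:
        rows = grouped.get(parent[key], [])
        features = dict(parent)
        for col in columns:
            # Booleans are cast to 0.0/1.0 so the same primitives apply.
            values = [float(r[col]) for r in rows]
            for name, fn in primitives.items():
                # Parents without child rows get None instead of a made-up value.
                features[f"{name}({child_name}.{col})"] = fn(values) if values else None
        feature_rows.append(features)
    return feature_rows


feature_matrix = aggregate_features(
    customers, visits, key="customer_id",
    columns=["cart_value", "abandoned"], child_name="visits",
)
for row in feature_matrix:
    print(row)
```

Each customer ends up with features such as `mean(visits.cart_value)` or `mean(visits.abandoned)` (the share of abandoned carts). Writing these by hand for every table and column is exactly the kind of repetitive work an automated tool is meant to take over.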
It works with both Python 2 and 3, and has been designed to fit in with common frameworks like pandas for data preparation and scikit-learn for machine learning.
According to their official website, the tool was “tested against 1000 data scientists in three world wide competitions. On average, Feature Labs performed as well as top human competitors and only required 1/10th of the time”.
Early customers of the company include Spanish bank BBVA and developers at MIT. In fact, they’ve published a case study on how BBVA used Featuretools to create a credit card fraud detection system. You can view it here.
Our take on this
Feature engineering is one of the most important steps in any machine learning pipeline. Whether it’s differentiating your ML algorithm in a hackathon, or creating features to get the most out of your data as an organization, it’s a critical technique.
This release will not only save the user (or company) a lot of time, it will also let them shift their focus to other areas of the data science life cycle. The fact that it’s available for Python and can be used with common frameworks is a huge plus.