Feature engineering is at the core of most hackathon-winning solutions. It has become the de facto way to differentiate your solution from the competition. But engineering new features from the dataset you’ve been given is often difficult, and it consumes both time and energy.
This is where Featuretools comes into play. Developed by the folks at Feature Labs, ‘Featuretools’ is an open-source framework for automating feature engineering.
The framework is built on a process called Deep Feature Synthesis (DFS). According to Feature Labs CEO Max Kanter, DFS creates features from raw relational and transactional datasets, like visits to a website or abandoned cart items, and automatically converts them into a predictive signal. The above image gives you a general idea of how the tool works.
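To make the idea concrete, here is a minimal sketch of the kind of aggregation features DFS derives from related tables. It uses plain Python rather than Featuretools itself, so it is not the library's API; the tables, column names and the `synthesize_features` helper are all hypothetical and exist only for illustration. It also covers a single parent-child relationship, whereas DFS as described stacks such operations across a whole relational dataset.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical parent table: one row per customer.
customers = [
    {"customer_id": 1, "country": "ES"},
    {"customer_id": 2, "country": "US"},
]

# Hypothetical child table: many transactions per customer.
transactions = [
    {"transaction_id": 10, "customer_id": 1, "amount": 20.0},
    {"transaction_id": 11, "customer_id": 1, "amount": 40.0},
    {"transaction_id": 12, "customer_id": 2, "amount": 15.0},
]

# Aggregations applied across the parent-child relationship.
PRIMITIVES = {"count": len, "sum": sum, "mean": mean, "max": max}


def synthesize_features(parents, children, key, value):
    """Attach one aggregate feature per primitive to each parent row."""
    # Group the child values by the key they share with the parent table.
    grouped = defaultdict(list)
    for child in children:
        grouped[child[key]].append(child[value])

    rows = []
    for parent in parents:
        values = grouped.get(parent[key], [])
        row = dict(parent)
        for name, func in PRIMITIVES.items():
            # Parents with no child rows get None instead of a made-up value.
            row[f"{name}_{value}"] = func(values) if values else None
        rows.append(row)
    return rows


feature_matrix = synthesize_features(
    customers, transactions, key="customer_id", value="amount"
)
for row in feature_matrix:
    print(row)
```

Each customer row ends up with new columns such as `sum_amount` and `mean_amount`, which is the sort of feature a data scientist would otherwise write by hand for every relationship in the dataset.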
It works with both Python 2 and 3, and it has been designed to work with common frameworks like pandas for data preparation and scikit-learn for machine learning.
According to their official website, the tool was “tested against 1000 data scientists in three world wide competitions. On average, Feature Labs performed as well as top human competitors and only required 1/10th of the time”.
Early customers of the company include Spanish bank BBVA and developers at MIT. In fact, they’ve published a case study on how BBVA used Featuretools to create a credit card fraud detection system. You can view it here.
Feature engineering is one of the most important steps in any machine learning pipeline. Whether it’s differentiating your ML algorithm in a hackathon, or creating features to get the most out of your data as an organization, it’s a critical technique.
This release will not only save the user (or company) a lot of time, it will also let them shift their focus to other areas of the data science life cycle. The fact that it’s available for Python and can be used with common frameworks is a huge plus.
This is truly awesome. Will save a whole lot of time, but will be interesting to see its practical implementation. Has it been released already?
Hi Fawad, Yes, it's available on Feature Labs' website (link is in the article above).
Can you guys write a demo post on this? I’ve gone through their examples on git but am looking for more information.