- MIT has unveiled an open-source tool called DIVE for data exploration and visualization
- Features include a range of charts for visualization, as well as regression capabilities
- We tested the tool and were pretty impressed; check out the results and details below
Data cleaning is the most time-consuming process in the data science lifecycle, but data exploration might be the most important one when it comes to building a good model. I have personally seen the accuracy of models drop significantly when the dataset at hand was not explored properly. It’s critical that we know what the data represents, whether there are any biases, what features we can engineer, and so on. All of this falls under data exploration. And now you don’t even have to write code to do this!
MIT’s research team has built a web-based data exploration system called DIVE that lets you create stories from your data without having to write any code. You can have a look at the public version of DIVE here. Below is a brief summary of what you can expect from DIVE:
- Intelligent Data Ingestion: DIVE can sample the data to infer the types of features and the structure of datasets
- Semi-automated Visualization Recommendation: DIVE lets you select fields and recommends relevant visualizations. These visualizations can be sorted based on effectiveness, expressiveness, and statistical properties like correlation, entropy, and Gini
- Point-and-click Statistical Analysis: Using DIVE, you can compare group means, explore relationships between fields, and perform statistical analysis with just one click
- WYSIWYG Visual Narratives: DIVE provides a ‘what-you-see-is-what-you-get’ editor where you can share stories with interactive content linked to dynamic data
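If you are curious what the statistical properties mentioned above (correlation, entropy, and Gini) actually measure, here is a minimal Python sketch. This is not DIVE's code; the function names are my own, and I am assuming "Gini" means Gini impurity of a categorical field (it could equally refer to the Gini coefficient). It only illustrates the kind of numbers a tool could sort visualizations by.

```python
import math
from collections import Counter


def pearson_correlation(xs, ys):
    """Linear correlation between two numeric fields, in [-1, 1]."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def shannon_entropy(values):
    """Entropy (in bits) of a categorical field; higher means more spread out."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def gini_impurity(values):
    """Gini impurity of a categorical field (assumed meaning of 'Gini' here)."""
    counts = Counter(values)
    total = len(values)
    return 1.0 - sum((c / total) ** 2 for c in counts.values())


# Hypothetical toy data, purely for illustration
heights = [1, 2, 3, 4]
weights = [2, 4, 6, 8]
groups = ["a", "a", "b", "b"]

correlation = pearson_correlation(heights, weights)
entropy = shannon_entropy(groups)
gini = gini_impurity(groups)
```

A pair of fields with a correlation close to 1 or -1 would be a strong candidate for a scatter plot, while a categorical field with very low entropy (almost all one value) would make for a rather uninformative bar chart.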
When it comes to analysis, the tool currently offers the following four options:
Below is a demo video by the team that walks through DIVE, from uploading a dataset to exploring it in the tool. Have a look.
Our take on this
Of course, this is not the first automated tool in this space. The competition for automated ML is fierce, but what makes DIVE stand out is its relatively lightweight feel for quick exploration.
I took DIVE for a test run and it impressed me a lot. It’s easy to use and fast, and the fact that I don’t have to install anything (it’s web-based) is a major positive. I found the overall process extremely intuitive. Check out the screenshots below, where I uploaded a dataset and analysed the data. This one is a simple statistical analysis of the variables in the dataset.
The one below is a summary of the linear regression model:
If you’re from a non-technical background, I would suggest trying out this tool. You don’t have to write a single line of code! Let me know your experience using it in the comments below.