DIVE – MIT’s Open Source Tool for Data Exploration and Visualization for Data Scientists

Overview

  • MIT has unveiled an open source tool, called DIVE, for performing data exploration and visualization
  • Features include all sorts of graphs and curves for visualization, and even regression capabilities
  • We tested the tool and were pretty impressed; check out the results and details below

 

Introduction

Data cleaning is the most time-consuming step in the data science lifecycle, but data exploration may be the most important one when it comes to building a good model. I have personally seen model accuracy drop significantly when the dataset at hand was not explored properly. It’s critical to know what the data represents, whether it contains any biases, what features we can engineer, and so on. All of this falls under data exploration. And now you don’t even have to write code to do it!

MIT’s research team has built a web-based data exploration system called DIVE that lets you create stories from your data without writing any code. You can have a look at the public version of DIVE here. Below is a brief summary of what you can expect from DIVE:

  • Intelligent Data Ingestion: DIVE can sample the data to infer the types of features and the structure of datasets
  • Semi-automated Visualization Recommendation: DIVE lets you select fields and recommends relevant visualizations. These visualizations can be sorted by effectiveness, expressiveness, and statistical properties such as correlation, entropy, and Gini
  • Point-and-click Statistical Analysis: Using DIVE, you can compare group means, explore relationships between fields, and perform statistical analysis with just one click
  • WYSIWYG Visual Narratives: It provides you with a ‘what-you-see-is-what-you-get editor’ where you can share stories with interactive content linked to dynamic data
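To make the first two features more concrete, here is a minimal sketch of the kind of computation they describe: guessing field types from a sample, then scoring fields with correlation, entropy, and Gini. This is not DIVE's code. The function names, the toy data, and the 100-value sample size are all made up for illustration, and since the feature list does not say which Gini measure DIVE uses, Gini impurity is assumed here.

```python
import math
from collections import Counter


def infer_type(values, sample_size=100):
    """Guess a field's type from a sample of its non-empty raw string values."""
    sample = [v for v in values if v.strip()][:sample_size]
    if not sample:
        return "unknown"
    try:
        for v in sample:
            float(v)  # raises ValueError for non-numeric text
    except ValueError:
        return "categorical"
    return "numeric"


def entropy(values):
    """Shannon entropy, in bits, of a categorical field."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def gini_impurity(values):
    """Gini impurity of a categorical field (assumed meaning of 'gini')."""
    n = len(values)
    return 1.0 - sum((c / n) ** 2 for c in Counter(values).values())


def pearson(xs, ys):
    """Pearson correlation of two numeric fields (both must vary)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical raw columns, as they might arrive from an uploaded CSV file.
columns = {
    "age": ["23", "35", "41", "29"],
    "income": ["30", "52", "61", "40"],
    "city": ["Pune", "Delhi", "Pune", "Pune"],
}

types = {name: infer_type(vals) for name, vals in columns.items()}
city_entropy = entropy(columns["city"])
city_gini = gini_impurity(columns["city"])
r = pearson([float(v) for v in columns["age"]],
            [float(v) for v in columns["income"]])

print(types)
print(round(city_entropy, 3), round(city_gini, 3), round(r, 3))
```

A recommender could sort candidate charts by scores like these, for example by listing strongly correlated numeric pairs first. How DIVE actually weighs them is not described here.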

When it comes to analysis, the tool currently offers the below 4 options:

Below is a demo video from the team that walks through DIVE, from uploading a dataset to exploring it in the tool. Have a look.

Here are the links to the front-end repository and the back-end repository provided by the team. For more information about DIVE, you can read their paper, published in the proceedings of HILDA 2018.

 

Our take on this

Of course, this is not the first automated tool in this space. The competition in automated ML is fierce, but what makes DIVE stand out is how lightweight it is for quick exploration.

I took DIVE for a test run and it impressed me. It’s easy to use and efficient, and the fact that I don’t have to install anything (it’s web-based) is a major positive. I found the overall process extremely intuitive. Check out the screenshots below, where I uploaded a dataset and analysed the data. The first is a simple statistical analysis of the variables in the dataset.

The second is a summary of the linear regression model.
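DIVE produces this summary without any code. For readers curious about what sits behind such a summary, here is a minimal sketch of ordinary least squares with a single predictor, reporting the slope, intercept, and R squared. It is not DIVE's implementation, which isn't described here; the function name and the data points are invented for illustration.

```python
def fit_line(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R squared: share of the variance in y explained by the fitted line.
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1.0 - ss_res / ss_tot
    return slope, intercept, r_squared


# Made-up data that lies close to the line y = 2x.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept, r_squared = fit_line(xs, ys)
print(f"slope={slope:.3f} intercept={intercept:.3f} R^2={r_squared:.4f}")
```

A full regression summary usually adds standard errors and p-values for each coefficient, which this sketch leaves out to stay short.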

If you’re from a non-technical background, I would suggest trying out this tool. You don’t have to write a single line of code! Let me know your experience using it in the comments below.

 
