Pranav Dar — April 30, 2018

Overview

  • HyperTools is a Python library that reduces high-dimensional data and plots it
  • It uses PCA at its core and is built on top of libraries like seaborn, scikit-learn and matplotlib
  • The visualizations it produces are intuitive and easy to explore

 

Introduction

Data is getting more and more complex as the number of data sources increases. A data scientist’s job is to extract actionable insights from this data, but as more and more dimensions are added to it, this is no easy task. Humans perceive the world in 3 dimensions, so recognizing patterns across thousands, if not millions, of variables is a task we rely heavily on machines for.

But even machines can struggle with this. This is where dimensionality reduction comes into the picture. In case you haven’t come across this term yet, you can check out AV’s article about it here. As the name suggests, it reduces the number of dimensions in a dataset to make it easier to work with. There are a few different techniques to achieve this, and one of the most common is PCA, or Principal Component Analysis.
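To make the idea concrete, here is a minimal sketch of PCA written with NumPy only. It is an illustration of the technique, not how HyperTools implements it; the function name `pca_reduce` and the synthetic data are made up for this example.

```python
import numpy as np


def pca_reduce(data, n_components):
    """Project data onto its top principal components using an SVD."""
    # PCA works on mean-centered data.
    centered = data - data.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing variance.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    reduced = centered @ vt[:n_components].T
    # Fraction of the total variance captured by each component.
    explained = singular_values**2 / np.sum(singular_values**2)
    return reduced, explained[:n_components]


# Hypothetical data: 500 samples with 10 features that are really
# driven by 2 hidden factors plus a little noise.
rng = np.random.default_rng(42)
latent = rng.normal(size=(500, 2))
mixing = rng.normal(size=(2, 10))
data = latent @ mixing + 0.05 * rng.normal(size=(500, 10))

reduced, explained = pca_reduce(data, n_components=2)
print(reduced.shape)    # two columns instead of ten
print(explained.sum())  # share of variance kept by the two components
```

Because the 10 features here are built from only 2 underlying factors, two principal components retain nearly all of the variance. Real datasets are rarely this clean, so it is worth checking the explained variance before trusting a low-dimensional plot.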

HyperTools was designed with PCA and data visualization at its core. It’s a Python library for dimensionality reduction-based visual exploration of high-dimensional datasets (or a series of datasets).

How does it work? As input, you feed in a high-dimensional dataset. With a single function call, HyperTools reduces the dimensionality of the data and visualizes it as a plot. The library is built on top of a few popular Python libraries, like scikit-learn, seaborn and, of course, matplotlib.

As mentioned by the developers, below are a few main features which HyperTools provides for data scientists:

  • Functions for plotting high-dimensional datasets in 2/3D
  • Static and animated plots
  • Simple API for customizing plot styles
  • Set of powerful data manipulation tools including hyperalignment, k-means clustering, normalizing and more
  • Support for lists of NumPy arrays, pandas DataFrames, text or (mixed) lists
  • Applying topic models and other text vectorization methods to text data

To install the latest stable version of HyperTools with pip, run the command below:

pip install hypertools

 

You can check out the GitHub repository for HyperTools here and also read their research paper here.

 

Our take on this

I love this library! Anyone who has handled a dataset with a lot of variables knows what a headache it can be. While performing PCA is considered necessary, HyperTools makes it so much easier for a data scientist to deal with thousands or even millions of variables.

I’m a huge advocate of visualizing data, so this is quickly becoming one of my favourite libraries. The way it lets you look at your dimensions in hyperspace, from all angles, is truly awesome. It’s no wonder the library has received almost 1,000 stars so quickly and has become popular in the data science community.

Try out this library and let us know how it worked out for you.

 


 



2 thoughts on "Visualize and Perform Dimensionality Reduction in Python using Hypertools"

Jason says: May 01, 2018 at 4:56 pm
Is there a step-by-step example in Python using the Iris dataset?
Pranav Dar says: May 01, 2018 at 6:27 pm
Hi Jason, thanks for reading the article! The Iris dataset is too small (only 4 variables) for dimensionality reduction techniques to be effective. You would need a much bigger dataset for HyperTools to be truly useful! I would suggest going through the article below and trying it out on the dataset mentioned in it: https://www.analyticsvidhya.com/blog/2016/03/practical-guide-principal-component-analysis-python/
