Visualize and Perform Dimensionality Reduction in Python using Hypertools

Pranav Dar 13 May, 2019 • 3 min read

Overview

  • Hypertools is a Python library that reduces high-dimensional data and plots it
  • It uses PCA at its core and is built on top of libraries like seaborn, scikit-learn and matplotlib
  • The visualizations it produces are intuitive and amazing

 

Introduction

Data is getting more and more complex these days as the number of data sources increases. A data scientist’s job is to extract actionable insights from this data, but as more and more dimensions are added to it, this is no easy task. Humans perceive the world in three dimensions, so recognizing patterns across thousands, if not millions, of variables is a task we rely heavily on machines for.

But even machines can struggle with this. This is where the awesome technique of dimensionality reduction comes into the picture. In case you haven’t come across this term yet, you can check out AV’s article about it here. As the name suggests, it reduces the number of dimensions in a dataset to make it easier to work with. There are a few different techniques to achieve this, and one of the most common is PCA, or Principal Component Analysis.
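To make the idea concrete, here is a minimal sketch of PCA written with plain NumPy. It is not how Hypertools implements the reduction; the function name `pca_reduce` and the random data are made up purely for illustration. The sketch centers the data, takes a singular value decomposition, and projects each row onto the top few principal directions.

```python
import numpy as np


def pca_reduce(data, n_components=2):
    """Project the rows of `data` onto its top principal components.

    `pca_reduce` is a hypothetical helper written for this illustration.
    """
    # PCA works on mean-centered data
    centered = data - data.mean(axis=0)
    # Rows of `vt` are the principal directions, ordered by importance
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]
    # Variance captured along each kept direction
    explained_variance = singular_values[:n_components] ** 2 / (len(data) - 1)
    return centered @ components.T, explained_variance


# Synthetic stand-in for a real dataset: 200 samples, 10 variables
rng = np.random.default_rng(0)
data = rng.normal(size=(200, 10))

reduced, variance = pca_reduce(data, n_components=3)
print(reduced.shape)  # (200, 3)
```

Each row of `reduced` is the same sample as before, now described by 3 numbers instead of 10, and `variance` tells you how much of the spread each of those new dimensions captures.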

Hypertools was designed with PCA and data visualization at its core. It’s a Python library for dimensionality reduction-based visual exploration of high-dimensional datasets (or a series of datasets).

How does it work? As input, you feed in the high-dimensional dataset. With a single function call, Hypertools reduces the dimensionality of the data and visualizes it as a plot. The library has been developed on top of a few popular Python libraries, like scikit-learn, seaborn and of course, matplotlib.

As mentioned by the developers, below are a few main features which HyperTools provides for data scientists:

  • Functions for plotting high-dimensional datasets in 2/3D
  • Static and animated plots
  • Simple API for customizing plot styles
  • Set of powerful data manipulation tools including hyperalignment, k-means clustering, normalizing and more
  • Support for lists of NumPy arrays, pandas DataFrames, text or (mixed) lists
  • Applying topic models and other text vectorization methods to text data

To install the latest stable version of Hypertools from pip, run the below command:

pip install hypertools

 

You can check out the GitHub repository for HyperTools here and also read their research paper here. Also be sure to check out the short video below which introduces this library:

 

Our take on this

I love this library! Anyone who has handled a dataset with a lot of variables knows what a headache it can be. PCA is often a necessary step in that situation, and Hypertools makes it much easier for a data scientist to deal with thousands or even millions of variables.

I’m a huge advocate of visualizing data, so this is quickly becoming one of my favourite libraries. The way it lets you look at your dimensions in hyperspace, from all angles, is truly awesome. It’s no wonder the library has received almost 1,000 stars so quickly and has become popular in the data science community.

Try out this library and let us know how it worked out for you.

 


 

Pranav Dar 13 May 2019

Senior Editor at Analytics Vidhya. Data visualization practitioner who loves reading and delving deeper into the data science and machine learning arts. Always looking for new ways to improve processes using ML and AI.


Responses From Readers


Jason 01 May, 2018

Is there a step-by-step example in Python using the iris dataset?
