Visualize and Perform Dimensionality Reduction in Python using Hypertools
- Hypertools is a Python library that reduces high-dimensional data to two or three dimensions and plots it
- It uses PCA at its core and is built on top of libraries like seaborn, scikit-learn and matplotlib
- The visualizations it produces can be static or animated and are intuitive to read
Data is getting more and more complex as the number of data sources increases. A data scientist’s job is to extract actionable insights from this data, but as more and more dimensions are added to it, this is no easy task. Humans perceive the world in three dimensions, so recognizing patterns across thousands, if not millions, of variables is a task we rely heavily on machines for.
But even machines can struggle with this. This is where dimensionality reduction comes into the picture. In case you haven’t come across this term yet, you can check out AV’s article about it here. As the name suggests, it reduces the number of dimensions in a dataset to make it easier to work with. There are a few different techniques to achieve this, and one of the most common is PCA, or Principal Component Analysis.
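To make the idea concrete, here is a minimal sketch of PCA in plain Python that finds the first principal component of a small dataset and projects each row onto it. It is for illustration only and is not how Hypertools implements PCA (the library builds on scikit-learn). The function names `first_principal_component` and `project`, and the toy data, are assumptions made for this example.

```python
import math
import random


def first_principal_component(rows, iterations=200):
    """Return (column means, unit direction) of the first principal component."""
    n, d = len(rows), len(rows[0])
    # Center every column on its mean.
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Power iteration: repeatedly multiplying by the covariance matrix
    # converges to the eigenvector with the largest eigenvalue.
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return mean, v


def project(rows, mean, direction):
    """Reduce each row to a single number: its coordinate along `direction`."""
    return [sum((r[j] - mean[j]) * direction[j] for j in range(len(mean)))
            for r in rows]


# Toy data (an assumption for this sketch): three columns that all depend
# on one hidden variable t, plus a little noise.
random.seed(0)
rows = []
for _ in range(200):
    t = random.gauss(0, 1)
    rows.append([t + random.gauss(0, 0.05),
                 2 * t + random.gauss(0, 0.05),
                 -t + random.gauss(0, 0.05)])

mean, direction = first_principal_component(rows)
scores = project(rows, mean, direction)  # 3 columns reduced to 1
```

Because all three columns are driven by the same hidden variable, a single component captures almost all of the variation, which is the situation where dimensionality reduction pays off most.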
Hypertools was designed with PCA and data visualization at its core. It’s a Python library for exploring high-dimensional datasets (or a series of datasets) visually by means of dimensionality reduction.
How does it work? As input, you feed in a high-dimensional dataset. In a single function call, Hypertools reduces the dimensionality of the data and visualizes the result as a plot. The library has been developed on top of a few popular Python libraries, like scikit-learn, seaborn and, of course, matplotlib.
According to the developers, Hypertools offers data scientists these main features:
- Functions for plotting high-dimensional datasets in 2/3D
- Static and animated plots
- Simple API for customizing plot styles
- Set of powerful data manipulation tools including hyperalignment, k-means clustering, normalizing and more
- Support for lists of NumPy arrays, pandas DataFrames, text or (mixed) lists
- Applying topic models and other text vectorization methods to text data
To install the latest stable version of Hypertools with pip, run the command below:
pip install hypertools
Our take on this
I love this library! Anyone who has handled a dataset with a lot of variables knows what a headache it can be. Performing PCA is often considered necessary in that situation, and Hypertools makes it much easier for a data scientist to deal with thousands or even millions of variables.
I’m a huge advocate of visualizing data, so this is quickly becoming one of my favourite libraries. It lets you look at your dimensions in hyperspace and from all angles, which is truly awesome. It’s no wonder the library has received almost 1,000 stars so quickly and has become popular in the data science community.
Try out this library and let us know how it worked out for you.