Learn everything about Analytics

Home » Style your Pandas DataFrame and Make it Stunning

Style your Pandas DataFrame and Make it Stunning

This article was published as a part of the Data Science Blogathon

Introduction

Pandas is an important data science library and everybody involved in data science uses it extensively. It presents the data in the form of a table similar to what we see in excel. If you have worked with excel, you must be aware that you can customize your sheets, add colors to the cells, and mark important figures that need extra attention.

While working with pandas, have you ever thought about how you can do the same styling to dataframes to make them more appealing and explainable? Generating reports out of the dataframes is a good option but what if you can do the styling in the dataframe using Pandas only?

That’s where the Pandas Style API comes to the rescue. This detailed article will go through all the features of Pandas styling, various types of built-in functions, creating our custom functions, and some of its advanced usages.

Pandas DataFrame Styling
Image by Author (Made in Canva)

Introduction to Pandas Styling

A pandas dataframe is a tabular structure with rows and columns. One of the most popular environments for performing data-related tasks is Jupyter notebooks. These are web-based platform-independent IDEs. In Jupyter notebooks, the dataframe is rendered for display using HTML tags and CSS. This means that you can manipulate the styling of these web components.

We will see this in action in upcoming sections. For now, let’s create a sample dataset and display the output dataframe.

import pandas as pd
import numpy as np
np.random.seed(88)
df = pd.DataFrame({'A': np.linspace(1, 10, 10)})
df = pd.concat([df, pd.DataFrame(np.random.randn(10, 4), columns=list('BCDE'))],
               axis=1)
df.iloc[3, 3] = np.nan
df.iloc[0, 2] = np.nan
Pandas DataFrame Styling df

Output Dataframe (Without Styling)

Doesn’t this look boring to you? What if you transform this minimal table to this:

Pandas DataFrame Styling null

The transformed table above has:

  1. Maximum values marked yellow for each column
  2. Null values marked red for each column
  3. More appealing table style, better fonts for header, and increased font size.

Now, we will be exploring all the possible ways of styling the dataframe and making it similar to what you saw above, so let’s begin!

Styling the DataFrame

To access all the styling properties for the pandas dataframe, you need to use the accessor (Assume that dataframe object has been stored in variable “df”):

df.style

This accessor helps in the modification of the styler object (df.style), which controls the display of the dataframe on the web. Let’s look at some of the methods to style the dataframe.

1. Highlight Min-Max values

The dataframes can take a large number of values but when it is of a smaller size, then it makes sense to print out all the values of the dataframe. Now, you might be doing some type of analysis and you wanted to highlight the extreme values of the data. For this purpose, you can add style to your dataframe that highlights these extreme values.

1.1 For highlighting maximum values: Chain “.highlight_max()” function to the styler object. Additionally, you can also specify the axis for which you want to highlight the values. (axis=1: Rows, axis=0: Columns – default).

df.style.highlight_max()
Pandas DataFrame Styling highlight max

1.2 For highlighting minimum values: Chain “.highlight_min()” function to the styler object. Here also, you can specify the axis at which these values will be highlighted.

df.style.highlight_min()
Pandas DataFrame Styling highlight max 2

Both Min-Max highlight functions support the parameter “color” to change the highlight color from yellow.

2. Highlight Null values

Every dataset has some or the other null/missing values. These values should be either removed or handled in such a way that it doesn’t introduce any biasness. To highlight such values, you can chain the “.highlight_null()” function to the styler object. This function doesn’t support the axis parameter and the color control parameter here is “null_color” which takes the default value as “red”

df.style.highlight_null(null_color="green")
Pandas DataFrame Styling null

set_na_rep(): Along with highlighting the missing values, they may be represented as “nan”. You can change the representation of these missing values using the set_na_rep() function. This function can also be chained with any styler function but chaining it with highlight_null will provide more details.

df.style.set_na_rep("OutofScope").highlight_null(null_color="orange")
Pandas DataFrame Styling null

3. Create Heatmap within dataframe

Heatmaps are used to represent values with the color shades. The higher is the color shade, the larger is the value present. These color shades represent the intensity of values as compared to other values. To plot such a mapping in the dataframe itself, there is no direct function but the “styler.background_gradient()” workaround does the work.

df.style.background_gradient()
Pandas DataFrame Styling gradient

There are few parameters you can pass to this function to further customize the output generated:

  1. cmap: By default, the “PuBu” colormap is selected by pandas You can create a custom matplotlib colormap and pass it to the camp parameter.
  2. axis: Generating heat plot via rows or columns criteria, by default: columns
  3. text_color_threshold: Controls text visibility across varying background colors.

4. Table Properties

As mentioned earlier also, the dataframe presented in the Jupyter notebooks is a table rendered using HTML and CSS. The table properties can be controlled using the “set_properties” method. This method is used to set one or more data-independent properties.

This means that the modifications are done purely based on visual appearance and no significance as such. This method takes in the properties to be set as a dictionary.

Example: Making table borders green with text color as purple. 

df.style.set_properties(**{'border': '1.3px solid green',
                          'color': 'magenta'})
Table Properties

5. Create Bar charts

Just as the heatmap, the bar charts can also be plotted within the dataframe itself. The bars are plotted in each cell depending upon the axis selected. By default, the axis=0 and the plot color are also fixed by pandas but it is configurable. To plot these bars, you simply need to chain the “.bar()” function to the styler object.

df.style.bar()
Create Bar charts

6. Control precision

The current values of the dataframe have float values and their decimals have no boundary condition. Even the column “A”, which had to hold a single value is having too many decimal places. To control this behavior, you can use the “.set_precision()”  function and pass the value for maximum decimals to be allowed.

df.style.set_precision(2)
Control precision

Now the dataframe looks clean.

7. Add Captions

Like every image has a caption that defines the post text, you can add captions to your dataframes. This text will depict what the dataframe results talk about. They may be some sort of summary statistics like pivot tables.

df.style.set_caption("This is Analytics Vidhya Blog").set_precision(2).background_gradient()
Add Captions

(Here, different methods have been changed along with the caption method)

8. Hiding Index or Column

As the title suggests, you can hide the index or any particular column from the dataframe. Hiding index from the dataframe can be useful in cases when the index doesn’t convey anything significant about the data. The column hiding depends on whether it is useful or not.

df.style.hide_index()
Hiding Index or Column

9. Control display values

Using the styler object’s “.format()” function, you can distinguish between the actual values held by the dataframe and the values you present. The “format” function takes in the format spec string that defines how individual values are presented.

You can directly specify the specification which will apply to the whole dataset or you can pass the specific column on which you want to control the display values.

df.style.format("{:.3%}")
Control display values 

You may notice that the missing values have also been marked by the format function. This can be skipped and substituted with a different value using the “na_rep” (na replacement) parameter.

df.style.format("{:.3%}", na_rep="&&")
 Control display values 2

Create your Own Styling Method

Although you have many methods to style your dataframe, it might be the case that your requirements are different and you need a custom styling function for your analysis. You can create your function and use it with the styler object in two ways:

  1. apply function: When you chain the “apply” function to the styler object, it sends out the entire row (series) or the dataframe depending upon the axis selected. Hence, if you make your function work with the “apply” function, it should return the series or dataframe with the same shape and CSS attribute-value pair.
  2. apply map function: This function sends out scaler values (or element-wise) and therefore, your function should return a scaler only with CSS attribute-value pair.

Let’s implement both types:

Target: apply function

def highlight_mean_greater(s):
    '''
    highlight yellow is value is greater than mean else red.
    '''
    is_max = s > s.mean()
    return ['background-color: yellow' if i else 'background-color: red' for i in is_max]
df.style.apply(highlight_mean_greater)
highlight_mean_greater

Target: apply map function

def color_negative_red(val):
    """
    Takes a scalar and returns a string with
    the css property `'color: red'` for negative
    strings, black otherwise.
    """
    color = 'red' if val < 0 else 'black'
    return 'color: %s' % color
df.style.apply(color_negative_red)
(color_negative_red)

Table Styles

These are styles that apply to the table as a whole, but don’t look at the data. It is very similar to the set_properties function but here, in the table styles, you can customize all web elements more easily.

The function of concern here is the “set_table_styles” that takes in the list of dictionaries for defining the elements. The dictionary needs to have the selector (HTML tag or CSS class) and its corresponding props (attributes or properties of the element). The props need to be a list of tuples of properties for that selector.

The images shown in the beginning, the transformed table has the following style:

styles = [
    dict(selector="tr:hover",
                props=[("background", "#f4f4f4")]),
    dict(selector="th", props=[("color", "#fff"),
                               ("border", "1px solid #eee"),
                               ("padding", "12px 35px"),
                               ("border-collapse", "collapse"),
                               ("background", "#00cccc"),
                               ("text-transform", "uppercase"),
                               ("font-size", "18px")
                               ]),
    dict(selector="td", props=[("color", "#999"),
                               ("border", "1px solid #eee"),
                               ("padding", "12px 35px"),
                               ("border-collapse", "collapse"),
                               ("font-size", "15px")
                               ]),
    dict(selector="table", props=[
                                    ("font-family" , 'Arial'),
                                    ("margin" , "25px auto"),
                                    ("border-collapse" , "collapse"),
                                    ("border" , "1px solid #eee"),
                                    ("border-bottom" , "2px solid #00cccc"),                                    
                                      ]),
    dict(selector="caption", props=[("caption-side", "bottom")])
]

And the required methods which created the final table:

df.style.set_table_styles(styles).set_caption("Image by Author (Made in Pandas)").highlight_max().highlight_null(null_color='red')

Export to Excel

You can store all the styling you have done on your dataframe in an excel file. The “.to_excel” function on the styler object makes it possible. The function needs two parameters: the name of the file to be saved (with extension XLSX) and the “engine” parameter should be “openpyxl”.

df.style.set_precision(2).background_gradient().hide_index().to_excel('styled.xlsx', engine='openpyxl')
Export to Excel

Conclusion

In this detailed article, we saw all the built-in methods to style the dataframe. Then we looked at how to create custom styling functions and then we saw how to customize the dataframe by modifying it at HTML and CSS level. We also saw how to save our styled dataframe into excel files.

If you have any doubts, queries, or potential opportunities, then you can reach out to me via

1. Linkedin – in/kaustubh-gupta/

2. Twitter – @Kaustubh1828

3. GitHub – kaustubhgupta

4. Medium – @kaustubhgupta1828

The media shown in this article are not owned by Analytics Vidhya and are used at the Author’s discretion.
You can also read this article on our Mobile APP Get it on Google Play