Have you ever had a huge dataset and needed to pull insights from it? Sounds scary, right? Getting useful insights out of a large dataset is a tall order. Now imagine turning that dataset into an interactive web application for data visualization, without any frontend expertise. Gradio, used alongside Python, makes this possible with minimal code. In this guide, we will build a modern, interactive data dashboard, focusing on Gradio data visualization and on building a GUI using Python. Let’s start.
Gradio is an open-source Python library for building web-based interfaces. It is designed to simplify building user interfaces for machine learning models and data applications. You don’t need an extensive background in web technologies like HTML, JavaScript, and CSS. Gradio handles the web server and the frontend rendering internally, so you can focus on the Python code.
Streamlit and Gradio both let you build web applications with minimal lines of code, but they differ in focus and in how they react to user input. Understanding these differences can help you select the right framework for your application.
Aspect | Gradio | Streamlit |
Ease of Use | Gradio is very easy to use and is often appreciated for its simplicity. Beginners find Gradio easy to start with. | Streamlit offers a large number of features and customization options, which can mean a steeper learning curve. |
Primary Focus | The primary focus of Gradio is to create the interfaces for machine learning or artificial intelligence models. | Streamlit is more like a general-purpose framework for broader tasks. |
Reactive Model | Gradio components often update upon a specific action, like a button click, though live updates can be configured. | Streamlit employs a reactive model. Any input change typically reruns the entire script. |
Strengths | Gradio is excellent for quickly showcasing models or building simpler Gradio data visualization tools. | Streamlit is strong for data-centric apps and detailed interactive data dashboards. |
Both tools can be used to build interactive dashboards. The choice depends on the specific needs of the project.
Let’s look at the crucial steps required for building this interactive dashboard.
Before creating the dashboard, we need the underlying data that will be visualized. Our data for the Python Gradio dashboard will be a synthetic CSV file containing 100,000 records that simulate website user engagement. Each record represents a user session or significant interaction.
Here’s a sample of what our CSV will look like:
timestamp | user_id | page_visited | session_duration_seconds | country | device_type | browser |
2023-01-15 10:30:00 | U1001 | /home | 120 | USA | Desktop | Chrome |
2023-01-15 10:32:00 | U1002 | /products | 180 | Canada | Mobile | Safari |
2023-01-15 10:35:00 | U1001 | /contact | 90 | USA | Desktop | Chrome |
… | … | … | … | … | … | … |
You can use the following Python code to generate this type of data for demonstration purposes. Ensure that you have numpy and pandas installed. Note that the script generates user IDs in the form User_1001 rather than the U1001 form shown in the sample above.
import numpy as np
import pandas as pd
from datetime import datetime, timedelta

def generate_website_data(nrows: int, filename: str):
    # Possible values for categorical fields
    pages = ["/home", "/products", "/services", "/about", "/contact", "/blog"]
    countries = ["USA", "Canada", "UK", "Germany", "France", "India", "Australia"]
    device_types = ["Desktop", "Mobile", "Tablet"]
    browsers = ["Chrome", "Firefox", "Safari", "Edge", "Opera"]
    # Generate random data
    user_ids = [f"User_{i}" for i in np.random.randint(1000, 2000, size=nrows)]
    page_visited_data = np.random.choice(pages, size=nrows)
    session_durations = np.random.randint(30, 1800, size=nrows)  # Session duration between 30s and 30min
    country_data = np.random.choice(countries, size=nrows)
    device_type_data = np.random.choice(device_types, size=nrows)
    browser_data = np.random.choice(browsers, size=nrows)
    # Generate random timestamps over the last two years
    end_t = datetime.now()
    start_t = end_t - timedelta(days=730)
    time_range_seconds = int((end_t - start_t).total_seconds())
    timestamps_data = []
    for _ in range(nrows):
        random_seconds = np.random.randint(0, time_range_seconds)
        timestamp = start_t + timedelta(seconds=random_seconds)
        timestamps_data.append(timestamp.strftime('%Y-%m-%d %H:%M:%S'))
    # Define columns for the DataFrame
    columns = {
        "timestamp": timestamps_data,
        "user_id": user_ids,
        "page_visited": page_visited_data,
        "session_duration_seconds": session_durations,
        "country": country_data,
        "device_type": device_type_data,
        "browser": browser_data,
    }
    # Create Pandas DataFrame
    df = pd.DataFrame(columns)
    # Sort by timestamp
    df['timestamp'] = pd.to_datetime(df['timestamp'])
    df = df.sort_values(by="timestamp").reset_index(drop=True)
    # Write to CSV
    df.to_csv(filename, index=False)
    print(f"{nrows} rows of data generated and saved to {filename}")

# Generate 100,000 rows of data
generate_website_data(100_000, "website_engagement_data.csv")
Output:
100000 rows of data generated and saved to website_engagement_data.csv
After executing this code, you will see an output, and a CSV file containing the data will be generated.
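The generator draws from NumPy's global random state, so every run produces a different file. If reproducible data matters, for example when comparing dashboard screenshots, one option is a seeded generator. The sketch below is a trimmed-down, hypothetical variant (`generate_sample` and its fixed start date are assumptions, not part of the original script) that also builds the timestamps without a Python loop.

```python
import numpy as np
import pandas as pd


def generate_sample(nrows: int, seed: int = 42) -> pd.DataFrame:
    """Hypothetical, reduced variant of the generator that uses a fixed seed."""
    rng = np.random.default_rng(seed)  # local generator: same seed, same data
    pages = ["/home", "/products", "/contact"]
    start = pd.Timestamp("2023-01-01")  # assumed fixed start date for repeatability
    # Vectorized timestamps: random second offsets within one year
    offsets = rng.integers(0, 365 * 24 * 3600, size=nrows)
    df = pd.DataFrame({
        "timestamp": start + pd.to_timedelta(offsets, unit="s"),
        "user_id": [f"User_{i}" for i in rng.integers(1000, 2000, size=nrows)],
        "page_visited": rng.choice(pages, size=nrows),
        "session_duration_seconds": rng.integers(30, 1800, size=nrows),
    })
    return df.sort_values("timestamp").reset_index(drop=True)


first = generate_sample(5)
second = generate_sample(5)
print(first.equals(second))  # both calls use the same seed, so the frames match
```

Using a fixed start date instead of `datetime.now()` is what makes the timestamps repeatable; the original script anchors them to the moment it runs.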
Installing Gradio with pip is straightforward. It’s recommended to use a dedicated Python environment; tools like venv and conda can create an isolated one. Gradio requires Python 3.8 or a newer version, although more recent Gradio releases may raise this minimum, so check the installation notes for the version you install.
python -m venv gradio_env
source gradio_env/bin/activate # On Linux/macOS
.\gradio_env\Scripts\activate # On Windows
Installing the necessary libraries
pip install gradio pandas plotly cachetools
Now that we have installed all the dependencies, let’s create the dashboard step by step.
First, create an app.py file, then import the necessary libraries for building the interactive dashboard. We will use Plotly for Gradio data visualization and Cachetools to cache expensive function calls and improve performance.
import gradio as gr
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
from datetime import datetime, date
from cachetools import cached, TTLCache
import warnings
warnings.filterwarnings("ignore", category=FutureWarning, module="plotly")
warnings.filterwarnings("ignore", category=UserWarning, module="plotly")
Let’s load the generated CSV file. Make sure that the CSV file is within the same directory as your app.py.
# --- Load CSV data ---
DATA_FILE = "website_engagement_data.csv"  # Make sure this file is generated and in the same directory, or provide the full path
raw_data = None

def load_engagement_data():
    global raw_data
    try:
        # Stop early with an empty DataFrame if the CSV has not been generated yet
        import os
        if not os.path.exists(DATA_FILE):
            print(f"{DATA_FILE} not found.")
            print(f"Please generate '{DATA_FILE}' using the provided script first.")
            raw_data = pd.DataFrame()  # Keep raw_data a DataFrame rather than None
            return raw_data
        dtype_spec = {
            'user_id': 'string',
            'page_visited': 'category',
            'session_duration_seconds': 'int32',
            'country': 'category',
            'device_type': 'category',
            'browser': 'category'
        }
        raw_data = pd.read_csv(
            DATA_FILE,
            parse_dates=["timestamp"],
            dtype=dtype_spec,
            low_memory=False
        )
        # Ensure timestamp is datetime
        raw_data['timestamp'] = pd.to_datetime(raw_data['timestamp'])
        print(f"Data loaded successfully: {len(raw_data)} rows.")
    except FileNotFoundError:
        print(f"Error: The file {DATA_FILE} was not found.")
        raw_data = pd.DataFrame()  # Fall back to an empty DataFrame if the file is not found
    except Exception as e:
        print(f"An error occurred while loading data: {e}")
        raw_data = pd.DataFrame()
    return raw_data

# Load data at script startup
load_engagement_data()
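The loader assumes the CSV contains every column the dashboard uses later. A quick schema check can catch a mismatched file before the plots fail. The sketch below is one way to do that; `missing_columns` is a hypothetical helper, and the in-memory CSV stands in for the real file.

```python
import io

import pandas as pd

EXPECTED_COLUMNS = [
    "timestamp", "user_id", "page_visited", "session_duration_seconds",
    "country", "device_type", "browser",
]


def missing_columns(df: pd.DataFrame) -> list:
    """Hypothetical helper: list the expected columns that are absent from df."""
    return [col for col in EXPECTED_COLUMNS if col not in df.columns]


# In-memory stand-in for the CSV file; the browser column is left out on purpose.
csv_text = (
    "timestamp,user_id,page_visited,session_duration_seconds,country,device_type\n"
    "2023-01-15 10:30:00,U1001,/home,120,USA,Desktop\n"
)
sample = pd.read_csv(io.StringIO(csv_text), parse_dates=["timestamp"])
print(missing_columns(sample))  # reports the column that was left out
```

A check like this could run right after `pd.read_csv` in the loader, with a non-empty result treated the same way as a missing file.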
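The cache below stores the results of expensive function calls for five minutes, which cuts down on repeated computation. The two utility functions supply the dropdown choices and the date range found in the data.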
# --- Caching and Utility Functions ---
# Cache for expensive function calls to improve performance
ttl_cache = TTLCache(maxsize=100, ttl=300)  # Cache up to 100 items, expire after 5 minutes

@cached(ttl_cache)
def get_unique_filter_values():
    if raw_data is None or raw_data.empty:
        return [], [], []
    pages = sorted(raw_data['page_visited'].dropna().unique().tolist())
    devices = sorted(raw_data['device_type'].dropna().unique().tolist())
    countries = sorted(raw_data['country'].dropna().unique().tolist())
    return pages, devices, countries

def get_date_range_from_data():
    if raw_data is None or raw_data.empty:
        return date.today(), date.today()
    min_dt = raw_data['timestamp'].min().date()
    max_dt = raw_data['timestamp'].max().date()
    return min_dt, max_dt
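To see what the time-to-live setting does, the sketch below runs a cached function against a fake clock so that expiry can be shown without waiting five minutes. `FakeClock` and `slow_metric` are hypothetical names used only for illustration; the example relies on the `timer` argument that `TTLCache` accepts.

```python
from cachetools import TTLCache, cached


class FakeClock:
    """Hypothetical stand-in for time.monotonic so expiry can be shown instantly."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now


clock = FakeClock()
demo_cache = TTLCache(maxsize=100, ttl=300, timer=clock)
call_count = 0


@cached(demo_cache)
def slow_metric(page):
    """Pretend-expensive computation that counts how often it really runs."""
    global call_count
    call_count += 1
    return len(page)


slow_metric("/home")
slow_metric("/home")          # same argument: served from the cache
calls_before_expiry = call_count

clock.now = 301               # jump past the 300-second TTL
slow_metric("/home")          # entry has expired, so the function runs again
calls_after_expiry = call_count

print(calls_before_expiry, calls_after_expiry)
```

One consequence for the dashboard: a cached result, including an empty one computed before the data was loaded, is reused until its entry expires.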
The following function will be used to filter the data based on the user’s input or actions on the dashboard.
# --- Data Filtering Function ---
def filter_engagement_data(start_date_dt, end_date_dt, selected_page, selected_device, selected_country):
    global raw_data
    if raw_data is None or raw_data.empty:
        return pd.DataFrame()
    # Ensure dates are datetime.date objects if they are strings
    if isinstance(start_date_dt, str):
        start_date_dt = datetime.strptime(start_date_dt, '%Y-%m-%d').date()
    if isinstance(end_date_dt, str):
        end_date_dt = datetime.strptime(end_date_dt, '%Y-%m-%d').date()
    # Convert dates to datetime for comparison with timestamp column
    start_datetime = datetime.combine(start_date_dt, datetime.min.time())
    end_datetime = datetime.combine(end_date_dt, datetime.max.time())
    filtered_df = raw_data[
        (raw_data['timestamp'] >= start_datetime) &
        (raw_data['timestamp'] <= end_datetime)
    ].copy()
    if selected_page != "All Pages" and selected_page is not None:
        filtered_df = filtered_df[filtered_df['page_visited'] == selected_page]
    if selected_device != "All Devices" and selected_device is not None:
        filtered_df = filtered_df[filtered_df['device_type'] == selected_device]
    if selected_country != "All Countries" and selected_country is not None:
        filtered_df = filtered_df[filtered_df['country'] == selected_country]
    return filtered_df
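The date handling in the filter is worth a closer look: combining the end date with `datetime.max.time()` makes the range inclusive of the whole final day. The small example below, using made-up rows, shows which sessions a one-day range keeps.

```python
from datetime import date, datetime

import pandas as pd

# Three made-up sessions around the boundary of 2023-01-15
sessions = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2023-01-15 00:00:00",
        "2023-01-15 23:59:59",
        "2023-01-16 00:00:00",
    ]),
    "page_visited": ["/home", "/products", "/home"],
})

day = date(2023, 1, 15)
start_dt = datetime.combine(day, datetime.min.time())  # 2023-01-15 00:00:00
end_dt = datetime.combine(day, datetime.max.time())    # 2023-01-15 23:59:59.999999

mask = (sessions["timestamp"] >= start_dt) & (sessions["timestamp"] <= end_dt)
one_day = sessions[mask]
print(len(one_day))  # the midnight row of the next day is excluded
```

Had the end of the range been midnight of the end date instead, the 23:59:59 session would have been dropped.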
The next function calculates the key metrics: total sessions, unique users, average session duration, and the most visited page.
# --- Function to Calculate Key Metrics ---
@cached(ttl_cache)
def calculate_key_metrics(start_date_dt, end_date_dt, page, device, country):
    df = filter_engagement_data(start_date_dt, end_date_dt, page, device, country)
    if df.empty:
        return 0, 0, 0, "N/A"
    total_sessions = df['user_id'].count()  # Assuming each row is a session/interaction
    unique_users = df['user_id'].nunique()
    avg_session_duration = df['session_duration_seconds'].mean()
    if pd.isna(avg_session_duration):  # Handle case where mean is NaN (e.g., no sessions)
        avg_session_duration = 0
    # Top page by number of visits
    if not df['page_visited'].mode().empty:
        top_page_visited = df['page_visited'].mode()[0]
    else:
        top_page_visited = "N/A"
    return total_sessions, unique_users, round(avg_session_duration, 2), top_page_visited
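A worked example on four made-up rows makes the metric definitions concrete. The column names match the dashboard data; the values are invented for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "user_id": ["U1", "U2", "U1", "U3"],
    "page_visited": ["/home", "/products", "/home", "/blog"],
    "session_duration_seconds": [120, 180, 90, 210],
})

total_sessions = int(df["user_id"].count())    # one per row: 4
unique_users = int(df["user_id"].nunique())    # U1, U2, U3: 3
avg_duration = round(float(df["session_duration_seconds"].mean()), 2)  # 600 / 4
top_page = df["page_visited"].mode()[0]        # "/home" appears twice

print(total_sessions, unique_users, avg_duration, top_page)
```

The casts to `int` and `float` are an optional addition here; they turn NumPy scalars into plain Python numbers, which can be easier to pass around or serialize.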
Now we will create some graph plotting functions using Plotly. It will make our dashboard look more detailed and engaging.
# --- Functions for Plotting with Plotly ---
def create_sessions_over_time_plot(start_date_dt, end_date_dt, page, device, country):
    df = filter_engagement_data(start_date_dt, end_date_dt, page, device, country)
    if df.empty:
        fig = go.Figure().update_layout(title_text="No data for selected filters", xaxis_showgrid=False, yaxis_showgrid=False)
        return fig
    sessions_by_date = df.groupby(df['timestamp'].dt.date)['user_id'].count().reset_index()
    sessions_by_date.rename(columns={'timestamp': 'date', 'user_id': 'sessions'}, inplace=True)
    fig = px.line(sessions_by_date, x='date', y='sessions', title='User Sessions Over Time')
    fig.update_layout(margin=dict(l=20, r=20, t=40, b=20))
    return fig

def create_engagement_by_device_plot(start_date_dt, end_date_dt, page, device, country):
    df = filter_engagement_data(start_date_dt, end_date_dt, page, device, country)
    if df.empty:
        fig = go.Figure().update_layout(title_text="No data for selected filters", xaxis_showgrid=False, yaxis_showgrid=False)
        return fig
    device_engagement = df.groupby('device_type')['session_duration_seconds'].sum().reset_index()
    device_engagement.rename(columns={'session_duration_seconds': 'total_duration'}, inplace=True)
    fig = px.bar(device_engagement, x='device_type', y='total_duration',
                 title='Total Session Duration by Device Type', color='device_type')
    fig.update_layout(margin=dict(l=20, r=20, t=40, b=20))
    return fig

def create_page_visits_distribution_plot(start_date_dt, end_date_dt, page, device, country):
    df = filter_engagement_data(start_date_dt, end_date_dt, page, device, country)
    if df.empty:
        fig = go.Figure().update_layout(title_text="No data for selected filters", xaxis_showgrid=False, yaxis_showgrid=False)
        return fig
    page_visits = df['page_visited'].value_counts().reset_index()
    page_visits.columns = ['page_visited', 'visits']
    fig = px.pie(page_visits, names='page_visited', values='visits',
                 title='Distribution of Page Visits', hole=0.3)
    fig.update_layout(margin=dict(l=20, r=20, t=40, b=20))
    return fig
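Each plot is a thin layer over a pandas aggregation, so the aggregation can be checked on its own before any chart is drawn. The sketch below reproduces the daily session count that feeds the line chart, using three made-up rows.

```python
import pandas as pd

df = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2023-01-15 10:30:00", "2023-01-15 10:32:00", "2023-01-16 09:00:00",
    ]),
    "user_id": ["U1001", "U1002", "U1001"],
})

# Group by calendar day and count rows; the group key keeps the name "timestamp",
# which is why it is renamed to "date" afterwards.
sessions_by_date = (
    df.groupby(df["timestamp"].dt.date)["user_id"]
    .count()
    .reset_index()
    .rename(columns={"timestamp": "date", "user_id": "sessions"})
)
print(sessions_by_date)
```

The resulting frame has one row per day, which is the shape `px.line` expects for its `x` and `y` columns.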
The functions below prepare the data for tabular display and refresh every dashboard output whenever the user changes a filter.
# --- Function to Prepare Data for Table Display ---
def get_data_for_table_display(start_date_dt, end_date_dt, page, device, country):
    df = filter_engagement_data(start_date_dt, end_date_dt, page, device, country)
    if df.empty:
        return pd.DataFrame(columns=['timestamp', 'user_id', 'page_visited', 'session_duration_seconds', 'country', 'device_type', 'browser'])
    # Select and order columns for display
    display_columns = ['timestamp', 'user_id', 'page_visited', 'session_duration_seconds', 'country', 'device_type', 'browser']
    df_display = df[display_columns].copy()
    df_display['timestamp'] = df_display['timestamp'].dt.strftime('%Y-%m-%d %H:%M:%S')  # Format date for display
    return df_display.head(100)  # Display top 100 rows for performance

# --- Main Update Function for the Dashboard ---
def update_full_dashboard(start_date_str, end_date_str, selected_page, selected_device, selected_country):
    if raw_data is None or raw_data.empty:  # Handle case where data loading failed
        empty_fig = go.Figure().update_layout(title_text="Data not loaded", xaxis_showgrid=False, yaxis_showgrid=False)
        empty_df = pd.DataFrame()
        return empty_fig, empty_fig, empty_fig, empty_df, 0, 0, 0.0, "N/A"
    # Convert date strings from Gradio input to datetime.date objects
    start_date_obj = datetime.strptime(start_date_str, '%Y-%m-%d').date() if isinstance(start_date_str, str) else start_date_str
    end_date_obj = datetime.strptime(end_date_str, '%Y-%m-%d').date() if isinstance(end_date_str, str) else end_date_str
    # Get key metrics
    sessions, users, avg_duration, top_page = calculate_key_metrics(
        start_date_obj, end_date_obj, selected_page, selected_device, selected_country
    )
    # Generate plots
    plot_sessions_time = create_sessions_over_time_plot(
        start_date_obj, end_date_obj, selected_page, selected_device, selected_country
    )
    plot_engagement_device = create_engagement_by_device_plot(
        start_date_obj, end_date_obj, selected_page, selected_device, selected_country
    )
    plot_page_visits = create_page_visits_distribution_plot(
        start_date_obj, end_date_obj, selected_page, selected_device, selected_country
    )
    # Get data for table
    table_df = get_data_for_table_display(
        start_date_obj, end_date_obj, selected_page, selected_device, selected_country
    )
    return (
        plot_sessions_time,
        plot_engagement_device,
        plot_page_visits,
        table_df,
        sessions,
        users,
        avg_duration,
        top_page
    )
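Because the dates arrive as free text, `datetime.strptime` raises a `ValueError` when a user types something that is not in YYYY-MM-DD form. One possible guard, sketched below, falls back to a default date instead; `parse_date_or_default` is a hypothetical helper and is not part of the code above.

```python
from datetime import date, datetime


def parse_date_or_default(text, fallback: date) -> date:
    """Hypothetical helper: parse YYYY-MM-DD text, returning fallback on bad input."""
    try:
        return datetime.strptime(text.strip(), "%Y-%m-%d").date()
    except (ValueError, AttributeError):
        # ValueError: wrong format; AttributeError: text is None or not a string
        return fallback


default_start = date(2023, 1, 1)
print(parse_date_or_default("2023-01-15", default_start))  # valid input is parsed
print(parse_date_or_default("15/01/2023", default_start))  # bad format falls back
```

Whether to fall back silently or show an error to the user is a design choice; a silent fallback keeps the dashboard responsive but may hide a typo.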
Finally, we are going to create the Gradio interface utilizing all the utility functions that we created above.
# --- Create Gradio Dashboard Interface ---
def build_engagement_dashboard():
    unique_pages, unique_devices, unique_countries = get_unique_filter_values()
    min_data_date, max_data_date = get_date_range_from_data()
    # Set initial dates as strings for Gradio components
    initial_start_date_str = min_data_date.strftime('%Y-%m-%d')
    initial_end_date_str = max_data_date.strftime('%Y-%m-%d')
    with gr.Blocks(theme=gr.themes.Soft(), title="Website Engagement Dashboard") as dashboard_interface:
        gr.Markdown("# Website User Engagement Dashboard")
        gr.Markdown("Explore user activity trends and engagement metrics for your website. This **Python Gradio dashboard** helps with **Gradio data visualization**.")
        # --- Filters Row ---
        with gr.Row():
            start_date_picker = gr.Textbox(label="Start Date (YYYY-MM-DD)", value=initial_start_date_str, type="text")
            end_date_picker = gr.Textbox(label="End Date (YYYY-MM-DD)", value=initial_end_date_str, type="text")
        with gr.Row():
            page_dropdown = gr.Dropdown(choices=["All Pages"] + unique_pages, label="Page Visited", value="All Pages")
            device_dropdown = gr.Dropdown(choices=["All Devices"] + unique_devices, label="Device Type", value="All Devices")
            country_dropdown = gr.Dropdown(choices=["All Countries"] + unique_countries, label="Country", value="All Countries")
        # --- Key Metrics Display ---
        gr.Markdown("## Key Metrics")
        with gr.Row():
            total_sessions_num = gr.Number(label="Total Sessions", value=0, precision=0)
            unique_users_num = gr.Number(label="Unique Users", value=0, precision=0)
            avg_duration_num = gr.Number(label="Avg. Session Duration (s)", value=0, precision=2)
            top_page_text = gr.Textbox(label="Most Visited Page", value="N/A", interactive=False)
        # --- Visualizations Tabs ---
        gr.Markdown("## Visualizations")
        with gr.Tabs():
            with gr.TabItem("Sessions Over Time"):
                sessions_plot_output = gr.Plot()
            with gr.TabItem("Engagement by Device"):
                device_plot_output = gr.Plot()
            with gr.TabItem("Page Visit Distribution"):
                page_visits_plot_output = gr.Plot()
        # --- Raw Data Table ---
        gr.Markdown("## Raw Engagement Data (Sample)")
        # The number of rows displayed is controlled by the DataFrame returned by
        # get_data_for_table_display (which returns head(100)).
        # Gradio will then paginate or scroll this.
        data_table_output = gr.DataFrame(
            label="User Sessions Data",
            interactive=False,
            headers=['Timestamp', 'User ID', 'Page Visited', 'Duration (s)', 'Country', 'Device', 'Browser']
            # For display height, you can use the `height` parameter, e.g., height=400
        )
        # --- Define Inputs & Outputs for Update Function ---
        inputs_list = [start_date_picker, end_date_picker, page_dropdown, device_dropdown, country_dropdown]
        outputs_list = [
            sessions_plot_output, device_plot_output, page_visits_plot_output,
            data_table_output,
            total_sessions_num, unique_users_num, avg_duration_num, top_page_text
        ]
        # --- Event Handling: Update dashboard when filters change ---
        # Textboxes update on Enter (submit); dropdowns update on every change
        for filter_component in inputs_list:
            if isinstance(filter_component, gr.Textbox):
                filter_component.submit(fn=update_full_dashboard, inputs=inputs_list, outputs=outputs_list)
            else:
                filter_component.change(fn=update_full_dashboard, inputs=inputs_list, outputs=outputs_list)
        # --- Initial load of the dashboard ---
        dashboard_interface.load(
            fn=update_full_dashboard,
            inputs=inputs_list,
            outputs=outputs_list
        )
    return dashboard_interface
Here we are executing the main function, build_engagement_dashboard, which will prepare the interface for the launch of the web application.
# --- Main execution block ---
if __name__ == "__main__":
    if raw_data is None or raw_data.empty:
        print("Halting: Data could not be loaded. Please ensure 'website_engagement_data.csv' exists or can be generated.")
    else:
        print("Building and launching the Gradio dashboard...")
        engagement_dashboard = build_engagement_dashboard()
        # launch() blocks until the server stops, so print the hint before calling it
        print("Dashboard is starting. Open your browser to the URL printed below.")
        engagement_dashboard.launch(server_name="0.0.0.0")  # Makes it accessible on local network
Now run python app.py in the terminal to start the web application.
Click on the local URL link to launch the Gradio interface.
An interactive dashboard has been created. We can use this interface to analyse our dataset and draw insights from it interactively.
We can see the visualizations based on different filters.
Gradio can be used effectively to draw insights from a massive dataset, and an interactive visualization dashboard makes data analysis more engaging. Having worked through this guide, you should be able to build an interactive dashboard with Gradio. We covered data generation, loading, caching, filter logic, metric calculation, and plotting with Plotly. No front-end programming knowledge was required to build this. While we used a CSV file in this guide, you can use any other data source if needed. Gradio proved to be a valuable tool for creating dynamic and user-friendly dashboards.