Maximizing Profits through Bayesian Demand Forecasting

Lucas Nogueira De Sousa 20 Mar, 2023 • 5 min read

Introduction

Demand forecasting helps companies determine, among other things, how many units of a product to produce. Bayesian learning is one technique that can support this task.

Bayesian learning is a machine learning approach that starts from initial beliefs about a parameter (hypotheses), evaluates how well they hold up against historical data, and uses the updated beliefs to predict future events. Beyond demand forecasting, it is used in areas such as document classification and weather forecasting.

Learning Objectives:

In this article, we will:

Use Bayesian learning, through the PyMC library, to forecast future demand.
Create a model using the PyMC library.
Analyze the sampling results by plotting them.
Use the fitted model to forecast future demand.

This article was published as a part of the Data Science Blogathon.

Data Preparation

After importing the necessary packages, the first step is to prepare the dataset. Using the pandas and numpy libraries, we will create a synthetic sales dataset with the sale date as the index and a 'sales' column holding the quantity sold each day, drawn from a normal distribution. The code for creating this dataset is shown below:

#importing packages
import pandas as pd
import numpy as np
import seaborn as sns

#Create dates
date_range = pd.date_range(start='2022-05-01', end='2022-12-31', freq='D')

#Generate daily sales with a normal distribution
mean = 50 # sales mean
std = 10 # sales standard deviation
sales = pd.Series(np.random.normal(loc=mean, scale=std, size=len(date_range)), index=date_range)

#Create a DataFrame with the daily sales
sales_data = pd.DataFrame({'sales': sales})

#Display the first 5 rows of the DataFrame
sales_data.head()


Plotting a histogram of the 'sales' column shows that the data follows a normal distribution.
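As a complement to the histogram, normality can also be checked numerically. The following is a minimal sketch that regenerates the synthetic data with a fixed seed (an assumption made here for reproducibility; the code above is unseeded) and compares the sample against what a normal distribution predicts: roughly 68% of values within one standard deviation of the mean and roughly 95% within two. The name `sales_check` is illustrative.

```python
import numpy as np
import pandas as pd

# Rebuild the synthetic dataset; the seed is an assumption for reproducibility
rng = np.random.default_rng(42)
date_range = pd.date_range(start='2022-05-01', end='2022-12-31', freq='D')
sales_check = pd.Series(rng.normal(loc=50, scale=10, size=len(date_range)), index=date_range)

sample_mean = sales_check.mean()
sample_std = sales_check.std()

# Share of days within one and two standard deviations of the sample mean
within_1 = ((sales_check - sample_mean).abs() <= sample_std).mean()
within_2 = ((sales_check - sample_mean).abs() <= 2 * sample_std).mean()

print(f"mean={sample_mean:.1f}, std={sample_std:.1f}")
print(f"within 1 std: {within_1:.0%}, within 2 std: {within_2:.0%}")
```

If the printed shares are far from 68% and 95%, the data would deserve a closer look before a normal likelihood is used.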

Model Creation

To create the model, we will use PyMC, which is a Python module that enables the application of Bayesian inference. Bayesian inference is a statistical method used in Bayesian learning that seeks to estimate the probability distribution of an unknown parameter based on observed data and a probability model that establishes a relationship between the data and the parameter. PyMC will allow us to create a normally distributed model similar to our dataset’s ‘sales’ variable.

Note that Bayesian inference is one part of the broader field of Bayesian learning, so the two terms should not be used interchangeably.

Even before observing the sales data to estimate the mean and standard deviation (the parameters that define the normal distribution), we may already have some knowledge of these parameters, obtained, for example, from a business expert. This knowledge is called the prior. Prior knowledge is uncertain: we may not have complete information about the parameters before observing the data, and even the information we do have may be incorrect.

Therefore, we need to account for this uncertainty when using prior knowledge. The choice of prior is extremely important for obtaining good results in Bayesian inference because it influences the posterior distribution. When the prior is narrow or the observed data are few, the posterior parameters can be influenced more by the prior than by the data. The posterior distribution is what the probability distribution looks like after the prior has been updated with the knowledge gained from the observed data.

In this example, the prior for the mean is a normal distribution with mean 80 and standard deviation 20, which could represent, for example, the belief of someone with extensive knowledge and experience in the area under analysis about the average daily sales.
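To get a feel for how strongly such a prior pulls the result, the sketch below uses the closed-form update for a normal mean. It assumes, purely for illustration, that the sales standard deviation is known to be 10 (the model in this article estimates it instead) and that the observed average is exactly 50. `posterior_mean_std` is a hypothetical helper, not part of PyMC.

```python
import numpy as np

def posterior_mean_std(prior_mean, prior_std, data_mean, data_std, n):
    """Normal prior + normal likelihood with known data_std (conjugate update)."""
    prior_precision = 1.0 / prior_std**2   # weight carried by the prior
    data_precision = n / data_std**2       # weight carried by n observations
    post_precision = prior_precision + data_precision
    # Posterior mean is a precision-weighted average of prior mean and data mean
    post_mean = (prior_mean * prior_precision + data_mean * data_precision) / post_precision
    return post_mean, np.sqrt(1.0 / post_precision)

for n in (0, 1, 10, 245):
    m, s = posterior_mean_std(80, 20, 50, 10, n)
    print(f"n={n:>3}: posterior mean={m:.2f}, posterior std={s:.2f}")
```

With no data the posterior equals the prior (mean 80). After a single observation the posterior mean is already 56, and with 245 days it sits at about 50.03, so under these assumptions the data dominate a prior this wide.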

To perform Bayesian inference, it is also necessary to specify the likelihood of the data and to sample from the distributions to obtain the posterior distribution of the parameters. The likelihood is the probability distribution of the observed sales data given the mean and standard deviation of the normal distribution in the model. Sampling must be done carefully to ensure that the collected samples are representative of the posterior distribution. Be careful when choosing the number of samples: very large numbers can make the process very time-consuming, and very small numbers may not represent the posterior well enough to make predictions. In our example, 5,000 samples will be drawn per chain after 1,000 tuning iterations, which let the sampler adapt and move away from low-probability starting values before any samples are kept.
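To make the roles of the prior, the likelihood, tuning, and kept samples concrete, here is a minimal random-walk Metropolis sketch in plain NumPy. This is not how PyMC samples (its default for continuous models is NUTS, which also adapts its step size during tuning); here the tuning phase is simply discarded as warm-up. The step size, seeds, starting point, stand-in data, and function names are all illustrative assumptions.

```python
import numpy as np

# Stand-in for the daily sales (assumed seed; same mean/std as the article's data)
data = np.random.default_rng(0).normal(50, 10, size=245)

def log_posterior(mu, sigma, y):
    """Unnormalized log posterior: Normal(80, 20) prior on mu,
    HalfNormal(10) prior on sigma, Normal(mu, sigma) likelihood."""
    if sigma <= 0:
        return -np.inf
    log_prior_mu = -0.5 * ((mu - 80) / 20) ** 2
    log_prior_sigma = -0.5 * (sigma / 10) ** 2
    log_lik = -len(y) * np.log(sigma) - 0.5 * np.sum(((y - mu) / sigma) ** 2)
    return log_prior_mu + log_prior_sigma + log_lik

def metropolis(y, draws=5000, tune=1000, step=0.5, seed=1):
    rng = np.random.default_rng(seed)
    mu, sigma = 80.0, 10.0  # start at the prior mean, far from the data
    current = log_posterior(mu, sigma, y)
    kept = []
    for i in range(tune + draws):
        # Propose a small symmetric random move in both parameters
        mu_new = mu + rng.normal(0, step)
        sigma_new = sigma + rng.normal(0, step)
        proposed = log_posterior(mu_new, sigma_new, y)
        # Accept with probability min(1, posterior ratio)
        if np.log(rng.uniform()) < proposed - current:
            mu, sigma, current = mu_new, sigma_new, proposed
        # Keep only the draws made after the warm-up phase
        if i >= tune:
            kept.append((mu, sigma))
    return np.array(kept)

samples = metropolis(data)
print("posterior mean of mu, sigma:", samples.mean(axis=0))
```

Because the chain starts at the prior mean of 80, its early draws lie in a low-probability region; discarding the first 1,000 iterations keeps them out of the posterior summary.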

Below is the code for creating the model:

import pymc as pm

#Create the model
with pm.Model() as model:
  # Define the prior for the mean
  mu = pm.Normal('mu', mu=80, sigma=20)
  # Define the prior for the standard deviation
  sigma = pm.HalfNormal('sigma', sigma=10)

  # Define the likelihood, passing the 'sales' column as a 1-D array
  sales_obs = pm.Normal('sales', mu=mu, sigma=sigma, observed=sales_data['sales'].values)

  # Sample: 5,000 draws per chain after 1,000 tuning iterations
  trace = pm.sample(5000, tune=1000)

Results Analysis

After sampling, we will analyze the results by plotting them. This diagnostic step verifies the quality of the sampling.

import arviz as az

#Plot the sampled values (trace) of each parameter
az.plot_trace(trace)

#Plot the posterior distributions of the mean and standard deviation
az.plot_posterior(trace)

#Summary
az.summary(trace)

As can be seen, the prior for the mean was centered at 80, far from the mean of the observed data (around 50), which means that inaccurate prior information was introduced. Remember that the prior may have a greater or lesser influence depending on the amount of observed data, the quality of the data, and the discrepancy between the prior and the observed data. Therefore, it should be chosen carefully.

Forecasting Future Demand

The model has been fitted, and now we can use it to make predictions. To do this, we take the posterior samples of the mean and standard deviation and draw future sales from a normal distribution whose mean is the average of the posterior samples of the mean and whose standard deviation is the average of the posterior samples of the standard deviation. The forecast period is 90 days.

#Extract the posterior samples as NumPy arrays
mu_samples = trace.posterior['mu'].values
sigma_samples = trace.posterior['sigma'].values

#Predict future demand
future_sales = np.random.normal(mu_samples.mean(), sigma_samples.mean(), size=90)

#Plot the predicted future demand
import matplotlib.pyplot as plt
plt.plot(future_sales)
plt.title('Forecasting Future Demand')
plt.xlabel('Day')
plt.ylabel('Sales')
plt.show()
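The forecast above plugs in a single mean and a single standard deviation, so it does not reflect how uncertain those estimates still are. A minimal sketch of an alternative is to simulate one 90-day path per posterior draw and summarize the spread. To keep the snippet self-contained, `mu_draws` and `sigma_draws` are made-up stand-ins for the flattened posterior samples (in practice they would come from `trace.posterior`); their values are assumptions, not results from the model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for flattened posterior samples (assumed values, not model output)
mu_draws = rng.normal(50.0, 0.6, size=2000)
sigma_draws = np.abs(rng.normal(10.0, 0.5, size=2000))

horizon = 90  # forecast period in days

# One simulated 90-day path per posterior draw: shape (2000, 90)
paths = rng.normal(mu_draws[:, None], sigma_draws[:, None],
                   size=(len(mu_draws), horizon))

# Daily point forecast and a central 90% interval across the simulated paths
daily_mean = paths.mean(axis=0)
lower, upper = np.percentile(paths, [5, 95], axis=0)

# Total demand over the horizon, the quantity production planning usually needs
total = paths.sum(axis=1)
print(f"expected 90-day demand: {total.mean():.0f}")
print(f"90% interval: {np.percentile(total, 5):.0f} to {np.percentile(total, 95):.0f}")
```

The interval on total demand gives planners a range rather than a single number, which can matter when overproduction and stockouts have different costs.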

Conclusion

Forecasting demand is crucial for determining how many units to produce, avoiding waste in resource acquisition, and maximizing profits. With techniques such as Bayesian inference, which combine prior knowledge with observed data, we can forecast demand more accurately than by relying only on the prior knowledge of business experts.

In this article, three takeaways can be highlighted:

The importance of remaining competitive in the market through innovative practices such as demand forecasting.
The value of Bayesian learning for producing demand forecasts.
The careful selection of the prior in Bayesian inference, which is essential for obtaining more accurate demand forecasts.

The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.