Implementing Demand Based Hotel Room Pricing in Data Science Using MLOps

Ashish Kumar 17 Nov, 2023

Introduction

During Covid, the hospitality industry suffered a massive drop in revenue. Even now that people are traveling more, attracting customers remains a challenge. To counter this problem, we will develop an ML tool that sets the right room price to attract more customers. Using a hotel booking dataset, we will build an AI tool that selects the right room price, increasing both the occupancy rate and hotel revenue.

Learning Objectives

Importance of setting the correct price for hotel rooms.
Cleaning Data, transforming datasets, and preprocessing datasets.
Creating maps and visual plots using hotel booking data
Real-world application of hotel booking data analysis used in data science.
Performing hotel booking data analysis using the Python programming language

This article was published as a part of the Data Science Blogathon.

What is the Hotel Room Price Dataset?
What is Hotel Room Price Analysis?
Importance of Setting the Right Hotel Room Price
Collecting Data and Preprocessing
Use Cases and Applications of Hotel Room Data Analysis in Data Science
Challenges in Hotel Room Data Analysis
Best Practices in Hotel Room Data Analysis
Future Trends and Advancements in Hotel Room Data Analysis in Data Science
Hotel Room Data Analysis with Python Implementation
Visualizations of the Datasets
Working Descriptions
What is ZenML?
Results
Demo Application
Frequently Asked Questions

What is the Hotel Room Price Dataset?

The hotel booking dataset contains data from different sources, which includes columns such as hotel type, number of adults, stay time, special requirements, etc. These values can help predict the hotel room price and help in increasing hotel revenue.

What is Hotel Room Price Analysis?

In hotel room price analysis, we analyze the patterns and trends in the dataset. Using this information, we make decisions related to pricing and operations. These decisions depend on several factors:

Seasonality: Room prices rise significantly during peak seasons, such as holidays.
Demand: Room price rises when the demand is high, such as during an event celebration or a sports event.
Competition: Hotel room prices are strongly influenced by nearby hotels’ prices. If there are many hotels in an area, room prices tend to fall.
Amenities: If the hotel has a pool, spa, and gym, it will charge more for these facilities.
Location: A hotel in the main town can charge more than a hotel in a remote area.

Importance of Setting the Right Hotel Room Price

Setting the room price is essential to increase revenue and profit. The importance of setting the right hotel price is as follows:

Maximize revenue: Room price is the primary lever for increasing revenue. By setting a competitive price, hotels can increase revenue.
Increase customers: More guests book the hotel when room prices are fair. This helps increase the occupancy rate.
Maximize profit: Hotels try to charge more to increase profit. However, charging too much reduces the number of guests, whereas the right price increases it.

Collecting Data and Preprocessing

Data collection and preprocessing are essential parts of hotel room price analysis. The data is collected from hotel websites, booking websites, and public datasets. It is then converted to the format required for visualization. In preprocessing, the dataset undergoes data cleaning and transformation. The transformed dataset is used for visualization and model building.

Visualizing Dataset Using Tools and Techniques

Visualizing the dataset helps get insight and find the pattern to make a better decision. Below are the Python tools to provide better visualization.

Matplotlib: Matplotlib is one of the key tools in Python for creating charts and graphs such as bar and line charts.
Seaborn: Seaborn is another visualization tool in Python. It helps create more detailed visualization images like heat maps and violin plots.

Techniques Used to Visualize the Hotel Booking Dataset.

Box plots: We use box plots to compare the length of stay across market segments. This helps in understanding the customer type.
Bar charts: Using a bar chart, we plot the average daily rate (adr) against the arrival month; this helps understand which months are busier.
Count plot: We plot the market segment against the deposit type using a count plot to understand in which segments hotels receive more deposits.
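
The numbers behind a count plot can also be inspected as a plain table. Below is a minimal sketch, assuming a small hand-made sample in place of the real bookings file. The rows are illustrative and not taken from the dataset; only the column names market_segment and deposit_type come from it.

```python
import pandas as pd

# Illustrative rows only; the real data comes from hotel_bookings.csv.
sample = pd.DataFrame({
    "market_segment": ["Online TA", "Online TA", "Groups", "Groups", "Direct"],
    "deposit_type": ["No Deposit", "No Deposit", "Non Refund", "No Deposit", "No Deposit"],
})

# Rows are market segments, columns are deposit types, cells are booking counts.
# These are the same numbers a count plot would draw as bars.
deposit_counts = pd.crosstab(sample["market_segment"], sample["deposit_type"])
print(deposit_counts)
```

A table like this is handy when two bars in the plot look similar and the exact counts matter.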

Use Cases and Applications of Hotel Room Data Analysis in Data Science

The hotel booking dataset has multiple use cases and applications as described below:

Customer Sentiment Analysis: By applying machine learning techniques such as sentiment analysis to customer reviews, managers can gauge customer sentiment and improve the service for a better experience.
Forecasting Occupancy Rate: From customer reviews and ratings, managers can estimate the room occupancy rate in the short term.
Business Operations: This dataset can also be used to track the inventory; this empowers the hotels to have sufficient room and material.
Food and Beverage: Data can also be used to set prices for food and beverage items to maximize revenue while still being competitive.
Performance Evaluation: This dataset also helps develop personalized suggestions for a guest’s experience, thus improving hotel ratings.

Challenges in Hotel Room Data Analysis

Hotel room booking data can pose several challenges:

Data quality: As we are collecting data from multiple datasets, the quality of the dataset is compromised, and the chances of missing data, inconsistency, and inaccuracy arise.
Data privacy: The hotel collects sensitive data from customers, and a leak of this data would put them at risk. So, following data safety guidelines becomes a priority.
Data integration: A hotel has multiple systems, like property management and booking websites, so integrating these systems is difficult.
Data volume: Hotel room data can be extensive, making it challenging to manage and analyze.

Best Practices in Hotel Room Data Analysis

Best practices in hotel room data analysis:

To collect data, use property management systems, online booking platforms, and guest feedback systems.
Ensure data quality by regularly monitoring and cleaning the data.
Protect data privacy by implementing security measures and complying with data privacy regulations.
Integrate data from different systems to get a complete picture of the hotel room data.
Use machine learning techniques such as LSTM to forecast room rates.
Use data analytics to optimize business operations, like inventory and staffing.
Use data analytics to target marketing campaigns to attract more guests.
Use data analytics to evaluate performance and provide innovative guest experiences.
With the help of data analytics, management can better understand their customer and provide better service.

Future Trends and Advancements in Hotel Room Data Analysis in Data Science

As consumer spending increases, it greatly benefits the hotel & tourism industry. This creates new trends and data to analyze customer spending and behavior. The increase in AI tools creates an opportunity to explore and maximize the industry. With the help of an AI tool, we can gather the required data and remove unwanted data, i.e., performing data preprocessing.

On top of this data, we can train our model to generate valuable insight and produce real-time analysis. This also helps in providing personalized experiences based on individual customers and guests. This highly benefits the hotel and the customer.

Data analysis also helps the management team to understand their customer and inventory. This will help in setting dynamic room pricing based on demand. Better inventory management helps in reducing the cost.

Hotel Room Data Analysis with Python Implementation

Let us perform a basic data analysis in Python on a dataset from Kaggle. To download the dataset, click here.

Data Details

The Hotel Booking dataset includes information on different hotel types, such as Resort Hotels and City Hotels, and on market segmentation.

Visualizations of the Datasets

Step 1. Import Libraries and read the dataset

#Importing the Library
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.preprocessing import LabelEncoder

Step 2. Importing Dataset and Inspecting Data

#Read the file and convert to dataframe
df = pd.read_csv('data/hotel_bookings.csv')  # forward slash avoids the invalid '\h' escape

#Display the dataframe shape
df.shape
(119390, 32)

#Checking the data sample 
df.head()

#Checking the dataset info
df.info()

#Checking null values
df.isna().sum()

OUTPUT
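
The raw output of df.isna().sum() lists every column, including the ones without gaps. A minimal sketch of a more compact summary follows, assuming a small illustrative frame in place of the bookings data (the values are made up; the column names exist in the dataset).

```python
import numpy as np
import pandas as pd

# Illustrative frame standing in for the bookings data.
sample = pd.DataFrame({
    "children": [0.0, np.nan, 1.0, 2.0],
    "agent": [9.0, np.nan, np.nan, 240.0],
    "adr": [75.0, 98.5, 120.0, 60.0],
})

# Count and share of missing values per column, largest first.
missing = pd.DataFrame({
    "n_missing": sample.isna().sum(),
    "pct_missing": (sample.isna().mean() * 100).round(1),
}).sort_values("n_missing", ascending=False)

# Keep only the columns that actually contain missing values.
missing = missing[missing["n_missing"] > 0]
print(missing)
```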

Step 3. Visualizing the dataset

#Boxplot Distribution of Nights Spent at Hotels by Market Segment and Hotel Type
plt.figure(figsize = (15,8))
sns.boxplot(x = "market_segment", y = "stays_in_week_nights", data = df, hue = "hotel",
 palette = 'Set1')

OUTPUT

#Plotting box plot for market segment vs stay in weekend night
plt.figure(figsize=(12,5))
sns.boxplot(x = "market_segment", y = "stays_in_weekend_nights", data = df, 
hue = "hotel", palette = 'Set1');

OUTPUT

Observation

The above plots show that most groups are normally distributed, and some have high skewness. Most people tend to stay less than a week. Customers from the Aviation segment do not seem to stay at resort hotels and have a relatively low average stay.

#Barplot of average daily rate (adr) vs Month
plt.figure(figsize = (12,5))
sns.barplot(x = 'arrival_date_month', y = 'adr', data = df);

OUTPUT
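
When the month column holds plain strings, the bars follow the order in which months appear in the data rather than the calendar. Below is a minimal sketch of one way to get calendar order, assuming illustrative rows in place of the real file. Passing order=month_order to sns.barplot would have a similar effect on the plot.

```python
import calendar
import pandas as pd

# Illustrative rows; in the dataset, arrival_date_month holds full month names.
sample = pd.DataFrame({
    "arrival_date_month": ["August", "January", "August", "March", "January"],
    "adr": [180.0, 60.0, 200.0, 90.0, 80.0],
})

# Month names in calendar order ("January" ... "December").
month_order = list(calendar.month_name)[1:]

# An ordered categorical makes groupby results follow the calendar.
sample["arrival_date_month"] = pd.Categorical(
    sample["arrival_date_month"], categories=month_order, ordered=True
)

# Mean adr per month, limited to the months present in the sample.
monthly_adr = sample.groupby("arrival_date_month", observed=True)["adr"].mean()
print(monthly_adr)
```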

Working Descriptions

In the implementation part, I will show how I used a ZenML pipeline to create a model that uses historical booking data to predict the room price. I also deployed a Streamlit application to present the end product.

What is ZenML?

ZenML is an open-source MLOps framework that streamlines production-ready ML pipeline creations. A pipeline is a series of interconnected steps, where the output of one step serves as an input to another step, leading to the creation of a finished product. Below are reasons for selecting ZenML Pipeline:

Efficient pipeline creation
Standardization of ML workflows
Real-time data analysis

Building a model is not enough; we have to deploy the model into production and monitor its performance over time and how it interacts with real-world data. An end-to-end machine learning pipeline automates the entire workflow, from data preparation to model training and deployment. This can help us continuously predict and confidently deploy machine learning models, and it lets us track our production-ready model. I highly suggest you refer to the ZenML documentation for more details.
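
To make the idea of chained steps concrete, here is a minimal sketch in plain Python, without ZenML. The step functions and the run_pipeline helper are hypothetical and exist only for this illustration; the 50 to 5000 range mirrors the adr filter used later in the cleaning code.

```python
from typing import Callable, List

def ingest() -> List[float]:
    # Stand-in for reading raw nightly rates from a file.
    return [120.0, 45.0, 80.0, 6000.0]

def clean(rates: List[float]) -> List[float]:
    # Drop values outside a plausible range.
    return [r for r in rates if 50 <= r <= 5000]

def summarize(rates: List[float]) -> float:
    # Final product of this toy pipeline: the mean rate.
    return sum(rates) / len(rates)

def run_pipeline(source: Callable[[], object], *steps: Callable) -> object:
    # Feed the output of each step into the next one.
    result = source()
    for step_fn in steps:
        result = step_fn(result)
    return result

mean_rate = run_pipeline(ingest, clean, summarize)
print(mean_rate)
```

A framework such as ZenML adds caching, artifact tracking, and a dashboard on top of this basic chaining idea.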

The first pipeline we create consists of the following steps:

ingest_df: This step ingests the data and creates a DataFrame.
clean_df: This step cleans the data and removes the unwanted columns.
train_model: This step trains and saves the model using MLflow autologging.
evaluate_model: This step evaluates the model and saves the metrics, using MLflow autologging, into the artifact store.

Model Development

We have discussed the different steps above. Now, we will focus on the coding part.

Ingest Data

import logging

import pandas as pd
from zenml import step


class IngestData:
    """
    Ingesting data from the data_path
    """
    def __init__(self, data_path: str) -> None:
        """
        Args:
            data_path: Path at which the data file is located
        """
        self.data_path = data_path

    def get_data(self) -> pd.DataFrame:
        """
        Ingesting the data from data_path
        Returns the ingested data
        """
        logging.info(f"Ingesting data from {self.data_path}")
        return pd.read_csv(self.data_path)


@step
def ingest_df(data_path: str) -> pd.DataFrame:
    """
    Ingesting data from the data_path.
    Args:
        data_path: path to the data
    Returns:
        pd.DataFrame: the ingested data
    """
    try:
        ingest_data = IngestData(data_path)
        df = ingest_data.get_data()
        return df
    except Exception as e:
        # Log the cause before re-raising so the failing step is easy to trace
        logging.error(f"Error occurred while ingesting data: {e}")
        raise e

Above, we have defined an ingest_df() method, which takes the file path as an argument and returns the dataframe. Here @step is a zenml decorator. It is used to register the function as a step in a pipeline.

Clean Data & Processing

# Fill missing values (assigning back avoids pandas chained-assignment warnings)
data["agent"] = data["agent"].fillna(data["agent"].median())
data["children"] = data["children"].fillna(0)
# Drop outliers: rows with adr below 50 or above 5000
data = data.drop(data[data['adr'] < 50].index)
data = data.drop(data[data['adr'] > 5000].index)
# Derived features: total nights stayed and total number of guests
data["total_stay"] = data['stays_in_week_nights'] + data['stays_in_weekend_nights']
data["total_person"] = data["adults"] + data["children"] + data["babies"]
#Feature Engineering: label-encode the categorical columns
le = LabelEncoder()
categorical_columns = ['hotel', 'arrival_date_month', 'meal', 'country',
                       'market_segment', 'reserved_room_type',
                       'assigned_room_type', 'deposit_type', 'customer_type']
for column in categorical_columns:
    data[column] = le.fit_transform(data[column])

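In the above code, we fill the missing values in the agent and children columns and drop outliers, i.e., rows whose adr is below 50 or above 5000. We add the weeknight and weekend-night stays to get the total stay, and we sum adults, children, and babies to get the total number of guests. Then, we apply label encoding to the categorical columns such as hotel, country, deposit type, etc.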
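
The pipeline in the next section expects the cleaning step to return training and test sets, but the splitting code is not shown in this article. Below is a minimal sketch of how such a split could look, assuming that adr is the prediction target and that an 80/20 split is used; both are assumptions, and the frame is synthetic.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Illustrative, already-encoded frame; the real one has many more columns.
rng = np.random.default_rng(0)
data = pd.DataFrame({
    "hotel": rng.integers(0, 2, size=50),
    "total_stay": rng.integers(1, 10, size=50),
    "total_person": rng.integers(1, 5, size=50),
    "adr": rng.uniform(50, 300, size=50),
})

# Assumption: adr (the nightly rate) is the prediction target.
X = data.drop(columns=["adr"])
y = data["adr"]

# Assumption: an 80/20 split with a fixed seed for reproducibility.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(X_train.shape, X_test.shape)
```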

Model Training

from zenml import pipeline

@pipeline(enable_cache=False)
def train_pipeline(data_path: str):
    # Each call below is a ZenML step; outputs flow from one step to the next
    df = ingest_df(data_path)
    X_train, X_test, y_train, y_test = clean_df(df)
    model = train_model(X_train, X_test, y_train, y_test)
    r2_score, rmse = evaluate_model(model, X_test, y_test)

We will use the ZenML @pipeline decorator to define the train_pipeline() method. The train_pipeline() method takes the file path as an argument. After data ingestion and splitting the data into training and test sets, the train_model() method is called. This method can use different algorithms, such as LightGBM, Random Forest, XGBoost, and Linear Regression, to train on the dataset.
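
The body of train_model() is not shown in this article. As a rough illustration of choosing among several algorithms by name, here is a minimal sketch using only scikit-learn. The build_model helper and its registry are hypothetical, and the data is synthetic; LightGBM and XGBoost regressors could be registered the same way.

```python
import numpy as np
from sklearn.base import RegressorMixin
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression

def build_model(name: str) -> RegressorMixin:
    # Hypothetical registry mapping a model name to a fresh estimator.
    models = {
        "linear_regression": lambda: LinearRegression(),
        "random_forest": lambda: RandomForestRegressor(n_estimators=20, random_state=0),
    }
    if name not in models:
        raise ValueError(f"Unknown model name: {name}")
    return models[name]()

# Synthetic data: the target is a noisy linear function of two features.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 2))
y = 50 + 12 * X[:, 0] + 3 * X[:, 1] + rng.normal(0, 1, size=200)

# Train each model on the first 150 rows.
fitted = {
    name: build_model(name).fit(X[:150], y[:150])
    for name in ("linear_regression", "random_forest")
}

for name, model in fitted.items():
    # score() returns the R2 on the held-out rows.
    print(name, round(model.score(X[150:], y[150:]), 3))
```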

Model Evaluation

We will use the RMSE, R2 score, and MSE of the different algorithms to determine the best one. In the code below, we define the evaluate_model() method, which computes these evaluation metrics.

import logging
from typing import Tuple

import mlflow
import pandas as pd
from sklearn.base import RegressorMixin
from typing_extensions import Annotated
from zenml import step

# experiment_tracker and the MSE, R2 and RMSE helper classes are defined
# elsewhere in the project.
@step(experiment_tracker=experiment_tracker.name)
def evaluate_model(model: RegressorMixin,
                   X_test: pd.DataFrame,
                   y_test: pd.DataFrame,
                   ) -> Tuple[
                       Annotated[float, "r2_score"],
                       Annotated[float, "rmse"]
                   ]:
    """
    Evaluates the model on the ingested data.
    
    Args:
        model: RegressorMixin
        X_test: pd.DataFrame
        y_test: pd.DataFrame
    
    Returns:
        r2: R2 score
        rmse: RMSE
    """
    try:
        prediction = model.predict(X_test)
        # Compute each metric and log it to the MLflow experiment tracker
        mse_class = MSE()
        mse = mse_class.calculate_scores(y_test, prediction)
        mlflow.log_metric("mse", mse)

        r2_class = R2()
        r2 = r2_class.calculate_scores(y_test, prediction)
        mlflow.log_metric("r2", r2)

        rmse_class = RMSE()
        rmse = rmse_class.calculate_scores(y_test, prediction)
        mlflow.log_metric("rmse", rmse)
        return r2, rmse
    except Exception as e:
        logging.error("Error in evaluating model: {}".format(e))
        raise e
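
The MSE, R2, and RMSE helper classes used above are not shown in this article. The functions below are a hypothetical stand-in that computes the same three metrics with NumPy, which may help when checking logged values by hand.

```python
import numpy as np

def mse(y_true, y_pred) -> float:
    # Mean of the squared errors.
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

def rmse(y_true, y_pred) -> float:
    # Square root of MSE, in the same unit as the target (the price).
    return float(np.sqrt(mse(y_true, y_pred)))

def r2(y_true, y_pred) -> float:
    # 1 - (residual sum of squares / total sum of squares).
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

y_true = [100.0, 150.0, 200.0]
y_pred = [110.0, 140.0, 200.0]
print(mse(y_true, y_pred), rmse(y_true, y_pred), r2(y_true, y_pred))
```

Note that R2 can never exceed 1, while MSE and RMSE are unbounded and depend on the scale of the target.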

Setting the Environment

Create the virtual environment using Python or Anaconda.

#Command to create virtual environment
python3 -m venv <virtual_environment_name>

You must install some Python packages in your environment using the command below.

cd zenml -project /hotel-room-booking
pip install -r requirements.txt

For running the run_deployment.py script, you will also need to install some integrations using ZenML:

zenml init
zenml integration install mlflow -y

In this project, we have created two pipelines

run_pipeline.py, a pipeline that only trains the model
run_deployment.py, a pipeline that also continuously deploys the model.

run_pipeline.py takes the file path as an argument and executes the train_pipeline() method. Below is the pictorial view of the different operations performed by run_pipeline.py. This can be viewed using the dashboard provided by ZenML.

Dashboard URL: http://127.0.0.1:8237/workspaces/default/pipelines/95881272-b1cc-46d6-9f73-7b967f28cbe1/runs/803ae9c5-dc35-4daa-a134-02bccb7d55fd/dag

Dashboard Image of Hotel Room Pricing with ZenML

run_deployment.py: In this file, we execute the continuous_deployment_pipeline and the inference_pipeline.

continuous_deployment_pipeline

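from pipelines.deployment_pipeline import continuous_deployment_pipeline, inference_pipeline

def main(config: str, min_accuracy: float):
    mlflow_model_deployment_component = MLFlowModelDeployer.get_active_model_deployer()
    deploy = config == DEPLOY or config == DEPLOY_AND_PREDICT
    predict = config == PREDICT or config == DEPLOY_AND_PREDICT

    if deploy:
        continuous_deployment_pipeline(
            data_path=str,  # placeholder: replace str with the path to the dataset
            min_accuracy=min_accuracy,
            workers=3,
            timeout=60,
        )

# Steps executed inside continuous_deployment_pipeline (pipelines/deployment_pipeline.py).
# The decorator and signature here are inferred from the call above.
@pipeline(enable_cache=False)
def continuous_deployment_pipeline(data_path: str, min_accuracy: float,
                                   workers: int, timeout: int):
    df = ingest_df(data_path=data_path)
    X_train, X_test, y_train, y_test = clean_df(df)
    model = train_model(X_train, X_test, y_train, y_test)
    r2_score, rmse = evaluate_model(model, X_test, y_test)
    # Deploy only if the trigger approves the evaluated score
    deployment_decision = deployment_trigger(r2_score)
    mlflow_model_deployer_step(model=model,
                               deploy_decision=deployment_decision,
                               workers=workers,
                               timeout=timeout)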

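In the above code, we create a continuous deployment pipeline that takes the data and performs data ingestion, cleaning and splitting, and model training. Once the model is trained, it is evaluated, and the deployment trigger uses the R2 score to decide whether the model should be deployed.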
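
The deployment_trigger step is not shown in this article. Below is a minimal sketch of what such a trigger could look like in plain Python, assuming it compares the R2 score against a minimum threshold. The default of 0.8 is a hypothetical value chosen for the example; the two scores are the R2 values reported for Random Forest and Linear Regression in the Results section.

```python
def deployment_trigger(r2: float, min_accuracy: float = 0.8) -> bool:
    # Deploy only when the evaluated R2 reaches the required minimum.
    return r2 >= min_accuracy

for score in (0.894, 0.325):
    decision = "deploy" if deployment_trigger(score) else "skip deployment"
    print(score, "->", decision)
```

Keeping the decision in its own small step makes the threshold easy to change without touching the training or deployment code.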

inference_pipeline

@pipeline(enable_cache=False, settings={"docker": docker_settings})
def inference_pipeline(pipeline_name: str, pipeline_step_name: str):
    # Link all the steps artifacts together
    batch_data = dynamic_importer()
    model_deployment_service = prediction_service_loader(
        pipeline_name=pipeline_name,
        pipeline_step_name=pipeline_step_name,
        running=False,
    )
    predictor(service=model_deployment_service, data=batch_data)

In inference_pipeline, we make predictions once the model has been trained on the training dataset. The above code uses dynamic_importer, prediction_service_loader, and predictor. Each of these methods has a different function.

dynamic_importer: Loads the dataset and performs preprocessing.
prediction_service_loader: Loads the deployed model using the pipeline name and step name parameters offered by ZenML.
predictor: Once the model is trained, makes a prediction on the test dataset.
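
The bodies of these steps are not shown in this article. The sketch below illustrates the data flow between an importer and a predictor in plain Python, assuming the batch is passed between steps as a JSON string. DummyService and its pricing rule are hypothetical stand-ins for the deployed model service.

```python
import json

import numpy as np
import pandas as pd

class DummyService:
    # Hypothetical stand-in for a deployed model service.
    def predict(self, features: np.ndarray) -> np.ndarray:
        # Toy rule: total_stay * total_person * 40, only to make the flow visible.
        return features[:, 0] * features[:, 1] * 40.0

def dynamic_importer() -> str:
    # Load a batch and serialize it so it can be handed to the next step.
    batch = pd.DataFrame({"total_stay": [2, 5], "total_person": [2, 1]})
    return batch.to_json(orient="split")

def predictor(service: DummyService, data: str) -> np.ndarray:
    # Rebuild the frame from JSON and ask the service for predictions.
    payload = json.loads(data)
    frame = pd.DataFrame(payload["data"], columns=payload["columns"])
    return service.predict(frame.to_numpy(dtype=float))

predictions = predictor(DummyService(), dynamic_importer())
print(predictions)
```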

Now we will visualize the pipelines using the ZenML dashboard for a clearer view.

continuous_deployment_pipeline dashboard:-

Dashboard url:- http://127.0.0.1:8237/workspaces/default/pipelines/9eb06aba-d7df-43ef-a017-8cb5bb13cd89/runs/e4208fa5-48c8-4a8c-91f1-011c5e1ddbf9/dag

inference_pipeline dashboard:-

Dashboard url:-http://127.0.0.1:8237/workspaces/default/pipelines/07351bb1-6b0d-400e-aeea-551159346f0e/runs/c1ce61f8-dd12-4244-a4d6-514e5520b879/dag

We have deployed a Streamlit app that uses the latest model service asynchronously from the pipeline. It can be done quickly with ZenML within the Streamlit code. To run this Streamlit app in your local system, use the below command:

# command to run the streamlit app locally
streamlit run streamlit_app.py

You can get the complete end-to-end implementation code here

Results

We have experimented with multiple algorithms and compared the performance of each model. The results are as follows:

Models	MSE	RMSE	R2_Score
XGBoost	267.465	16.354	n/a
LightGBM	319.477	17.873	0.839
RandomForest	209.837	14.485	0.894
Linear Regression	1338.777	36.589	0.325

The Random Forest model performs the best, with the lowest MSE and RMSE and the highest R^2 score. This means that it is the most accurate at predicting the target variable and explains the most variance in it. The R^2 score for XGBoost is not listed because the recorded value duplicated its RMSE, so XGBoost is ranked by its error: with a lower MSE than LightGBM, it is the second-best model, followed by LightGBM. The Linear Regression model performs the worst.
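
RMSE should equal the square root of MSE, which gives a quick consistency check on reported figures. Below is a minimal sketch using the XGBoost, LightGBM, and Linear Regression rows of the table.

```python
import math

# (MSE, RMSE) pairs as reported for three of the models.
reported = {
    "XGBoost": (267.465, 16.354),
    "LightGBM": (319.477, 17.873),
    "Linear Regression": (1338.777, 36.589),
}

# Absolute difference between sqrt(MSE) and the reported RMSE, per model.
gaps = {}
for name, (mse_value, rmse_value) in reported.items():
    gaps[name] = abs(math.sqrt(mse_value) - rmse_value)
    print(f"{name}: sqrt(MSE) = {math.sqrt(mse_value):.3f}, reported RMSE = {rmse_value}")
```

A gap much larger than the rounding error would suggest that two columns were mixed up when the results were copied.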

Demo Application

A live demo application of this project is built using Streamlit. It takes some input features for the booking and predicts the room price using our trained models.

Conclusion

The hotel room booking sector is rapidly evolving as internet accessibility has increased in different parts of the world. Due to this, the demand for online hotel room booking has increased. Hotel management wants to know how to keep their guests and improve products and services to make better decisions. Machine learning is vital in various business applications, like customer segmentation, demand forecasting, product recommendation, guest satisfaction, etc.

Frequently Asked Questions

Q1. Which features are crucial in the hotel room price estimation dataset?

Several features determine the room price. Some of them are hotel_type, room_type, arrival_date, departure_date, number_of_guests, etc.

Q2. What is the purpose of the hotel room price estimation model?

The model aims to set the correct room price so the hotels can keep the occupancy rate as high as possible. Multiple parties, such as hotels, travel websites, and businesses, can use this data.

Q3. What is a hotel room price optimization model?

A hotel room price optimization model is an ML tool that predicts the room price based on total stay days, room type, any special request, etc. Hotels can use this tool to set competitive prices and maximize profit.

Q4. How accurately does the model predict the room price?

In hotels, the prediction of room prices relies on several factors, including data type and quality. If the model undergoes training with additional parameters, it improves its ability to predict prices more accurately.

Q5. How is a hotel room price estimation model helpful to businesses?

This model can be used in hotels to establish competitive prices, attract more customers, and increase occupancy rates. Travelers can utilize it to secure the best deals at reasonable rates without hotels overcharging them. This also helps in travel budget planning.

The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.
