Illiyas Sha — June 29, 2021

This article was published as a part of the Data Science Blogathon

Table of Contents

Let us have a quick overview of this blog.

→What is time series?

→Real-life scenarios of time series

→Time series analysis

→Forecasting

→Types of forecasting

1) Quantitative forecasting

2) Qualitative forecasting

→Regression vs Time series

→Time Series components

→Analyzing Kaggle time-series data

→Plotting the time-series graph

What is time series?

A time series is a sequence of data points indexed by time, so each observation is tied to the moment at which it was recorded.

Examples of time series data

Healthcare industry – Blood pressure monitoring, heart rate monitoring.

Environment – Global temperature and air pollution levels.

Society – Birth rates and population over a period of time.
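To make the idea concrete, here is a minimal sketch of a time series in pandas. The heart-rate readings and timestamps are made up for illustration; the only point is that every value is attached to a moment in time.

```python
import pandas as pd

# Hypothetical heart-rate readings, one per minute (values are made up).
timestamps = pd.date_range(start="2021-06-29 09:00", periods=5, freq="min")
heart_rate = pd.Series([72, 75, 74, 78, 76], index=timestamps, name="heart_rate_bpm")

print(heart_rate)

# The index is ordered in time, which is what makes this a time series.
print(heart_rate.index.is_monotonic_increasing)
```

A DatetimeIndex like this is what lets pandas slice, resample, and plot the data by time later on.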

Figure: example of time series data (source: https://www.statisticshowto.com/timeplot/)

What is Time series analysis?

Time series analysis means analyzing time series data with statistical tools and techniques in order to understand its patterns.

For example, a restaurant can use its record of daily visitors to predict how many visitors to expect, so that the management can schedule and accommodate staff according to the number of visitors.

Forecasting

Forecasting is the process of predicting the future from historical data, that is, from past and present observations.

Types of forecasting:

1) Quantitative forecasting

2) Qualitative forecasting

Let us look at each of them.

1) Quantitative forecasting

Quantitative forecasting is based on historical data, that is, past and present data that is mostly numerical. Applying statistical methods to this data lets us make predictions with less bias.
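As a rough illustration of quantitative forecasting, the sketch below applies two very simple statistical rules to a made-up series of monthly sales. The figures and the window size of 3 are assumptions chosen for the example, not something taken from the dataset used later.

```python
import pandas as pd

# Hypothetical monthly sales figures (made up for illustration).
sales = pd.Series([100, 110, 105, 120, 125, 130])

# Naive forecast: the next value equals the last observed value.
naive_forecast = sales.iloc[-1]

# Moving-average forecast: the next value equals the mean of the last 3 observations.
window = 3
moving_average_forecast = sales.iloc[-window:].mean()

print(naive_forecast)
print(moving_average_forecast)
```

Both rules use only the numbers in the history, which is what separates them from the judgment-based approach described next.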

2) Qualitative forecasting

Qualitative forecasting is based on the opinion and judgment of subject matter experts and customers. Why rely on judgment instead of data? Because in some cases the past data is unavailable or unclear, so we have to depend on judgment and opinions.

You may be wondering how regression and time series relate to each other. They have some similarities and some differences.

Regression vs Time Series

Both regression analysis and time series analysis are done on continuous variables.

Regression

→It models the relationship between dependent and independent variables.

→The target variable is continuous.

→It involves finding patterns in the data and predicting the target from those patterns.

Figure: regression.

Time Series 

→It is a series of data points associated with time.

→The target variable is continuous.

→It involves finding trends in the data and forecasting the future from those trends.
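The contrast can be sketched with NumPy. In the first half, a target is predicted from an independent variable; in the second, past values ordered in time are extended one step into the future. All numbers are made up, and fitting a straight line with `np.polyfit` is only one simple way to illustrate each idea.

```python
import numpy as np

# Regression: predict a target from an independent variable.
# Hypothetical data: advertising spend (x) and sales (y).
ad_spend = np.array([1.0, 2.0, 3.0, 4.0])
sales = np.array([3.0, 5.0, 7.0, 9.0])
slope, intercept = np.polyfit(ad_spend, sales, deg=1)
predicted_sales = slope * 5.0 + intercept  # prediction for a new spend value

# Time series: the observations are ordered in time, and the pattern
# found in past values is extended into the future.
monthly_sales = np.array([10.0, 12.0, 14.0, 16.0])  # months 0..3
months = np.arange(len(monthly_sales))
trend_slope, trend_intercept = np.polyfit(months, monthly_sales, deg=1)
next_month_forecast = trend_slope * len(monthly_sales) + trend_intercept

print(round(predicted_sales, 2))
print(round(next_month_forecast, 2))
```

In the regression half the order of the rows does not matter; in the time series half it is the whole point.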

Figure: time series plot of trade (source: https://i1.wp.com/statisticsbyjim.com/wp-content/uploads/2020/07/TimeSeriesTrade.png?fit=576%2C384&ssl=1)

Time series Components

A time-series graph highlights the trend and behavior of the data over time, which helps in building a more reliable model. To understand these patterns, we structure the data and break it down into several components. They are,

Structural breaks

Trend

Seasonality

Cyclicity

Noise

Level

1) Structural breaks

A structural break is a sudden change in the behavior of the time series data. Structural breaks affect the reliability of the results, so statistical methods should be used to identify them.
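The following is a naive sketch of how a sudden shift in level could be located. It is a simple heuristic written for this example (the function name `find_level_shift` is hypothetical), not a formal statistical test such as the Chow test, and the series is made up.

```python
import numpy as np

# Hypothetical series whose level jumps from about 10 to about 20 at index 5.
series = np.array([10.0, 11.0, 9.0, 10.0, 10.0, 20.0, 21.0, 19.0, 20.0, 20.0])

def find_level_shift(values, min_segment=2):
    """Return the split index with the largest gap between segment means."""
    best_index, best_gap = None, 0.0
    for split in range(min_segment, len(values) - min_segment + 1):
        gap = abs(values[split:].mean() - values[:split].mean())
        if gap > best_gap:
            best_index, best_gap = split, gap
    return best_index, best_gap

break_index, gap = find_level_shift(series)
print(break_index, gap)
```

On real data a large gap alone is not proof of a break, which is why a proper statistical test is still needed.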

Figure: structural breaks.

2) Trend

Time series data may move in a consistent direction as time passes. This movement is the trend. In short, the trend shows whether the time series has moved higher or lower over a time period. The reliability of time series results depends on identifying the trend correctly.

Here is an example: the monthly revenue of a company, which shows an increasing trend.
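One common way to expose a trend is to smooth the series with a rolling mean. The sketch below does this on made-up monthly revenue figures; the 3-month window is an arbitrary choice for the example.

```python
import pandas as pd

# Hypothetical monthly revenue with an upward drift plus small wiggles.
revenue = pd.Series([100, 104, 101, 108, 106, 112, 110, 117, 115, 121])

# A 3-month rolling mean smooths short-term fluctuations and exposes the trend.
rolling_trend = revenue.rolling(window=3).mean().dropna()

# If the smoothed series ends higher than it starts, the trend is upward.
if rolling_trend.iloc[-1] > rolling_trend.iloc[0]:
    trend_direction = "increasing"
else:
    trend_direction = "not increasing"

print(trend_direction)
```

Comparing only the first and last smoothed values is a crude check; it is enough here because the series is short and clearly drifting upward.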

Figure: trend.

3) Seasonality

Seasonality is the component in which the time series data shows a regular pattern that repeats after a fixed interval of time.

(An example of a time series with seasonality is sales that rise every 20 days.)
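A repeating pattern with a fixed interval shows up as high autocorrelation at that interval. The sketch below checks this on a made-up series that repeats every 4 days; the short period is chosen only to keep the example small.

```python
import pandas as pd

# Hypothetical daily sales that repeat the same 4-day pattern (period = 4).
pattern = [10, 20, 30, 20]
sales = pd.Series(pattern * 6, dtype="float64")

# Autocorrelation at the seasonal lag is high when the pattern repeats.
autocorr_at_period = sales.autocorr(lag=4)

# At a lag that does not match the period, the correlation is much lower.
autocorr_off_period = sales.autocorr(lag=2)

print(round(autocorr_at_period, 3), round(autocorr_off_period, 3))
```

Scanning several lags and looking for the peaks is a simple way to guess the seasonal period of an unfamiliar series.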

Figure: seasonality (source: https://www.vosesoftware.com/riskwiki/images/image1858.gif)

4) Cyclicity

Cyclicity is the component in which the time series data rises and falls repeatedly, but, unlike seasonality, the interval between repetitions is not fixed.

Example:

Electricity demand per week is plotted in a time-series graph. The demand rises and falls roughly every 2 weeks, without an exactly fixed interval. This represents cyclicity.

Figure: cyclicity (source: https://robjhyndman.com/hyndsight/2011-12-14-cyclicts_files/figure-html/unnamed-chunk-3-1.png)

5) Noise

Noise is the random fluctuation in the time series data. Because it is random, it cannot be used to predict the future.

6) Level

The level is the average value of the time series.
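Level and noise can be seen together in a small simulated example. The series below is synthetic: a constant level of 50 plus random noise, both chosen arbitrarily for illustration.

```python
import numpy as np

# Hypothetical series: a constant level of 50 plus random noise.
rng = np.random.default_rng(seed=0)
noise = rng.normal(loc=0.0, scale=1.0, size=1000)
series = 50.0 + noise

# The level is estimated by the mean of the series.
estimated_level = series.mean()

# What remains after removing the level is the noise component.
residual = series - estimated_level

print(round(estimated_level, 2))
print(round(residual.mean(), 6))
```

In a real series the trend and seasonality would also have to be removed before what is left could be called noise.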

Analyzing Kaggle time series data:

In this analysis, I have used a dataset from Kaggle. Kaggle is a platform where we can find datasets, notebooks, and other resources related to data science. It also hosts competitions for practice.

Dataset used in this analysis: Time series starter dataset

Reading the dataset

import pandas as pd
data = pd.read_csv('/content/sample_data/Month_Value_1.csv')
data.head()
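If you do not have the file at hand, the same steps can be tried on a small in-memory stand-in. In this sketch the column names match the dataset, but the values are made up, and the day.month.year date format is an assumption about how Period is stored.

```python
import io
import pandas as pd

# Stand-in for Month_Value_1.csv: same column names, made-up values.
# The day.month.year date format is an assumption about the file.
csv_text = """Period,Revenue,Sales_quantity,Average_cost,The_average_annual_payroll_of_the_region
01.01.2015,1000.0,10,100.0,30000
01.02.2015,1200.0,12,100.0,30000
01.03.2015,1500.0,15,100.0,30000
"""

sample = pd.read_csv(io.StringIO(csv_text))

# Convert the Period strings into real timestamps so pandas treats them as dates.
sample["Period"] = pd.to_datetime(sample["Period"], format="%d.%m.%Y")

print(sample.dtypes)
print(sample.head())
```

Parsing the dates early is optional, but it makes later steps such as plotting against time more reliable.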

 

Figure: first rows of the time series dataset.

Cleaning the dataset:


This dataset contains 5 columns and 96 rows.

The columns are

[0] – Period

[1] – Revenue

[2] – Sales_quantity

[3] – Average_cost

[4] – The_average_annual_payroll_of_the_region
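The row and column counts, the column names, and the number of missing values can all be checked directly in pandas. The sketch below uses a tiny made-up frame with the same five column names, so the counts it prints are not those of the real dataset.

```python
import pandas as pd

# Small made-up frame with the same five column names as the dataset.
sample = pd.DataFrame({
    "Period": ["01.01.2015", "01.02.2015"],
    "Revenue": [1000.0, 1200.0],
    "Sales_quantity": [10, 12],
    "Average_cost": [100.0, 100.0],
    "The_average_annual_payroll_of_the_region": [30000, 30000],
})

# shape is a (rows, columns) tuple.
n_rows, n_cols = sample.shape

# Count the missing values in each column.
missing_per_column = sample.isna().sum()

print(n_rows, n_cols)
print(list(sample.columns))
print(missing_per_column)
```

Running the same three checks on `data` is a quick way to confirm what was loaded before cleaning it.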

Description of each column to decide which is important

Period – The date of each observation, recorded monthly from 2015 to 2020.

Revenue – The company’s revenue for each month from 2015 to 2020.

Sales_quantity – The company’s sales quantity.

Average_cost – The average cost of production.

The_average_annual_payroll_of_the_region – The average number of employees in the region per year.

Plotting the line chart for 5 columns

data.plot.line(x=None, y=None)

 

Figure: line chart of all the columns.

This chart puts the data from all 5 columns on one set of axes, so it does not give a clear view of any single series.

Let us clean the dataset.

We will analyze the time series of revenue from 2015 to 2020 and drop all the other columns.

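data = data.drop(columns=['Sales_quantity', 'Average_cost', 'The_average_annual_payroll_of_the_region'])

The syntax for dropping a column is

dataframe.drop('Column_name', axis=1)

where axis is the axis number (0 for rows and 1 for columns). The axis must be passed as a keyword, because recent pandas versions no longer accept it as a positional argument. Passing columns='Column_name', or a list of names as above, is equivalent.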
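As a small sketch on made-up data, the same result can be reached either by dropping the unwanted columns by name or by selecting only the columns to keep. The frame below is a stand-in with the dataset's column names and invented values.

```python
import pandas as pd

# Made-up frame with the same column names as the dataset.
sample = pd.DataFrame({
    "Period": ["01.01.2015", "01.02.2015"],
    "Revenue": [1000.0, 1200.0],
    "Sales_quantity": [10, 12],
    "Average_cost": [100.0, 100.0],
    "The_average_annual_payroll_of_the_region": [30000, 30000],
})

# Option 1: drop the unwanted columns by name.
dropped = sample.drop(
    columns=["Sales_quantity", "Average_cost", "The_average_annual_payroll_of_the_region"]
)

# Option 2: keep only the columns needed for the analysis.
kept = sample[["Period", "Revenue"]]

print(list(dropped.columns))
print(dropped.equals(kept))
```

Selecting the columns to keep is often the shorter of the two when only a few columns are needed.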

Now we have only period and revenue columns for analysis.

Let us plot the graph

data.plot.line(x=None,y=None)
Figure: period vs revenue.

This time-series graph shows an increasing trend: the revenue of the company increases from 2015 to 2020.
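If Period is still stored as text, the x-axis of the chart is likely to show row numbers rather than dates. A possible refinement, sketched here on made-up values and assuming a day.month.year format, is to parse Period and use it as the index before plotting.

```python
import pandas as pd

# Made-up stand-in for the cleaned dataset (Period and Revenue only).
sample = pd.DataFrame({
    "Period": ["01.01.2015", "01.02.2015", "01.03.2015", "01.04.2015"],
    "Revenue": [1000.0, 1200.0, 1500.0, 1700.0],
})

# Parse the dates (the format is an assumption) and index the revenue by them.
sample["Period"] = pd.to_datetime(sample["Period"], format="%d.%m.%Y")
revenue = sample.set_index("Period")["Revenue"]

# Plot revenue against time; pandas uses the datetime index for the x-axis.
ax = revenue.plot.line(title="Revenue over time")
ax.set_xlabel("Period")
ax.set_ylabel("Revenue")
```

In a notebook the chart is displayed inline; in a plain script you would still need to show or save the figure.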

You can take a look at this time series notebook for the code:

Time series starter dataset notebook

Endnotes

We have seen some concepts of time series analysis and analyzed Kaggle’s starter dataset for time series.

Thanks for reading!

I hope you enjoyed the article and increased your knowledge about time series analysis. Please feel free to contact me on LinkedIn.

About the author

Mohamed Illiyas

Currently, I am pursuing my Bachelor of Engineering (B.E) in Computer Science from the Government College of Engineering, Srirangam, Tamil Nadu. I am very enthusiastic about Statistics, Machine Learning, and Data Science.

Connect with me on Linkedin Mohamed Illiyas

The media shown in this article are not owned by Analytics Vidhya and are used at the Author’s discretion.
