A Comprehensive Guide to Time Series Analysis
Synopsis of Time Series Analysis
 A time series is a sequence of observations ordered in time. The interval may be years, months, weeks, days, hours, minutes, or seconds.
 A time series records observations at successive, discrete points in time.
 A time series can be plotted as a run chart.
 The time variable is the independent variable and is used to predict the target variable.
 Time Series Analysis (TSA) is used in many fields for time-based predictions, such as weather forecasting, finance, signal processing, and engineering domains (control systems, communications systems).
 Because TSA works with information in a particular sequence, it is distinct from spatial and other analyses.
 Using AR, MA, ARMA, and ARIMA models, we can forecast future values.
Introduction to Time Series Analysis
Time Series Analysis is the study of how a response variable behaves with respect to time, where time is the independent variable. To estimate the target variable for prediction or forecasting, the time variable is used as the point of reference. In this article we discuss in detail the TSA objectives, assumptions, and components (stationary and non-stationary), along with the TSA algorithms and specific use cases in Python.
Table of Contents

What is Time Series Analysis (TSA) and its assumption
 How to analyze Time Series (TSA)?
 Time Series Analysis Significance and its types.
 Components of Time Series
 What are the limitations of time series?
 Detailed Study of Time Series Data types.
 Discussion on stationary and non-stationary components
 Conversion of non-stationary into stationary
 Why is Time Series Analysis used in Data Science and Machine Learning?
 Time Series Analysis in Data Science and Machine Learning
 Implementation of AutoRegressive model
 Implementation of Moving Average (WEIGHTS – SIMPLE MOVING AVERAGE)
 Understanding ARMA and ARIMA
 Implementation steps for ARIMA
 Time Series Analysis – Process flow (Recap)
What is Time Series Analysis
A time series is a sequence of data points recorded in successive order over a given period of time.
 Time series analysis helps us understand how a series behaves and which factors affect a variable at different points in time.
 It provides insights into how the features of a given dataset change over time.
 It supports predicting future values of the time series variable.
 Assumptions: There is one key assumption, "stationarity", which means that the origin of time does not affect the statistical properties of the process.
How to analyze Time Series?
 Collecting the data and cleaning it
 Visualizing the key feature against time
 Observing the stationarity of the series
 Developing charts to understand its nature.
 Model building – AR, MA, ARMA and ARIMA
 Extracting insights from prediction
Significance of Time Series and its types
 Analyzing the historical dataset and its patterns
 Understanding and matching the current situation with patterns derived from the previous stage.
 Understanding the factor or factors influencing certain variable(s) in different periods.
 Forecasting
 Segmentation
 Classification
 Descriptive analysis
 Intervention analysis
Components of Time Series Analysis
 Trend
 Seasonality
 Cyclical
 Irregularity
 Trend: A persistent long-term movement in the data with no fixed interval. The trend may be positive, negative, or null (flat).
 Seasonality: Shifts that repeat at a regular, fixed interval along a continuous timeline. The pattern may resemble a bell curve or a sawtooth.
 Cyclical: Movements with no fixed interval, whose timing and pattern are uncertain.
 Irregularity: Unexpected situations, events, or scenarios that cause spikes over a short time span.
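As a rough illustration of how these components combine, the sketch below builds a synthetic series from a linear trend, a seasonal pattern that repeats every 12 steps, and random noise, then estimates the trend slope with a straight-line fit. The additive form, the slope of 0.5, the period of 12, and all variable names are assumptions made for this example; none of them come from the dataset used later in the article.

```python
import numpy as np

rng = np.random.default_rng(0)          # fixed seed so the sketch is repeatable
t = np.arange(120)                      # 120 time steps (e.g. 10 years of months)

trend = 0.5 * t                                  # positive trend (assumed slope 0.5)
seasonality = 10 * np.sin(2 * np.pi * t / 12)    # pattern repeating every 12 steps
irregular = rng.normal(0, 1, size=t.size)        # short-lived random shocks

series = trend + seasonality + irregular         # additive combination (assumption)

# A straight-line fit largely averages out the seasonal swings and the noise
slope, intercept = np.polyfit(t, series, 1)
print(f"estimated trend slope: {slope:.2f}")
```

The fitted slope should land close to the assumed 0.5, which shows that a trend can still be recovered when seasonality and irregular shocks sit on top of it.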
What are the limitations of Time Series Analysis?
 As with many other models, TSA does not support missing values.
 The relationship between data points must be linear.
 Data transformations are mandatory, which makes the analysis somewhat expensive.
 Models mostly work on univariate data.
Data Types of Time Series
 Stationary
 Non-stationary
A stationary series should satisfy the following conditions:
 The MEAN should be constant over the period under analysis.
 The VARIANCE should be constant with respect to the time frame.
 The COVARIANCE between two observations should depend only on the gap (lag) between them, not on the time at which it is measured.
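A quick, informal way to see these conditions is to compare the mean and variance of the first and second half of a series. The sketch below does this for pure noise and for noise with an added upward drift. Both series, the drift of 0.05 per step, and the helper name half_stats are assumptions for illustration only, and this eyeball check is not a substitute for a formal test.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 400
noise = rng.normal(0, 1, size=n)            # candidate stationary series
trending = 0.05 * np.arange(n) + noise      # mean drifts upward: non-stationary

def half_stats(x):
    """Return (mean, variance) of the first half and of the second half of x."""
    first, second = x[: len(x) // 2], x[len(x) // 2 :]
    return (first.mean(), first.var()), (second.mean(), second.var())

for name, x in [("noise", noise), ("trending", trending)]:
    (m1, v1), (m2, v2) = half_stats(x)
    print(f"{name:9s} mean {m1:6.2f} -> {m2:6.2f} | variance {v1:5.2f} -> {v2:5.2f}")

# How far the mean moves between the two halves
noise_shift = abs(half_stats(noise)[1][0] - half_stats(noise)[0][0])
trend_shift = abs(half_stats(trending)[1][0] - half_stats(trending)[0][0])
```

For the noise series the two halves look alike, while the drifting series shows a clear jump in the mean, which violates the constant-mean condition.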
Methods to check Stationarity
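 Augmented Dickey-Fuller (ADF) Test
 Kwiatkowski-Phillips-Schmidt-Shin (KPSS) Test
For the ADF test, the hypotheses are as follows (the KPSS test reverses them: its null hypothesis is that the series is stationary):
 Null Hypothesis (H0): Series is non-stationary
 Alternate Hypothesis (HA): Series is stationary
 p-value > 0.05: Fail to reject H0
 p-value <= 0.05: Reject H0 in favor of HA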
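To make the idea behind the test more concrete, here is a minimal sketch of a simplified (non-augmented) Dickey-Fuller regression written with NumPy only. It regresses the change in the series on its previous level and compares the t-statistic with an approximate 5% critical value of -2.86 for the constant-only case. The function name, the synthetic series, and the use of a critical value instead of a p-value are all assumptions of this sketch. In practice a library routine such as adfuller from statsmodels, which also adds lagged difference terms and reports a p-value, would normally be used.

```python
import numpy as np

def dickey_fuller_stat(y):
    """t-statistic of rho in the regression: diff(y)[t] = c + rho * y[t-1] + error."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)
    X = np.column_stack([np.ones(len(dy)), y[:-1]])    # constant + lagged level
    beta, *_ = np.linalg.lstsq(X, dy, rcond=None)
    resid = dy - X @ beta
    sigma2 = resid @ resid / (len(dy) - X.shape[1])    # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)              # OLS covariance matrix
    return beta[1] / np.sqrt(cov[1, 1])

rng = np.random.default_rng(42)
white_noise = rng.normal(size=500)          # stationary by construction
random_walk = np.cumsum(white_noise)        # has a unit root (non-stationary)

CRITICAL_5PCT = -2.86   # approximate large-sample 5% value, constant-only case

for name, series in [("white noise", white_noise), ("random walk", random_walk)]:
    stat = dickey_fuller_stat(series)
    verdict = "reject H0 (stationary)" if stat < CRITICAL_5PCT else "fail to reject H0"
    print(f"{name:12s} statistic = {stat:7.2f} -> {verdict}")
```

A strongly negative statistic is evidence against the unit-root null hypothesis, which is the same decision the p-value rule above encodes.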
Converting non-stationary into stationary
 Detrending
 Differencing
 Transformation
Detrending: It involves removing the trend effects from the given dataset and showing only the differences in values from the trend. This makes cyclical patterns easier to identify.
Differencing: Each value is replaced by its change from the previous value:
Y'_{t} = Y_{t} - Y_{t-1}, where Y_{t} = value at time t
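The three conversion methods listed above can be sketched on toy data as follows. The two series (a straight line and a series growing 10% per step), the straight-line fit used for detrending, and the log used as the transformation are assumptions chosen to keep the example small; real data would need these choices checked against plots and a stationarity test.

```python
import numpy as np

t = np.arange(10)
linear = 3.0 + 2.0 * t                 # toy series with a linear trend
growth = 100.0 * 1.1 ** t              # toy series growing 10% per step

# Detrending: subtract a fitted straight line and keep what is left
slope, intercept = np.polyfit(t, linear, 1)
detrended = linear - (slope * t + intercept)

# Differencing: each value minus the previous one
differenced = np.diff(linear)

# Transformation: a log turns multiplicative growth into a straight line,
# so differencing the logged series gives a constant
log_diff = np.diff(np.log(growth))

print(np.round(detrended, 6))
print(differenced)
print(np.round(log_diff, 4))
```

In each case the result no longer grows with time: the detrended values sit at zero, the differences are a constant 2, and the differenced log series is constant as well.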
Moving Average Methodology
 Simple Moving Average (SMA)
 Cumulative Moving Average (CMA)
 Exponential Moving Average (EMA)
To understand these better, we will use an air temperature dataset.
import pandas as pd
from matplotlib import pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf

# load the air temperature dataset
df_temperature = pd.read_csv('temperature_TSA.csv', encoding='utf-8')
df_temperature.head()
df_temperature.info()
# set the year column ('Any') as the index
df_temperature.set_index('Any', inplace=True)
df_temperature.index.name = 'year'
# yearly average air temperature calculation
df_temperature['average_temperature'] = df_temperature.mean(axis=1)
# drop unwanted columns and keep only the yearly average
df_temperature = df_temperature[['average_temperature']]
df_temperature.head()
# SMA over a period of 10 and 20 years
df_temperature['SMA_10'] = df_temperature.average_temperature.rolling(10, min_periods=1).mean()
df_temperature['SMA_20'] = df_temperature.average_temperature.rolling(20, min_periods=1).mean()
# green = average air temperature, red = 10-year SMA, orange = 20-year SMA
colors = ['green', 'red', 'orange']
# line plot
df_temperature.plot(color=colors, linewidth=3, figsize=(12,6))
plt.xticks(fontsize=14)
plt.yticks(fontsize=14)
plt.legend(labels=['Average air temperature', '10-years SMA', '20-years SMA'], fontsize=14)
plt.title('The yearly average air temperature in city', fontsize=20)
plt.xlabel('Year', fontsize=16)
plt.ylabel('Temperature [°C]', fontsize=16)
# CMA of the air temperature
df_temperature['CMA'] = df_temperature.average_temperature.expanding().mean()
# green = average air temperature, orange = CMA
colors = ['green', 'orange']
# line plot
df_temperature[['average_temperature', 'CMA']].plot(color=colors, linewidth=3, figsize=(12,6))
plt.xticks(fontsize=14)
plt.yticks(fontsize=14)
plt.legend(labels=['Average Air Temperature', 'CMA'], fontsize=14)
plt.title('The yearly average air temperature in city', fontsize=20)
plt.xlabel('Year', fontsize=16)
plt.ylabel('Temperature [°C]', fontsize=16)
α = smoothing factor.
 It takes a value between 0 and 1.
 It represents the weighting applied to the most recent period.
# EMA of the air temperature
# smoothing factor 0.1
df_temperature['EMA_0.1'] = df_temperature.average_temperature.ewm(alpha=0.1, adjust=False).mean()
# smoothing factor 0.3
df_temperature['EMA_0.3'] = df_temperature.average_temperature.ewm(alpha=0.3, adjust=False).mean()
# green = average air temperature, red = smoothing factor 0.1, yellow = smoothing factor 0.3
colors = ['green', 'red', 'yellow']
df_temperature[['average_temperature', 'EMA_0.1', 'EMA_0.3']].plot(color=colors, linewidth=3, figsize=(12,6), alpha=0.8)
plt.xticks(fontsize=14)
plt.yticks(fontsize=14)
plt.legend(labels=['Average air temperature', 'EMA - alpha=0.1', 'EMA - alpha=0.3'], fontsize=14)
plt.title('The yearly average air temperature in city', fontsize=20)
plt.xlabel('Year', fontsize=16)
plt.ylabel('Temperature [°C]', fontsize=16)
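Because the temperature file is not reproduced here, the three moving averages can also be checked by hand on a tiny series. The sketch below uses the same pandas calls as above (rolling, expanding, ewm with adjust=False) on five made-up values and writes the EMA out as an explicit recursion. The toy values, the window of 2, and alpha = 0.5 are assumptions chosen so the arithmetic is easy to follow.

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])       # toy values, not the real dataset

sma = s.rolling(2, min_periods=1).mean()       # mean of the last 2 points
cma = s.expanding().mean()                     # mean of everything seen so far
ema = s.ewm(alpha=0.5, adjust=False).mean()    # EMA_t = a*x_t + (1-a)*EMA_{t-1}

# The same EMA written out as an explicit recursion
manual = [s.iloc[0]]
for x in s.iloc[1:]:
    manual.append(0.5 * x + 0.5 * manual[-1])

print(sma.tolist())   # [1.0, 1.5, 2.5, 3.5, 4.5]
print(cma.tolist())   # [1.0, 1.5, 2.0, 2.5, 3.0]
print(ema.tolist())   # [1.0, 1.5, 2.25, 3.125, 4.0625]
```

The SMA forgets everything outside its window, the CMA weights all past values equally, and the EMA keeps all past values but lets their influence shrink by a factor of (1 - alpha) at each step.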
Time Series Analysis in Data Science and Machine Learning
 p ==> autoregressive lags
 q ==> moving average lags
 d ==> order of differencing
Before getting to know ARIMA, you should first understand the terms below.
 AutoCorrelation Function (ACF)
 Partial AutoCorrelation Function (PACF)
Autocorrelation and Partial AutoCorrelation
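# plot the autocorrelation of the yearly average temperature column
plot_acf(df_temperature['average_temperature'])
plt.show()
# the same plot limited to the first 30 lags
plot_acf(df_temperature['average_temperature'], lags=30)
plt.show()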
Interpret ACF and PACF plots
ACF | PACF | Suitable model
Plot declines gradually | Plot drops instantly | AutoRegressive (AR) model
Plot drops instantly | Plot declines gradually | Moving Average (MA) model
Plot declines gradually | Plot declines gradually | ARMA
Plot drops instantly | Plot drops instantly | No model is needed
Remember that both ACF and PACF require stationary time series for analysis.
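To see what a gradually declining ACF looks like in numbers, the sketch below computes the sample autocorrelation by hand for a simulated AR(1) series and for plain noise. The acf helper, the coefficient 0.8, and the series length are assumptions for illustration; this is a simplified version of what plot_acf draws, without confidence bands.

```python
import numpy as np

def acf(x, max_lag):
    """Sample autocorrelation for lags 0..max_lag."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    denom = x @ x
    return np.array([1.0 if k == 0 else (x[k:] @ x[:-k]) / denom
                     for k in range(max_lag + 1)])

rng = np.random.default_rng(7)
n, phi = 2000, 0.8                      # assumed AR(1) coefficient
noise = rng.normal(size=n)
ar1 = np.zeros(n)
for t in range(1, n):
    ar1[t] = phi * ar1[t - 1] + noise[t]

print(np.round(acf(ar1, 5), 2))         # decays gradually, roughly like phi**k
print(np.round(acf(noise, 5), 2))       # near zero after lag 0
```

The slow decay in the first row is the "declines gradually" pattern from the table, which points toward an autoregressive model, while the second row shows a series with no usable autocorrelation.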

Creating the model with AutoReg()
 Call fit() to train it on our dataset.
 This returns an AutoRegResults object.
 Once fit, make a prediction by calling the predict() function.
The equation for the AR model (compare it with Y = mX + c)
Y_{t} = C + b_{1} Y_{t-1} + b_{2} Y_{t-2} + … + b_{p} Y_{t-p} + Er_{t}
Key Parameters
 p = number of past values (lags)
 Y_{t} = function of the different past values
 Er_{t} = error at time t
 C = intercept
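The comparison with Y = mX + c can be made literal for the simplest case, p = 1: regress each value on the one before it. The sketch below simulates an AR(1) series and recovers C and b_1 with ordinary least squares. The chosen values (C = 2.0, b_1 = 0.6), the series length, and the variable names are assumptions for this example, and a library model such as AutoReg handles more lags and more careful estimation.

```python
import numpy as np

rng = np.random.default_rng(3)
n, C, b1 = 5000, 2.0, 0.6               # assumed "true" intercept and coefficient

y = np.zeros(n)
y[0] = C / (1 - b1)                     # start at the long-run mean
for t in range(1, n):
    y[t] = C + b1 * y[t - 1] + rng.normal()     # Y_t = C + b1*Y_{t-1} + Er_t

# Regress Y_t on Y_{t-1}, exactly like fitting Y = mX + c
X = np.column_stack([np.ones(n - 1), y[:-1]])
(C_hat, b1_hat), *_ = np.linalg.lstsq(X, y[1:], rcond=None)

print(f"C ~ {C_hat:.2f}, b1 ~ {b1_hat:.2f}")
```

Here the previous value plays the role of X, b_1 plays the role of the slope m, and C is the intercept c.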
Let's check whether the given dataset or time series is random or not.
from matplotlib import pyplot
from pandas.plotting import lag_plot

# scatter each yearly average against the previous year's value
lag_plot(df_temperature['average_temperature'])
pyplot.show()
Observation: Yes, looks random and scattered.
Implementation of AutoRegressive model
# import libraries
from matplotlib import pyplot
from statsmodels.tsa.ar_model import AutoReg
from sklearn.metrics import mean_squared_error
from math import sqrt

# load csv as dataset
#series = read_csv('dailymintemperatures.csv', header=0, index_col=0, parse_dates=True, squeeze=True)

# split dataset for test and training (the last 7 observations are held out)
# use the single average_temperature column so that X is one-dimensional
X = df_temperature['average_temperature'].values
train, test = X[1:len(X)-7], X[len(X)-7:]

# train autoregression
model = AutoReg(train, lags=20)
model_fit = model.fit()
print('Coefficients: %s' % model_fit.params)

# predictions
predictions = model_fit.predict(start=len(train), end=len(train)+len(test)-1, dynamic=False)
for i in range(len(predictions)):
    print('predicted=%f, expected=%f' % (predictions[i], test[i]))
rmse = sqrt(mean_squared_error(test, predictions))
print('Test RMSE: %.3f' % rmse)

# plot results
pyplot.plot(test)
pyplot.plot(predictions, color='red')
pyplot.show()
OUTPUT
predicted=15.893972, expected=16.275000
predicted=15.917959, expected=16.600000
predicted=15.812741, expected=16.475000
predicted=15.787555, expected=16.375000
predicted=16.023780, expected=16.283333
predicted=15.940271, expected=16.525000
predicted=15.831538, expected=16.758333
Test RMSE: 0.617
Observation: Expected (blue) against predicted (red). The forecast is closest at position 4 and deviates most at position 6 (zero-based positions on the plot's x-axis).
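The reported RMSE can be reproduced directly from the seven predicted and expected pairs printed in the output, which is a useful sanity check on what the metric measures. The sketch below uses only those printed numbers; the variable names are the only additions.

```python
from math import sqrt

# Values copied from the printed output of the AutoReg run
predicted = [15.893972, 15.917959, 15.812741, 15.787555,
             16.023780, 15.940271, 15.831538]
expected = [16.275000, 16.600000, 16.475000, 16.375000,
            16.283333, 16.525000, 16.758333]

# RMSE = square root of the mean of the squared errors
squared_errors = [(e - p) ** 2 for e, p in zip(expected, predicted)]
rmse = sqrt(sum(squared_errors) / len(squared_errors))

print(f"RMSE: {rmse:.3f}")   # RMSE: 0.617
```

Every prediction in this run falls below its expected value, so the error reflects a consistent under-forecast and not scatter around the true values.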
Implementation of Moving Average (WEIGHTS – SIMPLE MOVING AVERAGE)
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt

alpha = 0.3
n = 10
# weights - simple moving average: every point gets the same weight
w_sma = np.repeat(1/n, n)
colors = ['green', 'yellow']
# weights - exponential moving average, alpha=0.3, adjust=False
w_ema = [(1-alpha)**i if i == n-1 else alpha*(1-alpha)**i for i in range(n)]
pd.DataFrame({'w_sma': w_sma, 'w_ema': w_ema}).plot(color=colors, kind='bar', figsize=(8,5))
plt.xticks([])
plt.yticks(fontsize=10)
plt.legend(labels=['Simple moving average', 'Exponential moving average (α=0.3)'], fontsize=10)
# title and labels
plt.title('Moving Average Weights', fontsize=10)
plt.ylabel('Weights', fontsize=10)
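Two properties of these EMA weights are worth checking by hand: they add up to 1, and applying them as a weighted sum gives the same number as the recursive EMA. The sketch below verifies both on ten made-up values, using the same alpha and n as above; the toy values and variable names are assumptions for illustration.

```python
alpha, n = 0.3, 10

# Weight on the value i steps back; the oldest value absorbs the remainder
w_ema = [(1 - alpha) ** i if i == n - 1 else alpha * (1 - alpha) ** i
         for i in range(n)]

# Recursive EMA (adjust=False style) over n toy values, oldest first
values = [float(v) for v in range(1, n + 1)]
ema = values[0]
for v in values[1:]:
    ema = alpha * v + (1 - alpha) * ema

# Same result as a weighted sum, with the newest value getting w_ema[0]
weighted = sum(w * v for w, v in zip(w_ema, reversed(values)))

print(round(sum(w_ema), 10), round(ema, 6), round(weighted, 6))
```

This is why the bar chart shows the EMA weights falling away from the most recent point, while the SMA weights stay flat at 1/n.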
Understanding ARMA and ARIMA
ARMA: This is a combination of the AutoRegressive and Moving Average models for forecasting. It describes a weakly stationary stochastic process in terms of two polynomials, one for the AutoRegressive part and the second for the Moving Average part.
ARMA is best for predicting stationary series. ARIMA was introduced because it supports stationary as well as non-stationary series.

AR ==> Uses the past values to predict the future
 MA ==> Uses the past error terms in the given series to predict the future
 I ==> Uses differencing of the observations to make the data stationary
AR+I+MA = ARIMA
 p ==> lag order => number of lag observations.
 d ==> degree of differencing => number of times that the raw observations are differenced.
 q ==> order of moving average => the size of the moving average window
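The role of d can be illustrated without fitting any model: difference the series d times before modelling, then add the changes back to return to the original scale. The sketch below shows d = 1 and d = 2 on five made-up values and reverses the first difference with a cumulative sum. The toy series and the variable names are assumptions for this example.

```python
import numpy as np

y = np.array([10.0, 12.0, 15.0, 19.0, 24.0])   # toy series (assumed values)

d1 = np.diff(y, n=1)      # d = 1: first differences -> [2. 3. 4. 5.]
d2 = np.diff(y, n=2)      # d = 2: differences of the differences -> [1. 1. 1.]

# Undo d = 1: start from the first observation and add the changes back
restored = np.concatenate([[y[0]], y[0] + np.cumsum(d1)])

print(d1, d2, restored)
```

Here the series needs two rounds of differencing before it becomes constant, which is the kind of situation where a larger d would be considered.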
Implementation steps for ARIMA
Implementation of ARIMA
We have already discussed steps 1-5, so let's focus on the rest here.
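# Note: statsmodels.tsa.arima_model is the legacy API and was removed in statsmodels 0.13;
# newer releases provide ARIMA in statsmodels.tsa.arima.model instead.
from statsmodels.tsa.arima_model import ARIMA

# fit ARIMA(p=0, d=1, q=1) on the yearly average temperature column
model = ARIMA(df_temperature['average_temperature'], order=(0, 1, 1))
results_ARIMA = model.fit()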
results_ARIMA.summary()
results_ARIMA.forecast(3)[0]
Output
array([16.47648941, 16.48621826, 16.49594711])
results_ARIMA.plot_predict(start=200)
plt.show()
Process flow (Recap)
In recent years, the use of Deep Learning for Time Series Analysis and Forecasting has increased, to address problem statements that could not be handled using Machine Learning techniques. Let's discuss this briefly.
Recurrent Neural Networks (RNNs) are the most traditional and accepted architecture, well suited to time-series forecasting problems.
An RNN is organized into successive layers, divided into
 Input
 Hidden
 Output
Components of RNN
 Input: The function vector x(t) is the input at time step t.

Hidden:
 The function vector h(t) is the hidden state at time t.
 This is a kind of memory of the established network.
 It is calculated from the current input x(t) and the previous time step's hidden state h(t-1).
 Output: The function vector y(t) is the output at time step t.
 Weights: In RNNs, the input vector is connected to the hidden-layer neurons at time t by a weight matrix U (please refer to the above picture).
The weight matrix W connects the hidden-layer neurons from one time step to the next (t-1, t, t+1). The hidden layer is then connected to the output vector y(t) at time t by a weight matrix V. All the weight matrices U, W, and V are constant for each time step.
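The description above can be sketched as a forward pass in a few lines of NumPy. The layer sizes, the random weights, the tanh activation, and the function name rnn_forward are assumptions for illustration; the text does not specify an activation function, and a real model would learn U, W, and V through training instead of drawing them at random.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden, n_out = 1, 4, 1         # assumed layer sizes

U = rng.normal(scale=0.5, size=(n_hidden, n_in))       # input  -> hidden
W = rng.normal(scale=0.5, size=(n_hidden, n_hidden))   # hidden -> hidden
V = rng.normal(scale=0.5, size=(n_out, n_hidden))      # hidden -> output

def rnn_forward(xs):
    """Run the same U, W, V over every time step of xs."""
    h = np.zeros(n_hidden)              # initial hidden state
    outputs = []
    for x_t in xs:
        h = np.tanh(U @ x_t + W @ h)    # h(t) from x(t) and h(t-1); tanh assumed
        outputs.append(V @ h)           # y(t) from h(t)
    return np.array(outputs), h

xs = np.array([[0.1], [0.2], [0.3]])    # three time steps of a 1-D input
ys, h_last = rnn_forward(xs)
print(ys.shape, h_last.shape)           # (3, 1) (4,)
```

The loop reuses the same three matrices at every step, which is what "constant for each time step" means, and the hidden state h carries information from earlier inputs forward.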
Advantages | Disadvantages
It can remember the information it has seen, so an RNN is very useful for time series prediction | The big challenge is during the training period
Perfect for creating complex patterns from the input time series dataset | Expensive computation cost
Fast in prediction/forecasting |
Not affected by missing values, so the cleansing process can be limited |
I believe this guide will help you all understand time series, its flow, and how it works. Let's connect soon with one more interesting topic. Until then, bye! Cheers! Shanthababu.