Statistical Tests to Check Stationarity in Time Series

Vijay Kumar G 21 Dec, 2023 • 11 min read

Introduction

In this article, I will walk through the Augmented Dickey-Fuller (ADF) test and the Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test, the two statistical tests most commonly used to check whether a given time series is stationary. Stationarity is a very important property in time series analysis. In ARIMA time series forecasting, for example, the first step is to determine the number of differences required to make the series stationary, because the model assumes that the differenced series is stationary. Let’s look at this in a little more depth.

Learning Objectives

  • In this article, we will focus on one part of time series analysis: stationarity.
  • We aim to understand what stationary series are.
  • We will be using statistical tests to determine whether a time series is stationary.

This article was published as a part of the Data Science Blogathon.

What Is a Stationary Time Series?

A stationary series is one whose statistical properties, such as mean, variance, and autocovariance, do not vary with time; that is, these properties are not a function of time. As a consequence, a stationary series has no trend or seasonal component.
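To make the definition concrete, here is a minimal sketch (not from the original article) that simulates two series with NumPy: white noise, which is stationary, and a random walk, which is not. The names `white_noise`, `random_walk`, and `half_stats` are illustrative, and the seed and series length are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable
n = 1000

white_noise = rng.normal(size=n)             # stationary: constant mean and variance
random_walk = np.cumsum(rng.normal(size=n))  # non-stationary: variance grows with time


def half_stats(series):
    """Return (mean, variance) for the first and second half of a series."""
    first, second = np.array_split(series, 2)
    return (first.mean(), first.var()), (second.mean(), second.var())


for name, series in [("white noise", white_noise), ("random walk", random_walk)]:
    (m1, v1), (m2, v2) = half_stats(series)
    print(f"{name}: first half mean={m1:.2f}, var={v1:.2f}; "
          f"second half mean={m2:.2f}, var={v2:.2f}")
```

For white noise, the two halves should report similar means and variances. For the random walk, they usually differ noticeably, which is the time dependence that the tests in this article try to detect.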

Why Should Time Series Be Stationary?

A stationary series is easier for statistical models to predict effectively and precisely, because the properties a model learns from the past continue to hold in the future.

Why Are These Statistical Tests Important?

In data science, it is important to know about statistical tests, just as it is important to know about deep learning and machine learning algorithms. It helps us understand the data better and select forecasting models, like the ARIMA (Auto Regressive Integrated Moving Average) model or the SARIMA (Seasonal ARIMA) model.

Types of Stationary Series

  1. Strict Stationary – Satisfies the mathematical definition of a stationary process. Mean, variance & covariance are not a function of time.
  2. Seasonal Stationary – Series that is stationary apart from a seasonal component.
  3. Trend Stationary – Series that is stationary apart from a trend.

Note: Once the seasonality and trend are removed, the series will be strictly stationary.
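As an illustration of removing a trend, the sketch below builds a synthetic trend-stationary series (a straight line plus noise) and subtracts a fitted line using `np.polyfit`. The slope, noise scale, and variable names are assumptions made for the example; real data may need a different trend model.

```python
import numpy as np

rng = np.random.default_rng(1)
t = np.arange(200)

# Trend-stationary series: a straight line plus stationary noise
series = 0.5 * t + rng.normal(scale=2.0, size=t.size)

# Fit a first-degree polynomial (a line) and subtract it
slope, intercept = np.polyfit(t, series, deg=1)
detrended = series - (slope * t + intercept)

# Refit on the residuals: the leftover slope should be essentially zero
residual_slope, _ = np.polyfit(t, detrended, deg=1)
print(f"fitted slope: {slope:.3f}, slope after detrending: {residual_slope:.2e}")
```

The detrended values fluctuate around zero with no remaining linear drift, so they can then be checked for stationarity with the tests described in this article.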

How to Check Stationarity?

Visualizations

The most basic methods for stationarity detection rely on plotting the data and visually checking for trend and seasonal components. Trying to determine whether a stationary process generated a time series just by looking at its plot is a dubious task. However, there are some basic properties of non-stationary data that we can look for.

Let’s take an example of the following nice plots from [Hyndman & Athanasopoulos, 2018]:


Figure 1: Nine examples of time series data; (a) Google stock price for 200 consecutive days; (b) Daily change in the Google stock price for 200 consecutive days; (c) Annual number of strikes in the US; (d) Monthly sales of new one-family houses sold in the US; (e) Annual price of a dozen eggs in the US (constant dollars); (f) Monthly total of pigs slaughtered in Victoria, Australia; (g) Annual total of lynx trapped in the McKenzie River district of north-west Canada; (h) Monthly Australian beer production; (i) Monthly Australian electricity production. [Hyndman & Athanasopoulos, 2018]

  • Seasonality can be observed in series (d), (h), and (i)
  • The trend can be observed in series (a), (c), (e), (f), and (i)
  • Series (b) and (g) are stationary

Statistical Tests

Two statistical tests which we will be discussing are:

  1. Augmented Dickey-Fuller (ADF) Test
  2. Kwiatkowski-Phillips-Schmidt-Shin (KPSS) Test

Augmented Dickey-Fuller Test

Statistical tests make strong assumptions about your data. They can only be used to inform the degree to which a null hypothesis can be rejected or fail to be rejected. The result must be interpreted for a given problem to be meaningful.

However, they provide a quick check and confirmatory evidence that the time series is stationary or non-stationary.

The Augmented Dickey-Fuller test is a type of statistical test called a unit root test.

In probability theory and statistics, a unit root is a feature of some stochastic processes (such as random walks) that can cause problems in statistical inference involving time series models. In simple terms, a series with a unit root is non-stationary but does not always have a trend component.

The ADF test is conducted with the following hypotheses:

  • Null Hypothesis (H0): Series is non-stationary, or series has a unit root.
  • Alternate Hypothesis (HA): Series is stationary, or series has no unit root.

If we fail to reject the null hypothesis, the test provides no evidence against a unit root, and the series is treated as non-stationary.

Conditions to Reject Null Hypothesis (H0)

  • If Test statistic < Critical Value and p-value < 0.05 – Reject Null Hypothesis (H0), i.e., time series does not have a unit root, meaning it is stationary. It does not have a time-dependent structure.
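The rejection rule can be written as a small helper. This is only a sketch: `adf_rejects_null` is a hypothetical name, and the numbers passed to it are made up to illustrate the rule rather than taken from any dataset.

```python
def adf_rejects_null(test_statistic, p_value, critical_value, alpha=0.05):
    """Return True when the ADF null hypothesis (unit root) is rejected.

    critical_value must be the cut-off that corresponds to alpha,
    for example the 5% critical value when alpha is 0.05.
    """
    return test_statistic < critical_value and p_value < alpha


# Hypothetical numbers, chosen only to illustrate the rule
print(adf_rejects_null(-3.10, 0.026, critical_value=-2.87))  # True: reject, stationary
print(adf_rejects_null(-1.20, 0.670, critical_value=-2.87))  # False: fail to reject
```

Because ADF critical values are negative, a more negative statistic means stronger evidence against a unit root.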

Dickey-Fuller Test

Before going into the ADF test, let’s first understand what the Dickey-Fuller test is.

The test is based on an autoregressive model; the augmented version described later also selects the number of lags by optimizing an information criterion. A Dickey-Fuller test is a unit root test that tests the null hypothesis that α = 1 in the following model equation, where α is the coefficient of the first lag of y:

y_t = c + \beta t + \alpha y_{t-1} + \phi \Delta y_{t-1} + e_t

Null Hypothesis (H0): α = 1

where,

  • y_{t-1} = lag 1 of the time series
  • \Delta y_{t-1} = first difference of the series at time (t-1)

Fundamentally, it has the same null hypothesis as any unit root test: the coefficient of y_{t-1} is 1, implying the presence of a unit root. If the null hypothesis is not rejected, the series is taken to be non-stationary.

The Augmented Dickey-Fuller test evolved based on the above equation and is one of the most common forms of the Unit Root Test.
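The regression idea behind the Dickey-Fuller test can be sketched with NumPy alone. The example below is an illustration rather than the test itself. It estimates the coefficient of the first lag by least squares for a simulated random walk and for a simulated stationary AR(1) series. The simplified model (intercept plus one lag), the coefficient 0.5, and the helper name `lag1_coefficient` are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 2000
noise = rng.normal(size=n)

# Random walk: y_t = y_{t-1} + e_t, so the true coefficient is 1 (unit root)
random_walk = np.cumsum(noise)

# Stationary AR(1): y_t = 0.5 * y_{t-1} + e_t, so the true coefficient is 0.5
ar1 = np.zeros(n)
for t in range(1, n):
    ar1[t] = 0.5 * ar1[t - 1] + noise[t]


def lag1_coefficient(y):
    """Least-squares estimate of alpha in y_t = c + alpha * y_{t-1} + e_t."""
    X = np.column_stack([np.ones(len(y) - 1), y[:-1]])  # intercept and lag 1
    coef, *_ = np.linalg.lstsq(X, y[1:], rcond=None)
    return coef[1]


print(f"random walk alpha: {lag1_coefficient(random_walk):.3f}")
print(f"AR(1) alpha:       {lag1_coefficient(ar1):.3f}")
```

The estimate for the random walk should sit very close to 1, and the AR(1) estimate close to 0.5. The actual test goes one step further and compares a t-type statistic against Dickey-Fuller critical values, because the usual t-distribution does not apply under a unit root.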

How Does the Augmented Dickey-Fuller (ADF) Test Work?

As the name suggests, the ADF test is an ‘augmented’ version of the Dickey-Fuller test. The ADF test expands the Dickey-Fuller test equation to include a higher-order autoregressive process in the model:

y_t = c + \beta t + \alpha y_{t-1} + \phi_1 \Delta y_{t-1} + \phi_2 \Delta y_{t-2} + \dots + \phi_p \Delta y_{t-p} + e_t

If you notice, we have only added more differencing terms while the rest of the equation remains the same. This adds more thoroughness to the test. The null hypothesis, however, is still the same as the Dickey-Fuller test.

A key point to remember here is: Since the null hypothesis assumes the presence of unit root, that is α=1, the p-value obtained should be less than the significance level (say 0.05) in order to reject the null hypothesis. Thereby inferring that the series is stationary.

Analysts very commonly get this backwards: when the p-value is less than the significance level, they mistakenly take the series to be non-stationary.

ADF Test in Python

So, how to perform an Augmented Dickey-Fuller test in Python?

We will now go through a tutorial for beginners to understand how we can do the ADF test using python code.

The statsmodels package provides a reliable implementation of the ADF test via the adfuller() function in statsmodels.tsa.stattools. It returns the following outputs, in this order:

  1. The value of the test statistic
  2. The p-value
  3. Number of lags considered for the test
  4. Number of observations used
  5. The critical value cut-offs
  6. The maximized information criterion (when autolag is set)

When the test statistic is lower than the critical value shown, you reject the null hypothesis and infer that the time series is stationary.

Alright, let’s run the ADF test on the sunspots dataset from the statsmodels library of python. As seen earlier, the null hypothesis of the test is the presence of a unit root; that is, the series is non-stationary.

Let’s run the ADF test on time series data and analyze the result. We will first import the required libraries, and then load the dataset into a dataframe using the load_pandas() function of sm.datasets.sunspots.

# Load the libraries
import numpy as np
import pandas as pd
# Load Statsmodels
import statsmodels.api as sm
from statsmodels.tsa.stattools import adfuller
# Load Matplotlib for visualization
import matplotlib.pyplot as plt
# In a Jupyter notebook, uncomment the next line to render plots inline
# %matplotlib inline
# Load the dataset
df = sm.datasets.sunspots.load_pandas().data
# Check the dimensionality of the dataset
print("Dataset has {} records and {} columns".format(df.shape[0], df.shape[1]))
# Convert YEAR to dates and set it as the index
df['YEAR'] = pd.Index(sm.tsa.datetools.dates_from_range('1700', '2008'))
df.index = df['YEAR']
# Drop the YEAR column, which is now redundant with the index
del df['YEAR']
# View the dataset
print(df.head())
# Plotting the Data
# Define the plot size
plt.figure(figsize=(16, 5))
# Plot the data
plt.plot(df.index, df['SUNACTIVITY'], label="SUNACTIVITY")
plt.legend(loc='best')
plt.title("Sunspot Data from year 1700 to 2008")
plt.show()
# ADF Test
# Function to print out results in customised manner
def adf_test(timeseries):
    print('Results of Dickey-Fuller Test:')
    # autolag='AIC' picks the number of lags that minimizes the AIC
    dftest = adfuller(timeseries, autolag='AIC')
    dfoutput = pd.Series(dftest[0:4], index=['Test Statistic', 'p-value', '#Lags Used', 'Number of Observations Used'])
    # dftest[4] is a dict of critical values keyed by '1%', '5%', '10%'
    for key, value in dftest[4].items():
        dfoutput['Critical Value (%s)' % key] = value
    print(dfoutput)
# Call the function and run the test

adf_test(df['SUNACTIVITY'])

The ADF test gives the following results: the test statistic, the p-value, and the critical values at the 1%, 5%, and 10% significance levels.

Results of Dickey-Fuller Test:
Test Statistic                  -2.837781
p-value                          0.053076
#Lags Used                       8.000000
Number of Observations Used    300.000000
Critical Value (1%)             -3.452337
Critical Value (5%)             -2.871223
Critical Value (10%)            -2.571929
dtype: float64

The Test Statistic is -2.837781, which is greater than the 1% and 5% critical values.

p-value is 0.053076

The p-value obtained is greater than the significance level of 0.05, and the ADF statistic is higher than the 5% critical value. There is therefore not enough evidence to reject the null hypothesis at the 5% level, so we treat the time series as non-stationary. Note that the result is borderline: the statistic is below the 10% critical value.

How Does the Kwiatkowski-Phillips-Schmidt-Shin (KPSS) Test Work?

The KPSS test, short for Kwiatkowski-Phillips-Schmidt-Shin, is a type of unit root test that tests for the stationarity of a given series around a deterministic trend. In other words, the test is somewhat similar in spirit to the ADF test. A common misconception, however, is that it can be used interchangeably with the ADF test. This can lead to misinterpretations about stationarity, which can easily go undetected and cause more problems down the line.

Further in this article, you will see how to implement the KPSS test in python, how it is different from the ADF test, and when and what all things you need to take care of when implementing a KPSS test.

How to Implement the KPSS Test?

In Python, the statsmodels package provides a convenient implementation of the KPSS test.

A key difference from the ADF test is that the null hypothesis of the KPSS test is that the series is stationary. So practically, the interpretation of the p-value is the opposite in the two tests. That is, if the p-value is < significance level (say 0.05), then the series is non-stationary. Whereas in the ADF test, it would mean the tested series is stationary.

Alright, let’s implement the test on the ‘sunspots’ dataset.

The KPSS test is conducted with the following hypotheses.

  • Null Hypothesis (H0): Series is trend stationary, or series has no unit root.
  • Alternate Hypothesis (HA): Series is non-stationary, or series has a unit root.

Note: The hypotheses are reversed in the KPSS test compared to the ADF test.

If we fail to reject the null hypothesis, the test provides no evidence against stationarity, and the series is treated as trend stationary.

Conditions to Fail to Reject Null Hypothesis (H0)

  • If the Test Statistic < Critical Value and p-value > 0.05 – Fail to Reject Null Hypothesis (H0), i.e., time series does not have a unit root, meaning it is trend stationary.

To implement the KPSS test, we’ll use the kpss function from statsmodels. The code below implements the test and prints out the returned outputs. Note that it passes regression='c', which tests stationarity around a constant level; regression='ct' tests stationarity around a linear trend.

Let’s run the KPSS test on Time series data and analyze the result.

# Function to print out results in customised manner
from statsmodels.tsa.stattools import kpss
def kpss_test(timeseries):
    print ('Results of KPSS Test:')
    kpsstest = kpss(timeseries, regression='c', nlags="auto")
    kpss_output = pd.Series(kpsstest[0:3], index=['Test Statistic','p-value','#Lags Used'])
    for key,value in kpsstest[3].items():
        kpss_output['Critical Value (%s)'%key] = value
    print (kpss_output)

# Call the function and run the test

kpss_test(df['SUNACTIVITY'])

Results of KPSS Test:
Test Statistic           0.669866
p-value                  0.016285
Lags Used                7.000000
Critical Value (10%)     0.347000
Critical Value (5%)      0.463000
Critical Value (2.5%)    0.574000
Critical Value (1%)      0.739000

How to Interpret KPSS Test Results?

The output of the KPSS test contains 4 things:

  1. The KPSS statistic
  2. p-value
  3. Number of lags used by the test
  4. Critical values

The p-value reported by the test is the probability score based on which you can decide whether to reject the null hypothesis or not. If the p-value is less than a predefined alpha level (typically 0.05), we reject the null hypothesis.

The KPSS statistic is the actual test statistic that is computed while performing the test. For more information on the formula, see the references listed in the statsmodels documentation for kpss().

In order to reject the null hypothesis, the test statistic should be greater than the provided critical values. If it is, in fact, higher than the target critical value, then that should automatically reflect in a low p-value. That is, if the p-value is less than 0.05, the kpss statistic will be greater than the 5% critical value.

Finally, the number of lags reported is the number of lags of the series that was actually used by the model equation of the KPSS test. The code above passes nlags="auto", which chooses the lag length from the data. With the ‘legacy’ method (the default in older statsmodels releases), int(12 * (n / 100)**(1 / 4)) lags are included, where n is the length of the series.
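To see what the ‘legacy’ lag formula gives in practice, it can be evaluated directly. The helper name `legacy_kpss_lags` is hypothetical; only the formula itself comes from the text above.

```python
def legacy_kpss_lags(n):
    """Number of lags under the 'legacy' rule: int(12 * (n / 100)**(1 / 4))."""
    return int(12 * (n / 100) ** (1 / 4))


print(legacy_kpss_lags(100))  # 12
print(legacy_kpss_lags(309))  # 15, for a series as long as the sunspots data (1700 to 2008)
```

The lag count grows slowly with the series length, since it depends on the fourth root of n.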

Test Statistic is 0.669866

Critical Value (5%) is 0.463000

p-value is 0.016285

Test Statistic > Critical Value and p-value < 0.05. As a result, we reject the null hypothesis in favor of the alternative.
Hence, we conclude that the series is non-stationary.

When to Choose ADF or KPSS Test?

There can be a lot of confusion about when to use the ADF test or the KPSS test, and about which one gives the correct result. A better solution is to run both tests and confirm that the series is truly stationary.

The following are the possible outcomes of applying both tests.

  • Case 1: Both tests conclude that the given series is stationary – The series is stationary.
  • Case 2: Both tests conclude that the given series is non-stationary – The series is non-stationary.
  • Case 3: ADF concludes non-stationary, and KPSS concludes stationary – The series is trend stationary. To make the series strictly stationary, the trend needs to be removed in this case. Then the detrended series is checked for stationarity.
  • Case 4: ADF concludes stationary, and KPSS concludes non-stationary – The series is difference stationary. Differencing is to be used to make the series stationary. Then the differenced series is checked for stationarity.
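The four cases can be encoded as a small decision helper. This is a sketch: `combine_adf_kpss` is a hypothetical function, and the 0.05 threshold is the significance level used throughout this article.

```python
def combine_adf_kpss(adf_p_value, kpss_p_value, alpha=0.05):
    """Map the two p-values onto the four cases described above."""
    adf_stationary = adf_p_value < alpha     # ADF rejects its unit-root null
    kpss_stationary = kpss_p_value >= alpha  # KPSS fails to reject its stationarity null

    if adf_stationary and kpss_stationary:
        return "stationary"
    if not adf_stationary and not kpss_stationary:
        return "non-stationary"
    if kpss_stationary:
        return "trend stationary: remove the trend, then re-test"
    return "difference stationary: difference the series, then re-test"


# p-values reported for the sunspots series earlier in the article
print(combine_adf_kpss(adf_p_value=0.053076, kpss_p_value=0.016285))
```

For the sunspots p-values, both tests point the same way, so the helper returns "non-stationary" (Case 2).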

Conclusion

Stationarity is an important property of time series data that indicates that the statistical properties of the data do not change over time. It is essential for various time series analysis techniques, including forecasting and modeling. Two tests for checking the stationarity of a time series are used, namely the ADF test and the KPSS test. The article provides step-by-step instructions on how to perform each of these tests in Python. The article also emphasized the importance of choosing the right statistical test for the specific time series data and highlighted some common mistakes to avoid when testing for stationarity.

Detrending is carried out by using differencing techniques, which will be covered in future articles on Statistical tests to check stationarity in Time Series.

For further reading and to know more about ACF, PACF, ARMA, ARIMA model implementation with Python, and time series analysis, have a look at this article: https://www.analyticsvidhya.com/blog/2021/10/a-comprehensive-guide-to-time-series-analysis/

Key Takeaways

  • There are various statistical tests to check stationarity, including the Augmented Dickey-Fuller (ADF) test and the Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test.
  • The ADF test is a widely used test for checking the stationarity of a time series, and it checks for the presence of a unit root in the data.
  • The KPSS test is another popular test that checks for the trend stationarity of the data, and it is often used in conjunction with the ADF test.

Frequently Asked Questions

Q1. What is the difference between ADF and KPSS tests?

A. The ADF test checks for the presence of a unit root, which implies non-stationarity, while the KPSS test checks whether the series is stationary around a level or a deterministic trend. The two tests have opposite null hypotheses: non-stationarity for ADF and stationarity for KPSS. Both tests are useful in determining the stationarity of a time series, and it’s a good idea to use them together to get a more complete picture of the properties of the time series.
The residuals in the ADF and KPSS tests represent the differences between the observed and predicted values of the model used in the test. In the case of the ADF test, the residuals are stationary if the time series is stationary, while in the case of the KPSS test, the residuals are non-stationary if the time series is non-stationary.

Q2. How are the ACF, PACF, ADF, and KPSS tests related, and how can the ACF and PACF be used to determine the stationarity of a time series?

A. The ACF(autocorrelation function) and PACF(partial autocorrelation function) can be used to visualize the correlation structure of a time series, and rapid decay to zero in both functions is an indication of stationarity. The ADF and KPSS tests can be used to statistically determine stationarity, with the ADF testing for the presence of a unit root and the KPSS testing for the absence of a unit root. To ensure accurate results, it is recommended to apply both tests, along with the ACF and PACF, to determine the stationarity of a time series.
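The decay behaviour of the ACF can be illustrated with a short NumPy sketch. The `sample_acf` helper is a hypothetical, simplified estimator written for this example, and the simulated series are assumptions rather than data from this article.

```python
import numpy as np


def sample_acf(series, max_lag):
    """Sample autocorrelation for lags 0..max_lag."""
    centered = np.asarray(series, dtype=float) - np.mean(series)
    denom = np.dot(centered, centered)
    return np.array([
        np.dot(centered[: len(centered) - k], centered[k:]) / denom
        for k in range(max_lag + 1)
    ])


rng = np.random.default_rng(11)
white_noise = rng.normal(size=1000)
random_walk = np.cumsum(rng.normal(size=1000))

print("white noise ACF:", np.round(sample_acf(white_noise, 5), 2))
print("random walk ACF:", np.round(sample_acf(random_walk, 5), 2))
```

For white noise, the autocorrelations beyond lag 0 stay near zero. For the random walk, they remain close to 1 and decay only slowly, which is the visual signature of non-stationarity.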

Q3. What is the null hypothesis of the ADF and KPSS tests?

A. The null hypothesis of the ADF test is that the time series contains a unit root and is non-stationary, while the alternative hypothesis is that the time series is stationary. The ADF test is used to determine the presence of a unit root in the time series. If the null hypothesis is rejected, it indicates that the time series is stationary.
The null hypothesis of the KPSS test is that the time series is stationary, while the alternative hypothesis is that the time series is non-stationary. The KPSS test is used to determine the absence of a unit root in the time series. If the null hypothesis is not rejected, it indicates that the time series is stationary.
