Using Hurst Exponent to analyse the Stock and Crypto market with Python

This article was published as a part of the Data Science Blogathon

Introduction

To cut straight to the chase, the Hurst exponent is a quick way to check whether a time series is a random walk, mean-reverting, or trending. In finance, many traders build their strategies around momentum or mean reversion. Both are briefly described below.

Existing Strategies

  • Momentum-based strategies try to exploit persistent market trends. For example, suppose a company's stock or a cryptocurrency did well last quarter and its price rose. An investor can either predict that the price will keep rising and enter a long position, or predict that it will fall and enter a short position. These decisions are not random; they are made carefully on the basis of a series of indicators.
  • Mean reversion is a strategy that assumes properties such as stock returns and volatility revert to their long-term averages over time. A time series that behaves this way can be modelled as an Ornstein-Uhlenbeck process. In such cases, you can make money by assuming that after an extreme event, positive or negative, the stock price will return to its long-term pattern.
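
To make the mean-reverting case concrete, here is a minimal sketch of a discretised Ornstein-Uhlenbeck process. The function name `simulate_ou` and all parameter values (`theta`, `mu`, `sigma`, `x0`) are illustrative assumptions, not something taken from a real market: the series starts above its long-term level and is pulled back towards it at every step.

```python
import numpy as np

def simulate_ou(n_steps=1000, theta=0.1, mu=100.0, sigma=1.0, x0=110.0, seed=0):
    """Euler discretisation of dx = theta * (mu - x) dt + sigma dW, with dt = 1.

    theta: speed of reversion, mu: long-term level, sigma: noise scale.
    All default values are hypothetical and chosen only for illustration.
    """
    rng = np.random.default_rng(seed)
    x = np.empty(n_steps)
    x[0] = x0
    for t in range(1, n_steps):
        # pull towards mu, plus a random shock
        x[t] = x[t - 1] + theta * (mu - x[t - 1]) + sigma * rng.standard_normal()
    return x

path = simulate_ou()
print(f"start: {path[0]:.2f}")
print(f"mean of the last 500 points: {path[-500:].mean():.2f}")
```

A larger `theta` makes the pull towards `mu` stronger, so shocks die out faster; with `theta = 0` the update reduces to a plain random walk.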

Both patterns are easy to spot on a plot of price over time. However, plotting every stock or cryptocurrency and inspecting each chart by hand is not feasible. That is why we use a single parameter that detects whether the data contains a pattern and reports it as a number, which code can analyze far more easily than a plot.

The solution presented in this article is to use the Hurst exponent to determine whether a given time series, be it a stock, a cryptocurrency, or any other series, is mean-reverting, trending, or simply following a random walk.

The Hurst Exponent

The Hurst exponent, denoted by the letter ‘H’, is a measure of the long-term memory of a time series: it quantifies how much the series deviates from a random walk. This scalar value shows the relative tendency of a time series either to cluster in a particular direction (a trending pattern, upward or downward) or to regress strongly to the mean (a mean-reverting pattern).

The Hurst exponent ranges between 0 and 1, and based on the value of H we can classify a time series as follows:

  • H < 0.5 – Mean-reverting (anti-persistent) series. A value closer to 0 means a stronger mean-reversion pattern. In practical terms, a high value is likely to be followed by a low value and vice versa.
  • H = 0.5 – Random walk (geometric Brownian motion). The series can go either way, and no clear deduction is possible from its past values.
  • H > 0.5 – Trending (persistent) series. A value closer to 1 means a stronger trending pattern that is likely to continue. Generally, a high value is likely to be followed by a higher value.
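
The classification above can be sketched as a small helper. Because an estimated H is never exactly 0.5, the sketch treats a band around 0.5 as a random walk; the function name `classify_hurst` and the `tolerance` of 0.05 are assumptions made for illustration, not a standard threshold.

```python
def classify_hurst(h, tolerance=0.05):
    """Map a Hurst exponent estimate to one of the three regimes.

    tolerance is a hypothetical band around 0.5 inside which the
    series is treated as a random walk.
    """
    if h < 0.5 - tolerance:
        return "mean-reverting"
    if h > 0.5 + tolerance:
        return "trending"
    return "random walk"

for h in (0.24, 0.48, 0.87):
    print(h, "->", classify_hurst(h))
```

How wide the band should be depends on the length of the series and the lags used, so the tolerance is best treated as a tuning choice.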

In this article, we show one implementation, which estimates the rate of diffusive behavior from the variance of the log prices.

To begin, we define x as the logarithm of the stock price ‘S’:

$$x(t) = \log S(t)$$

The variance for an arbitrary lag ‘τ’ can be expressed as:

$$\mathrm{Var}(\tau) = \left\langle \left| x(t + \tau) - x(t) \right|^2 \right\rangle$$

If the time series follows a geometric Brownian motion (GBM, a random walk), the variance grows linearly with the lag ‘τ’:

$$\mathrm{Var}(\tau) \sim \tau$$

Real series do not always follow a GBM. When some form of autocorrelation exists, the series deviates from a random walk and the variance is no longer proportional to the lag itself; instead, an anomalous exponent comes into play. The relation then becomes:

$$\mathrm{Var}(\tau) \sim \tau^{2H}$$

where ‘H’ stands for the Hurst exponent.
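
As a quick numerical check of this scaling, the sketch below simulates a plain random walk as a stand-in for log prices and fits the slope of log-variance against log-lag. The seed, the series length, and the lag range are arbitrary assumptions. The slope should be close to 1, which corresponds to H close to 0.5.

```python
import numpy as np

rng = np.random.default_rng(42)
# random walk standing in for a series of log prices (synthetic data)
x = np.cumsum(rng.standard_normal(100_000))

lags = np.arange(2, 50)
# variance of the lagged differences x(t + lag) - x(t) for each lag
variances = np.array([np.var(x[lag:] - x[:-lag]) for lag in lags])

# Var(tau) ~ tau^(2H)  =>  log Var = 2H * log(tau) + const
slope, intercept = np.polyfit(np.log(lags), np.log(variances), 1)
hurst_estimate = slope / 2
print(f"slope = {slope:.3f}, H = {hurst_estimate:.3f}")
```

Fitting standard deviations instead of variances gives H directly as the slope, since the standard deviation scales as the lag raised to the power H.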

Note: Another method for estimating the Hurst exponent is rescaled range (R/S) analysis. It can sometimes give better results, but one shortcoming is that it is very sensitive to short-range dependence.
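
For comparison only, here is a minimal sketch of classical R/S analysis. It is not the method used in the rest of this article, and the function name `hurst_rs`, the `min_chunk` parameter, and the doubling chunk sizes are assumptions of this sketch. Note that it works on increments (for example log returns), not on the price levels themselves.

```python
import numpy as np

def hurst_rs(increments, min_chunk=8):
    """Estimate H with classical rescaled range (R/S) analysis.

    The series of increments is split into non-overlapping chunks of
    doubling size; for each size, the average R/S value is computed and
    H is the slope of log(R/S) against log(chunk size).
    """
    increments = np.asarray(increments, dtype=float)
    n = len(increments)
    chunk_sizes, rs_means = [], []
    size = min_chunk
    while size <= n // 2:
        rs_values = []
        for start in range(0, n - size + 1, size):
            chunk = increments[start:start + size]
            # cumulative deviation from the chunk mean
            deviations = np.cumsum(chunk - chunk.mean())
            r = deviations.max() - deviations.min()  # range
            s = chunk.std()                          # scale
            if s > 0:
                rs_values.append(r / s)
        if rs_values:
            chunk_sizes.append(size)
            rs_means.append(np.mean(rs_values))
        size *= 2
    slope, _ = np.polyfit(np.log(chunk_sizes), np.log(rs_means), 1)
    return slope

rng = np.random.default_rng(0)
white_noise = rng.standard_normal(4096)  # uncorrelated synthetic increments
h_rs = hurst_rs(white_noise)
print(f"R/S estimate for uncorrelated increments: {h_rs:.3f}")
```

For uncorrelated increments the estimate should land near 0.5, although plain R/S is known to be biased upward on short chunks, so values somewhat above 0.5 are expected here.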

Coding in Python

We start by importing a few Python libraries. Instead of relying entirely on a predefined library, we will write our own function to calculate the Hurst exponent.

If the necessary libraries are not already in your environment, install them first. We will use real stock data for Google from 2010 through 2020, downloaded with yfinance.

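import yfinance as yf
import numpy as np
import pandas as pd

# Download daily GOOGL prices. auto_adjust=False keeps the "Adj Close" column,
# which recent yfinance releases otherwise drop by default.
spy_df = yf.download("GOOGL",
                     start="2010-01-01",
                     end="2020-12-31",
                     auto_adjust=False,
                     progress=False)

spy_df["Adj Close"].plot(title="Google");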

Output:

[Figure: line plot titled "Google" showing the adjusted close price]

We will also generate some artificial data to illustrate all three cases and to see how well the Hurst exponent identifies each one:

sample_size = 1000
scalar = 100
np.random.seed(123)
df = pd.DataFrame(data={"mean_rev": np.log(np.random.randn(sample_size) + scalar),
                        "gbm": np.log(np.cumsum(np.random.randn(sample_size)) + scalar),
                        "trending": np.log(np.cumsum(np.random.randn(sample_size) + 1) + scalar)})
df.plot(title="Generated time series")

Output:

[Figure: line plot titled "Generated time series" showing the mean_rev, gbm, and trending series]

The plot clearly shows the patterns we expected, so we can move on to writing the function that calculates the Hurst exponent. First, we calculate the standard deviation of the differences between the series and its lagged versions for a range of lags. Since the variance scales as τ^(2H), the standard deviation scales as τ^H, so we estimate the Hurst exponent as the slope of the log-log plot of the standard deviations against the lags.

def get_hurst_exponent(time_series, max_lag=20):
    """Returns the Hurst Exponent of the time series"""
    lags = range(2, max_lag)
    # standard deviations of the lagged differences
    tau = [np.std(np.subtract(time_series[lag:], time_series[:-lag])) for lag in lags]
    # slope of the log-log fit of std vs. lag -> the Hurst exponent
    reg = np.polyfit(np.log(lags), np.log(tau), 1)
    return reg[0]

The maximum lag is a fairly arbitrary choice, but the default of 20 is a good starting point. Results can vary noticeably with this choice. We begin with the Google data, whose plot looks like it is trending, with some exceptions, and the overall trend is positive. We calculate the Hurst exponent for several maximum-lag values to see how the estimates change.

for lag in [5, 10, 20, 100, 300, 500, 1000, 2000, 2700]:
    hurst_exp = get_hurst_exponent(spy_df["Adj Close"].values, lag)
    print(f"Hurst exponent with {lag} lags: {hurst_exp:.4f}")

Output :

Hurst exponent with 5 lags: 0.4790
Hurst exponent with 10 lags: 0.4723
Hurst exponent with 20 lags: 0.4767
Hurst exponent with 100 lags: 0.3821
Hurst exponent with 300 lags: 0.2861
Hurst exponent with 500 lags: 0.2642
Hurst exponent with 1000 lags: 0.2399
Hurst exponent with 2000 lags: 0.2402
Hurst exponent with 2700 lags: 0.2211

With short lags the estimate is just below 0.5, close to a random walk. As the maximum lag increases, the value falls towards 0, which indicates increasingly strong mean reversion.

Keep in mind that these values are calculated over a period of roughly ten years; narrowing the analysis to a shorter window can give different results, depending on how short the time frame is. Next, we repeat the exercise for the artificially generated series to see how the function performs.
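
One way to see how the estimate depends on the period is to compute it over a rolling window. The sketch below is only an illustration: it uses a synthetic random walk instead of real prices, repeats the article's function so that it runs on its own, and assumes a window of 250 observations (roughly one trading year).

```python
import numpy as np
import pandas as pd

def get_hurst_exponent(time_series, max_lag=20):
    """Same estimator as in the article, repeated so this sketch is self-contained."""
    lags = range(2, max_lag)
    tau = [np.std(np.subtract(time_series[lag:], time_series[:-lag])) for lag in lags]
    return np.polyfit(np.log(lags), np.log(tau), 1)[0]

rng = np.random.default_rng(1)
# synthetic random walk standing in for a price series
prices = pd.Series(100 + np.cumsum(rng.standard_normal(1500)))

# raw=True passes each window to the function as a NumPy array
rolling_hurst = prices.rolling(window=250).apply(get_hurst_exponent, raw=True)
print(rolling_hurst.dropna().describe())
```

Even for a pure random walk the rolling estimate moves around from window to window, which is worth remembering before reading too much into a single value from a short period.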

for lag in [5, 10, 20, 100, 300, 500]:
    print(f"Hurst exponents with {lag} lags ----")
    for column in df.columns:
        print(f"{column}: {get_hurst_exponent(df[column].values, lag):.4f}")

Output :

Hurst exponents with 5 lags ----
mean_rev: -0.0189
gbm: 0.4450
trending: 0.7725
Hurst exponents with 10 lags ----
mean_rev: 0.0085
gbm: 0.4333
trending: 0.8368
Hurst exponents with 20 lags ----
mean_rev: 0.0064
gbm: 0.4539
trending: 0.8715
Hurst exponents with 100 lags ----
mean_rev: -0.0021
gbm: 0.5401
trending: 0.8442
Hurst exponents with 300 lags ----
mean_rev: 0.0002
gbm: 0.5691
trending: 0.7463
Hurst exponents with 500 lags ----
mean_rev: 0.0015
gbm: 0.4854
trending: 0.6662

The mean-reverting series gives the most consistent result across all lag values, with estimates very close to 0. The slightly negative values are likely estimation noise; ideally the estimate should not fall much below 0. For the random walk, the value oscillates around the expected 0.5. Finally, the trending series is correctly identified by the Hurst exponent for most of the selected lags, but as the maximum lag increases, the exponent falls towards 0.5, which would indicate a random walk.
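
The near-zero estimates for the mean-reverting series follow from how that series was generated. As a rough check, assume the series is independent noise ε_t with constant variance σ² (an assumption that matches the generated mean_rev column, whose values are independent draws). The lagged differences then have the same spread at every lag:

```latex
\mathrm{Var}\big(x(t+\tau) - x(t)\big)
  = \mathrm{Var}(\varepsilon_{t+\tau}) + \mathrm{Var}(\varepsilon_t)
  = 2\sigma^2 \;\propto\; \tau^{0}
  \quad\Longrightarrow\quad 2H = 0, \quad H = 0
```

The fitted slope is therefore expected to sit at 0 up to sampling noise, which is why small negative values can appear.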

For Google, the estimates point to mean reversion, which is consistent with the price oscillating around its mean. The results are close to our expectations, but the maximum number of lags we consider can change the conclusion. The analysis shows that a series need not be purely trending or purely mean-reverting. The answer also depends on whether we look at a long or a short time frame, which we control through the range of lags. At longer time frames, the estimates point more clearly to mean reversion.

Key takeaways from the results

We can draw the following conclusions from the results above:

  • The Hurst exponent is a measure of memory in a time series and is used to classify the series as trending, mean-reverting, or a random walk.
  • By changing the lag parameter, we can look at the time series on a short-term or long-term basis, and the results will differ accordingly.
  • The series tends towards mean reversion, a pattern often seen in stocks and cryptocurrencies, but over shorter time intervals it can also appear to be trending.

Endnotes

Thank you for reading to the end. I hope you are doing well, staying safe, and have been or soon will be vaccinated.

Arnab Mondal

Data Engineer & Python Developer | Freelance Tech Writer

Follow me on LinkedIn and feel free to ask me about any doubts you have
