# Using Hurst Exponent to analyse the Stock and Crypto market with Python

This article was published as a part of the Data Science Blogathon

**Introduction**

To cut straight to the chase, the **Hurst exponent** is a quick way to check whether a time series is **a random walk, mean-reverting, or trending**. In the world of finance, many traders build their strategies around momentum or mean reversion. Both are briefly described below.

**Existing Strategies**

**Momentum-based strategies** try to exploit ongoing trends in the market. For example, suppose a specific company or cryptocurrency did well over the last quarter and its price went up. An investor can either predict that the price will continue to rise and enter a long position, or predict that it will fall and enter a short position. These decisions are not random: they are made carefully on the basis of a series of indicators.

**Mean reversion** is a strategy that assumes properties such as stock returns and volatility revert to their long-term average over time. Such a time series can be modelled as an Ornstein-Uhlenbeck process. In this case, you try to make money on the assumption that after some extreme event, positive or negative, the stock price will return to its long-term pattern.
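To make the idea of mean reversion more concrete, here is a minimal sketch that simulates a discretised Ornstein-Uhlenbeck process. The function name `simulate_ou` and all parameter values (`theta`, `mu`, `sigma`, `x0`) are illustrative assumptions, not something taken from a real market.

```python
import numpy as np


def simulate_ou(n_steps=1000, theta=0.1, mu=100.0, sigma=1.0, x0=110.0, seed=0):
    """Simulate a discretised Ornstein-Uhlenbeck path (Euler scheme, time step of 1).

    theta: speed of reversion, mu: long-term mean, sigma: size of the random shocks.
    All defaults are hypothetical values chosen for illustration only.
    """
    rng = np.random.default_rng(seed)
    x = np.empty(n_steps)
    x[0] = x0
    for t in range(1, n_steps):
        # Pull the value back towards mu, then add a random shock
        x[t] = x[t - 1] + theta * (mu - x[t - 1]) + sigma * rng.standard_normal()
    return x


path = simulate_ou()
print(f"start: {path[0]:.1f}, mean of the last 500 points: {path[-500:].mean():.2f}")
```

The path starts away from the long-term mean and is pulled back towards it, which is the behaviour a mean-reversion strategy bets on.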

Both patterns are easy to spot on a plot of the price over time. However, building a plot for every stock or cryptocurrency out there and inspecting each one manually is not feasible. Instead, we will use a single parameter that detects the pattern in the data and reports it as a number, which is much easier for code to analyze than a plot.

The solution presented in this article is to use the Hurst exponent to determine whether a given time series (a stock, a cryptocurrency, or any time series in general) is mean-reverting, trending, or simply following a random walk.

**The Hurst Exponent**

The Hurst exponent, usually denoted by the letter ‘H’, is a measure of the long-term memory of a time series: it quantifies how much the series deviates from a random walk. The scalar value indicates the relative tendency of a time series either to cluster in a particular direction (a trending pattern, upwards or downwards) or to regress strongly to the mean (a mean-reverting pattern).

The Hurst exponent takes values between 0 and 1, and based on the value of **H** we can classify a time series as follows:

- **H < 0.5:** Mean-reverting (anti-persistent) series. The closer the value is to 0, the stronger the mean-reversion process. In practice, it means that a high value tends to be followed by a low value and vice versa.
- **H = 0.5:** Geometric random walk. The series can go either way and no clear deduction is possible from the given parameters.
- **H > 0.5:** Trending (persistent) series. The closer the value is to 1, the stronger the trend and the more likely it is to continue. Generally, it means that a high value tends to be followed by a higher value.
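The classification above can be sketched as a small helper. Because an estimated exponent is never exactly 0.5, the sketch uses a tolerance band around 0.5; the function name `classify_hurst` and the default `tolerance=0.05` are assumptions made for illustration, not an established threshold.

```python
def classify_hurst(h, tolerance=0.05):
    """Map a Hurst exponent estimate to one of the three regimes.

    `tolerance` is a hypothetical band around 0.5 within which the
    series is treated as a random walk.
    """
    if h < 0.5 - tolerance:
        return "mean-reverting"
    if h > 0.5 + tolerance:
        return "trending"
    return "random walk"


for value in (0.2, 0.5, 0.8):
    print(value, "->", classify_hurst(value))
```

How wide the band should be depends on how noisy the estimate is, so it is worth tuning for the length of the series at hand.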

In this article, we will show one implementation, which estimates the exponent from the diffusive behavior of the series, based on the variance of the log prices.

To begin, we define x as the logarithm of the stock price ‘S’:

$$x(t) = \log S(t)$$

The variance for an arbitrary lag ‘*τ*’ can be expressed as:

$$\mathrm{Var}(\tau) = \left\langle \left| x(t + \tau) - x(t) \right|^2 \right\rangle$$

If the time series follows a Geometric Brownian Motion (GBM, random walk), the variance grows linearly with the lag ‘*τ*’:

$$\mathrm{Var}(\tau) \sim \tau$$

This is not always the case, as real series do not always follow a GBM. When some form of autocorrelation exists, the series deviates from a random walk: the variance is no longer proportional to the lag itself, and an anomalous exponent comes into play. The new formula looks like this:

$$\mathrm{Var}(\tau) \sim \tau^{2H}$$

where ‘H’ stands for the Hurst exponent.
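As a quick numerical check of this scaling, the sketch below simulates a random walk of log prices, measures the mean squared lagged difference for several lags, and fits a straight line to log-variance against log-lag. If the variance scales like the lag raised to the power 2H, the fitted slope is 2H. The simulated data, the seed, and the lag range are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulated log prices following a random walk (toy data, not market data)
log_prices = np.cumsum(rng.standard_normal(10_000))

lags = np.arange(2, 20)
# Var(tau): mean squared difference x(t + tau) - x(t) for each lag
variances = np.array(
    [np.mean((log_prices[lag:] - log_prices[:-lag]) ** 2) for lag in lags]
)

# log Var(tau) = 2H * log(tau) + const, so the fitted slope equals 2H
slope, intercept = np.polyfit(np.log(lags), np.log(variances), 1)
hurst_estimate = slope / 2

print(f"slope of log-variance vs log-lag: {slope:.3f}")
print(f"implied Hurst exponent: {hurst_estimate:.3f}")
```

For a random walk the slope should come out close to 1, which corresponds to a Hurst exponent close to 0.5.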

**Note:** Another method for estimating the Hurst exponent uses rescaled range (R/S) analysis. This method can sometimes lead to better results, but one shortcoming is that it is very sensitive to short-range dependence.
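For readers curious about the R/S approach, here is a simplified sketch of it. It is not the method used in the rest of this article. The function names `rescaled_range` and `hurst_rs` and the window sizes are assumptions, and the sketch applies no small-sample bias correction, so estimates on short windows tend to come out somewhat above the true value.

```python
import numpy as np


def rescaled_range(increments):
    """R/S statistic for one chunk of increments."""
    # Cumulative deviations from the chunk mean
    deviations = np.cumsum(increments - increments.mean())
    value_range = deviations.max() - deviations.min()
    return value_range / increments.std()


def hurst_rs(series, window_sizes=(16, 32, 64, 128, 256)):
    """Estimate H as the slope of log(average R/S) versus log(window size)."""
    increments = np.diff(series)
    average_rs = []
    for size in window_sizes:
        n_chunks = len(increments) // size
        # Split the increments into non-overlapping chunks of equal size
        chunks = increments[: n_chunks * size].reshape(n_chunks, size)
        average_rs.append(np.mean([rescaled_range(chunk) for chunk in chunks]))
    slope, _ = np.polyfit(np.log(window_sizes), np.log(average_rs), 1)
    return slope


rng = np.random.default_rng(1)
random_walk = np.cumsum(rng.standard_normal(4097))  # toy data
h_rs = hurst_rs(random_walk)
print(f"R/S estimate for a simulated random walk: {h_rs:.3f}")
```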

**Coding in Python**

We will first import some libraries in Python. Instead of relying fully on a predefined library, we will write our own function that calculates the Hurst exponent.

If you do not already have them in your environment, you first need to install the libraries we are going to use. We will use real-life stock data for Google covering 2010 to 2020, downloaded with yfinance.

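```python
import yfinance as yf
import numpy as np
import pandas as pd

# Daily Google (GOOGL) prices from 2010 to 2020.
# Depending on the yfinance version, you may need to pass auto_adjust=False
# for the "Adj Close" column to be present.
spy_df = yf.download("GOOGL", start="2010-01-01", end="2020-12-31", progress=False)
spy_df["Adj Close"].plot(title="Google");
```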

**Output :**

We will also generate some artificial data to illustrate all three cases and to see how accurately we can identify them with the Hurst exponent. The following code snippet generates the series.

```python
sample_size = 1000
scalar = 100

np.random.seed(123)
df = pd.DataFrame(data={
    "mean_rev": np.log(np.random.randn(sample_size) + scalar),
    "gbm": np.log(np.cumsum(np.random.randn(sample_size)) + scalar),
    "trending": np.log(np.cumsum(np.random.randn(sample_size) + 1) + scalar),
})
df.plot(title="Generated time series")
```

**Output :**

Here you can clearly see the patterns we were expecting. We now move on to writing a function that calculates the Hurst exponent. To do so, we first calculate the standard deviation of the differences between the series and its lagged versions. Since the variance scales as τ^(2H), the standard deviation scales as τ^H, so over a range of lags we can estimate the Hurst exponent as the slope of the log-log plot of the standard deviations versus the lags.

```python
def get_hurst_exponent(time_series, max_lag=20):
    """Returns the Hurst Exponent of the time series"""

    lags = range(2, max_lag)

    # standard deviations of the lagged differences
    tau = [np.std(np.subtract(time_series[lag:], time_series[:-lag])) for lag in lags]

    # calculate the slope of the log-log plot -> the Hurst Exponent
    reg = np.polyfit(np.log(lags), np.log(tau), 1)

    return reg[0]
```

The maximum number of lags is somewhat arbitrary, but the default value of 20 is a good starting point. The estimate can change noticeably with the choice of maximum lag. We begin with the Google data, whose plot is trending with some exceptions; the overall trend is positive. We will calculate the Hurst exponent for different maximum lags and see how the values turn out.
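Since the maximum lag is a free parameter, it helps to guard against values that cannot work, for example a lag that is longer than the series itself and therefore leaves no differences to measure. The following is a minimal sketch of such a guard; the name `get_hurst_exponent_checked` and the exact limits are assumptions for illustration, and the estimation logic is otherwise the same as above.

```python
import numpy as np


def get_hurst_exponent_checked(time_series, max_lag=20):
    """Hurst exponent from lagged differences, with basic input validation."""
    series = np.asarray(time_series, dtype=float)

    # range(2, max_lag) must contain at least two lags to fit a line
    if max_lag < 4:
        raise ValueError("max_lag must be at least 4 so that two or more lags are fitted")
    # every lag must leave at least two differences to compute a standard deviation
    if max_lag >= len(series):
        raise ValueError(
            f"max_lag ({max_lag}) must be smaller than the series length ({len(series)})"
        )

    lags = np.arange(2, max_lag)
    tau = [np.std(series[lag:] - series[:-lag]) for lag in lags]
    slope, _ = np.polyfit(np.log(lags), np.log(tau), 1)
    return slope


rng = np.random.default_rng(0)
random_walk = np.cumsum(rng.standard_normal(5000))  # toy data
h_checked = get_hurst_exponent_checked(random_walk)
print(f"H for a simulated random walk: {h_checked:.3f}")
```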

```python
for lag in [5, 10, 20, 100, 300, 500, 1000, 2000, 2700]:
    hurst_exp = get_hurst_exponent(spy_df["Adj Close"].values, lag)
    print(f"Hurst exponent with {lag} lags: {hurst_exp:.4f}")
```

**Output :**

```
Hurst exponent with 5 lags: 0.4790
Hurst exponent with 10 lags: 0.4723
Hurst exponent with 20 lags: 0.4767
Hurst exponent with 100 lags: 0.3821
Hurst exponent with 300 lags: 0.2861
Hurst exponent with 500 lags: 0.2642
Hurst exponent with 1000 lags: 0.2399
Hurst exponent with 2000 lags: 0.2402
Hurst exponent with 2700 lags: 0.2211
```

As we can see, as the lag value increases, the Hurst exponent decreases towards 0, which indicates increasingly strong mean reversion.

Keep in mind when using the Hurst exponent that these values are calculated over a period of roughly 10 years; narrowing the analysis down to a shorter period can give different results, depending on how short the time frame is. Now we repeat the exercise for the artificially generated series to see how the function performs.
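One way to see how the estimate depends on the period is to compute it over a rolling window. The sketch below does this with pandas on a simulated random walk; the helper name `rolling_hurst`, the window length of 500 observations, and the toy data are assumptions for illustration. The estimation function is repeated so that the snippet runs on its own.

```python
import numpy as np
import pandas as pd


def get_hurst_exponent(time_series, max_lag=20):
    """Slope of log(std of lagged differences) versus log(lag)."""
    lags = range(2, max_lag)
    tau = [np.std(np.subtract(time_series[lag:], time_series[:-lag])) for lag in lags]
    return np.polyfit(np.log(lags), np.log(tau), 1)[0]


def rolling_hurst(series, window=500, max_lag=20):
    """Hurst exponent estimated on each trailing window of `window` observations."""
    return series.rolling(window).apply(
        lambda values: get_hurst_exponent(values, max_lag), raw=True
    )


rng = np.random.default_rng(7)
series = pd.Series(np.cumsum(rng.standard_normal(2000)))  # toy random walk

rolling_h = rolling_hurst(series)
print(rolling_h.dropna().describe())
```

The first `window - 1` values are NaN because a full window is not yet available. Plotting `rolling_h` shows how much the estimate moves around even when the underlying process does not change.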

```python
for lag in [5, 10, 20, 100, 300, 500]:
    print(f"Hurst exponents with {lag} lags ----")
    for column in df.columns:
        print(f"{column}: {get_hurst_exponent(df[column].values, lag):.4f}")
```

**Output :**

```
Hurst exponents with 5 lags ----
mean_rev: -0.0189
gbm: 0.4450
trending: 0.7725
Hurst exponents with 10 lags ----
mean_rev: 0.0085
gbm: 0.4333
trending: 0.8368
Hurst exponents with 20 lags ----
mean_rev: 0.0064
gbm: 0.4539
trending: 0.8715
Hurst exponents with 100 lags ----
mean_rev: -0.0021
gbm: 0.5401
trending: 0.8442
Hurst exponents with 300 lags ----
mean_rev: 0.0002
gbm: 0.5691
trending: 0.7463
Hurst exponents with 500 lags ----
mean_rev: 0.0015
gbm: 0.4854
trending: 0.6662
```

For the mean-reverting series, the estimate is the most consistent across all lag values. The slightly negative values may be due to approximation error in the estimate; ideally, the value should not fall much below 0. For the random walk, the value oscillates around the expected 0.5. Last but not least, the trending series is correctly identified by the Hurst exponent for most of the selected lags, but as we increase the lag, the exponent decreases towards 0.5, which would indicate a random walk.

Our results show that Google follows the mean-reverting pattern, which can also be seen from the chart: the price oscillates around its mean. The results are close to our expectations, but the maximum number of lags we consider can change the conclusion. The analysis shows that a series need not be purely trending or purely mean-reverting. It also matters whether we look at a long or a short time frame (which we control through the range of lags). For larger time frames, the result is clearer: ‘**Mean-reversion**’.

**Key takeaways from the results :**

We can deduce the following statements from the above results:

- The Hurst exponent is a measure of the memory in a time series and is used to classify the series as trending, mean-reverting, or a random walk.
- By changing the lag parameter, we can look at the time series on a short-term or long-term basis, and the results will differ accordingly.
- Over long horizons the series tends towards mean reversion, which is common for most stocks and cryptocurrencies, but over shorter time intervals it may also be trending.
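To tie this back to the original motivation of screening many assets without looking at each chart, the sketch below computes one Hurst exponent per series and ranks the results. The `universe` dictionary uses the same synthetic series as earlier in the article as stand-ins for real assets; with real data you would fill it with price series instead. The estimation function is repeated so that the snippet runs on its own.

```python
import numpy as np
import pandas as pd


def get_hurst_exponent(time_series, max_lag=20):
    """Slope of log(std of lagged differences) versus log(lag)."""
    lags = range(2, max_lag)
    tau = [np.std(np.subtract(time_series[lag:], time_series[:-lag])) for lag in lags]
    return np.polyfit(np.log(lags), np.log(tau), 1)[0]


# Synthetic stand-ins for a universe of assets (same construction as above)
np.random.seed(123)
sample_size, scalar = 1000, 100
universe = {
    "mean_rev": np.log(np.random.randn(sample_size) + scalar),
    "gbm": np.log(np.cumsum(np.random.randn(sample_size)) + scalar),
    "trending": np.log(np.cumsum(np.random.randn(sample_size) + 1) + scalar),
}

# One estimate per series, sorted from most mean-reverting to most trending
ranking = pd.Series(
    {name: get_hurst_exponent(values) for name, values in universe.items()}
).sort_values()
print(ranking.round(4))
```

A ranking like this gives a single number per asset to filter on, although the choice of `max_lag` still affects the ordering, as discussed above.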

**Endnotes**

*Thank you for reading till the end. Hope you are doing well and staying safe, and that you are getting vaccinated soon or already have been.*

Arnab Mondal

Data Engineer & Python Developer | Freelance Tech Writer

Follow me on LinkedIn and feel free to ask me about any doubts you have