Forecasting Financial Time Series – A Model of MLP in Keras
This article was published as a part of the Data Science Blogathon
Introduction
The purpose of this article is to show the full process of working with a time series, from data processing to building neural networks and validating the results. A financial series was chosen as an example because it is essentially random, and it is interesting to see whether conventional neural network architectures can capture the patterns needed to predict the behavior of a financial instrument.
The pipeline described in this article can be easily applied to any other data and to other classification algorithms.
Data preparation For Financial Time Series Forecasting
For example, take the stock prices of a humble company like Apple from 2005 to the present day. They can be downloaded from Yahoo Finance in .csv format. Let’s load the data and see what all this beauty looks like.
First, let’s import the libraries we need to download:
import matplotlib.pylab as plt
import numpy as np
import pandas as pd
Now, just read the data and draw the graph. Don't forget to flip the data using [::-1], since the CSV is in reverse chronological order, i.e. from 2017 back to 2005:
data = pd.read_csv('./data/AAPL.csv')[::-1]
close_price = data.loc[:, 'Adj Close'].tolist()
plt.plot(close_price)
plt.show()
It looks almost like a typical random process, but we will try to solve the problem of forecasting a day or more ahead. The "forecasting" problem must first be restated in machine learning terms. We can simply predict the direction of the stock price movement, up or down, which is a binary classification problem. Alternatively, we can predict either the price value itself on the next day (or in a couple of days), the price change on the next day compared to the last day, or the logarithm of that difference; in other words, we want to predict a number, which is a regression problem. When solving the regression problem, however, we face data normalization issues, which we will now consider.
Whether for classification or regression, we will take a window of the time series (for example, 30 days) as input and try to predict either the direction of the price movement on the next day (classification) or the value of the change (regression).
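As a sketch of this windowing step (the helper name and the toy series are our own, not from the original code), each training example pairs 30 consecutive values with the value that follows them:

```python
import numpy as np

def make_windows(series, window=30):
    """Split a 1-D price series into overlapping windows plus
    the value that follows each window (the prediction target)."""
    X, y = [], []
    for i in range(len(series) - window):
        X.append(series[i:i + window])   # the past `window` values
        y.append(series[i + window])     # the next value
    return np.array(X), np.array(y)

prices = list(range(100))          # toy stand-in for close prices
X, y = make_windows(prices, window=30)
print(X.shape, y.shape)            # (70, 30) (70,)
```

For classification the target `y` would then be converted into an up/down label; for regression it can be used directly.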
The main problem with financial time series is that they are not stationary at all: their characteristics, such as the mathematical expectation, the variance, and the average maximum and minimum values within a window, change over time.

Image 1

This means that, strictly speaking, we cannot use these values for MinMax or z-score normalization over our windows: the characteristics we measure over 30 days may change the very next day, or even in the middle of the window.
But if you look closely at the classification problem, we see that we do not care about the absolute values in the window, only about the relative movement, so each window can safely be normalized on its own with the z-score:

Dat = [(np.array(x) - np.mean(x)) / np.std(x) for x in Dat]
For the regression problem this will not work: if we also subtract the mean and divide by the deviation, we will have to restore these parameters for the price value on the next day, and there they may be completely different. Therefore, we will try two options: training on the raw data, and trying to trick the system. In the second case we are not so interested in the mathematical expectation or variance on the next day, only in the movement up or down, so we will take a risk: we z-score-normalize our 30-day windows, but only the windows themselves, without touching anything from the "future", and predict the percentage change in price on the next day; pandas will help us with this:
close_price_diffs = pd.Series(close_price).pct_change()
This is what it looks like, and as we can see, these data, obtained without any manipulation of statistical characteristics, already lie within the range from -0.5 to 0.5:
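A quick sanity check of what pct_change produces, on toy numbers rather than the AAPL data:

```python
import pandas as pd

prices = pd.Series([100.0, 102.0, 99.96, 101.0])
diffs = prices.pct_change()

# The first element is NaN (there is no previous price to compare with);
# each following element is (p_t - p_{t-1}) / p_{t-1}.
print(diffs.round(4).tolist())  # [nan, 0.02, -0.02, 0.0104]
```

These relative changes, not the absolute prices, become the inputs for the regression variant.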
To divide the data into training and test samples, we take the first 85% of the windows in time for training and the last 15% for checking how the neural network performs.
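Sketched with simple index arithmetic (assuming X and Y are already lists of windows and labels; the stand-in lists below are our own):

```python
# Time-ordered split: first 85% for training, last 15% for testing.
# No shuffling before the split, so the test set is strictly "in the future".
X = list(range(200))          # stand-ins for 200 windows
Y = list(range(200))          # stand-ins for 200 labels
split = int(len(X) * 0.85)
X_train, X_test = X[:split], X[split:]
Y_train, Y_test = Y[:split], Y[split:]
print(len(X_train), len(X_test))  # 170 30
```

Splitting by time rather than randomly is what prevents the model from "looking into the future" during evaluation.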
So, to train our neural network, we will receive the following pairs X, Y: prices at the close of the market for 30 days and [1, 0] or [0, 1], depending on whether the price value for the binary classification has increased or decreased; the percentage change in prices for 30 days and the change the next day for the regression.
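The label construction for the classification case can be sketched like this (the helper name is our own; the [1, 0] / [0, 1] convention follows the text above):

```python
def make_label(window_closes, next_close):
    """One-hot label: [1, 0] if the price increased after the window,
    [0, 1] if it decreased (or stayed the same)."""
    return [1, 0] if next_close > window_closes[-1] else [0, 1]

print(make_label([10.0, 10.5], 10.7))  # [1, 0]  (price went up)
print(make_label([10.0, 10.5], 10.2))  # [0, 1]  (price went down)
```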
Neural network architecture
We will use a multilayer perceptron as the basic model, with Keras as the implementation framework: it is simple and intuitive, and although it also lets you build fairly complex computational graphs with little effort, we won't need that yet. The basic network has an input layer of 30 neurons and a first hidden layer of 64 neurons, followed by Batch Normalization (recommended for almost any multilayer network), and then the activation function (plain ReLU is no longer considered comme il faut, so let's take something fashionable like LeakyReLU). At the output we place one neuron (or two for classification), which, depending on the task, will either have a softmax at the output (classification) or be left without a nonlinearity so that it can predict any value (regression).
The classification code looks like this:
from keras.models import Sequential
from keras.layers import Dense, Activation, BatchNormalization, LeakyReLU

model = Sequential()
model.add(Dense(64, input_dim=30))
model.add(BatchNormalization())
model.add(LeakyReLU())
model.add(Dense(2))
model.add(Activation('softmax'))
For a regression problem, the activation at the end should be 'linear'. Next, we need to define the error function and the optimization algorithm. Without going into the details of gradient descent variations, let's take Adam with a step size of 0.001; the loss parameter for classification should be set to cross-entropy ('categorical_crossentropy'), and for regression to the mean squared error ('mse'). Keras also lets us control the training process quite flexibly: for example, it is good practice to reduce the gradient descent step size when our results stop improving, which is exactly what ReduceLROnPlateau does, added below as a callback to model training.
from keras.optimizers import Adam
from keras.callbacks import ReduceLROnPlateau

opt = Adam(lr=0.001)
reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.9, patience=5, min_lr=0.000001, verbose=1)
model.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['accuracy'])

Neural network training:

history = model.fit(X_train, Y_train,
                    nb_epoch=50,
                    batch_size=128,
                    verbose=1,
                    validation_data=(X_test, Y_test),
                    shuffle=True,
                    callbacks=[reduce_lr])
After the learning process is complete, it would be nice to display the graphs of the dynamics of the error and accuracy values on the screen:
plt.figure()
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='best')
plt.show()

plt.figure()
plt.plot(history.history['acc'])
plt.plot(history.history['val_acc'])
plt.title('model accuracy')
plt.ylabel('acc')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='best')
plt.show()
Before starting training, I want to draw your attention to an important point: with data like this, it is necessary to train the algorithms longer, for at least 50 to 100 epochs. If you train for, say, 5 to 10 epochs and see 55% accuracy, that most likely does not mean you have learned to find patterns: analyzing the training data, you would see that 55% of the windows simply belong to one class (increase, for example) and the remaining 45% to the other (decrease). In our case, 53% of the windows are of the "decrease" class and 47% of the "increase" class, so we will try to achieve accuracy above 53%, which would indicate that we have actually learned to find features.
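The class balance mentioned above is easy to check directly on the labels before training (a sketch, assuming one-hot labels; the toy label list below is our own):

```python
# Count how many windows fall into each class of one-hot labels.
Y_train = [[1, 0]] * 47 + [[0, 1]] * 53   # toy labels: 47 "up", 53 "down"
ups = sum(1 for y in Y_train if y == [1, 0])
downs = len(Y_train) - ups
print(ups / len(Y_train), downs / len(Y_train))  # 0.47 0.53
```

Whatever the larger fraction turns out to be is the accuracy of a constant "always predict the majority class" baseline, and that is the number to beat.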
Too high accuracy on raw data such as the closing price and simple algorithms will most likely indicate overfitting or “looking” into the future when preparing a training sample.
Forecasting Financial Time Series – Classification problem
Let’s train our first model and look at the graphs:
As you can see, both the error and the accuracy on the test sample stay at roughly the same value the whole time, while the error on the training sample falls and its accuracy grows, which tells us we are overfitting. Let's look at a deeper model with two hidden layers:
model = Sequential()
model.add(Dense(64, input_dim=30))
model.add(BatchNormalization())
model.add(LeakyReLU())
model.add(Dense(16))
model.add(BatchNormalization())
model.add(LeakyReLU())
model.add(Dense(2))
model.add(Activation('softmax'))
Here are its results:
Approximately the same picture. When we run into overfitting, we need to add regularization to the model. Regularization imposes restrictions on the weights of the neural network so that their values do not spread too widely; despite the large number of parameters (i.e., network weights), some of them are driven toward zero for simplicity. We will start with the most common approach: adding an extra term with the L2 norm of the weights to the error function; in Keras this is done via keras.regularizers, here passed through the activity_regularizer argument.
from keras import regularizers

model = Sequential()
model.add(Dense(64, input_dim=30, activity_regularizer=regularizers.l2(0.01)))
model.add(BatchNormalization())
model.add(LeakyReLU())
model.add(Dense(16, activity_regularizer=regularizers.l2(0.01)))
model.add(BatchNormalization())
model.add(LeakyReLU())
model.add(Dense(2))
model.add(Activation('softmax'))
This network learns a little better in terms of the error function, but accuracy still suffers:
This strange effect, the error decreasing while the accuracy does not improve, is often encountered when working with noisy or largely random data. It happens because the error is computed from the cross-entropy value, which can decrease gradually, while the accuracy depends only on which neuron gives the largest answer, and that neuron may remain the wrong one even as the error changes.
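A tiny numeric illustration of this effect, with made-up probabilities: the cross-entropy loss improves between two epochs, yet the predicted class (the argmax) stays wrong the whole time.

```python
import math

true_class = 0                     # the correct answer is class 0
p_before = [0.30, 0.70]            # predicted distribution at epoch N
p_after = [0.45, 0.55]            # predicted distribution at epoch N+1

# Cross-entropy for a one-hot target reduces to -log(p[true_class]).
loss_before = -math.log(p_before[true_class])
loss_after = -math.log(p_after[true_class])

print(loss_before > loss_after)                    # True: the loss decreased
print(p_after.index(max(p_after)) == true_class)   # False: still misclassified
```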
Therefore, it is worth adding even more regularization to our model using the Dropout technique, popular in recent years. Roughly speaking, this is the random "ignoring" of some weights during training in order to avoid the co-adaptation of neurons (so that they do not learn the same features). The code looks like this:
from keras.layers import Dropout

model = Sequential()
model.add(Dense(64, input_dim=30, activity_regularizer=regularizers.l2(0.01)))
model.add(BatchNormalization())
model.add(LeakyReLU())
model.add(Dropout(0.5))
model.add(Dense(16, activity_regularizer=regularizers.l2(0.01)))
model.add(BatchNormalization())
model.add(LeakyReLU())
model.add(Dense(2))
model.add(Activation('softmax'))
As you can see, between the two hidden layers we "drop" connections during training with a 50% probability. Dropout is usually not added between the input layer and the first hidden layer, since then we would be learning from what amounts to noisy data, nor right before the output. During network testing, of course, no dropout occurs. Here is how such a network learns:
If you stop training the network a little earlier, you can get 58% accuracy in predicting the price movement, which is certainly better than random guessing.
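"Stopping a little earlier" can be automated: Keras offers an EarlyStopping callback for this, and its logic is simple enough to sketch in pure Python (the loss values below are made up):

```python
def best_stop_epoch(val_losses, patience=3):
    """Return the epoch at which training should stop: when the
    validation loss has not improved for `patience` epochs in a row."""
    best, best_epoch = float('inf'), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return epoch
    return len(val_losses) - 1

losses = [1.0, 0.8, 0.7, 0.72, 0.73, 0.74, 0.75]
print(best_stop_epoch(losses))  # 5: no improvement since epoch 2
```

In a real run one would then keep the weights from the best epoch rather than the final one.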
Another interesting and intuitive aspect of forecasting financial time series is that while the fluctuation on the next day is largely random, when we look at charts and candles we can still notice a trend over the next 5 to 10 days. Let's check whether our network can cope with such a task: we will predict the price movement 5 days ahead with the last successful architecture and, out of interest, train for more epochs:
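Shifting the prediction horizon only changes how the label is built: we compare the last price in the window with the price `horizon` days later instead of the next day. A sketch with our own helper name and a toy series:

```python
def make_horizon_labels(series, window=30, horizon=5):
    """One-hot labels answering: did the price rise `horizon` days
    after the end of each window?"""
    labels = []
    for i in range(len(series) - window - horizon + 1):
        last_in_window = series[i + window - 1]
        future = series[i + window - 1 + horizon]
        labels.append([1, 0] if future > last_in_window else [0, 1])
    return labels

prices = [1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1]
labels = make_horizon_labels(prices, window=3, horizon=2)
print(len(labels))  # 7 labeled windows from 11 prices
```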
As you can see, if we stop training early enough (over time, overfitting still sets in), we can get 60% accuracy, which is very good.
Forecasting Financial Time Series – Regression problem
For the regression problem, let’s take our last successful classification architecture (it has already shown that it can learn the necessary features), remove Dropout, and train at more iterations.
Also, in this case, we can look not only at the error value but also visually assess the forecasting quality using the following code:
pred = model.predict(np.array(X_test))
original = Y_test
predicted = pred
plt.plot(original, color='black', label='Original data')
plt.plot(predicted, color='blue', label='Predicted data')
plt.legend(loc='best')
plt.title('Actual and predicted')
plt.show()

The network architecture will look like this:

model = Sequential()
model.add(Dense(64, input_dim=30, activity_regularizer=regularizers.l2(0.01)))
model.add(BatchNormalization())
model.add(LeakyReLU())
model.add(Dense(16, activity_regularizer=regularizers.l2(0.01)))
model.add(BatchNormalization())
model.add(LeakyReLU())
model.add(Dense(1))
model.add(Activation('linear'))
Let's see what happens if we train on the "raw" adjusted close:
It looks good from a distance, but if we look closely, we will see that our neural network is simply lagging behind in its predictions, which can be considered a failure.
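One way to quantify this "lag" (our own suggestion, not from the original article) is to compare the model against a naive persistence baseline that simply repeats the previous value; if the baseline does as well or better, the network has learned little beyond copying yesterday's price. Toy arrays below:

```python
import numpy as np

actual = np.array([1.0, 1.2, 1.1, 1.3, 1.25])
model_pred = np.array([1.05, 1.0, 1.21, 1.09, 1.31])  # looks shifted one step
naive_pred = np.array([1.0, 1.0, 1.2, 1.1, 1.3])      # "yesterday's value"

mse_model = np.mean((actual - model_pred) ** 2)
mse_naive = np.mean((actual - naive_pred) ** 2)
print(mse_model, mse_naive)  # here the naive baseline wins
```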
For price changes, the result is:
Some values are predicted well, and in places the trend is guessed correctly, but overall the result is so-so.
Discussion
In principle, the results do not look impressive at first glance. That is true, but we trained the simplest kind of neural network on one-dimensional data without much preprocessing. There are a number of steps that can bring the accuracy up to the level of 60 to 70%:
- Use not only the closing price but all the data from our .csv (high, low, open, close, volume), that is, pay attention to all the information available at each moment in time.
- Optimize the hyperparameters: the window size, the number of neurons in the hidden layers, the training step. All these parameters were taken at random; with a random search you may find that, perhaps, we need to look 45 days back and train a deeper network with a smaller step.
- Use loss functions that are better suited to our task. For example, to predict price changes, we could penalize the network for an incorrect sign, since the usual MSE is invariant to the sign of the number.
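The random search mentioned in the second point can be sketched as follows; the search space values are our own illustration, not tuned settings:

```python
import random

random.seed(0)
search_space = {
    'window': [20, 30, 45, 60],       # how many days to look back
    'hidden': [16, 32, 64, 128],      # neurons in the first hidden layer
    'lr':     [0.01, 0.001, 0.0001],  # gradient descent step size
}

def sample_config():
    """Draw one random hyperparameter combination from the space."""
    return {name: random.choice(values) for name, values in search_space.items()}

configs = [sample_config() for _ in range(5)]
print(len(configs))  # 5 candidate configurations to train and compare
```

Each sampled configuration would then be trained and scored on the validation set, keeping the best one.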
Conclusion
In this article, we applied the simplest neural network architecture to predict price movements in the market. This pipeline can be used for any time series; the main thing is to choose the right data preprocessing, determine the network architecture, and evaluate the quality of the algorithm. In our case, we managed to predict the trend 5 days ahead with 60% accuracy using a 30-day price window, which can be considered a good result. The quantitative prediction of price changes turned out to be a failure; for that task it is advisable to use more serious tools and statistical analysis of the time series.
References
 Image 1 https://userimages.githubusercontent.com/7363923/55100049d76d9b8050b811e99c3db0be21346e41.png
 Image 2 https://www.google.com/url?sa=i&url=https%3A%2F%2Fwww.mdpi.com%2F10994300%2F22%2F10%2F1141%2Fpdf&psig=AOvVaw2C9by1IHXZEVmC4BDXTPsK&ust=1630666711145000&source=images&cd=vfe&ved=0CAsQjRxqFwoTCMCw9YKR4PICFQAAAAAdAAAAABAI