Shivani Sharma — September 4, 2021

This article was published as a part of the Data Science Blogathon

This tutorial shows how to build a deep learning model for predicting stock prices with the TensorFlow framework. It is an advanced TensorFlow project, so you should already be comfortable with the basics of stock prices. You can also check one of my favourite articles for the same.

## Stock Price Data import and preparation

Heinz exported the stock data to a CSV file. The dataset contains n = 41,266 minutes of data covering 500 stocks traded from April to August 2017, as well as the price of the S&P 500 index.

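# Data import
import pandas as pd
# The file name below is an assumption; the article does not give the path to the CSV file
data = pd.read_csv('data_stocks.csv')
# Dropping the date variable
data = data.drop(['DATE'], axis=1)
# Dataset dimension
n = data.shape[0]
p = data.shape[1]
# Forming data into a numpy array
data = data.values
This is the S&P 500 time series, plotted with pyplot.plot(data['SP500']) before the conversion to a numpy array: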

Image 1

Note: since the ultimate goal is to “predict” the value of the index in the near future, the S&P 500 values in the dataset are shifted one minute ahead.

## Preparing data for testing and training

The dataset was split into a training set and a test set. The training data made up 80% of the total and covered the period from April to approximately the end of July 2017. The test data covered the remainder and ended in August 2017.

# Data for testing and training
import numpy as np
train_start = 0
train_end = int(np.floor(0.8*n))
test_start = train_end
test_end = n
data_train = data[np.arange(train_start, train_end), :]
data_test = data[np.arange(test_start, test_end), :]

There are many approaches to time series cross-validation, from rolling forecasts with or without refitting to more elaborate concepts like time series bootstrap resampling. In the latter case, repeated samples are drawn from the remainder of a seasonal decomposition of the time series. This makes it possible to simulate samples that follow the same seasonal pattern as the original time series but are not exact copies of its values.
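The rolling-forecast idea with refitting can be sketched as a series of expanding training windows, each followed by the next block of test observations. The helper name `walk_forward_splits` and the fold sizes below are hypothetical and are not part of Heinz's code, which uses a single 80/20 split.

```python
import numpy as np

def walk_forward_splits(n_samples, n_folds, min_train):
    """Yield (train_idx, test_idx) pairs in which training always precedes testing."""
    fold_size = (n_samples - min_train) // n_folds
    for k in range(n_folds):
        train_end = min_train + k * fold_size   # the training window grows each fold
        test_end = train_end + fold_size        # the test block follows it directly
        yield np.arange(0, train_end), np.arange(train_end, test_end)

splits = list(walk_forward_splits(n_samples=100, n_folds=4, min_train=60))
for train_idx, test_idx in splits:
    print(len(train_idx), test_idx[0], test_idx[-1])
```

In each fold the model would be refitted on `train_idx` and evaluated on `test_idx`, so no observation from the future ever enters training.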

## Data scaling

Most neural network architectures benefit from scaling the inputs (and sometimes the outputs). The reason is that common activation functions such as the sigmoid and the hyperbolic tangent (tanh) take values in the intervals [0, 1] and [-1, 1], respectively. Nowadays, rectified linear unit (ReLU) activations are most commonly used. They are unbounded above, but Heinz scaled the inputs and targets anyway, using scikit-learn's MinMaxScaler:

# Data scaling
from sklearn.preprocessing import MinMaxScaler
# Fit the scaler on the training data only, then apply it to both sets
scaler = MinMaxScaler()
scaler.fit(data_train)
data_train = scaler.transform(data_train)
data_test = scaler.transform(data_test)
# Building X and y (column 0 is the target, the remaining columns are the inputs)
X_train = data_train[:, 1:]
y_train = data_train[:, 0]
X_test = data_test[:, 1:]
y_test = data_test[:, 0]

Note: be careful about which part of the data is scaled, and when. A common mistake is to scale the entire dataset before splitting it into test and training data. This is an error because scaling relies on statistics of the data, namely the minimum and maximum of each variable. In real-life time series forecasting, information from future observations is not available at the time a forecast is made. The scaling statistics must therefore be calculated on the training data and then applied to the test data. A model that uses information “from the future” (that is, from the test sample) to generate predictions will report performance metrics that are biased in its favour.
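The effect described in the note can be seen on a toy series. The sketch below applies the min-max formula `(x - min) / (max - min)` by hand with NumPy; the ten-point upward-trending series is made up for illustration.

```python
import numpy as np

# Toy upward-trending series: the test period lies above the training maximum
series = np.arange(10, dtype=float).reshape(-1, 1)
train, test = series[:8], series[8:]

# Correct: the statistics come from the training data only
lo, hi = train.min(axis=0), train.max(axis=0)
train_scaled = (train - lo) / (hi - lo)
test_scaled = (test - lo) / (hi - lo)

# Leaky: the statistics are computed on the full series, including the future
lo_all, hi_all = series.min(axis=0), series.max(axis=0)
train_leaky = (train - lo_all) / (hi_all - lo_all)

print(test_scaled.ravel())   # values above 1: the test data exceeds the training range
print(train_leaky.max())     # below 1: the training data already "knows" a higher maximum is coming
```

With the correct procedure the scaled test values can fall outside [0, 1], which is expected. With the leaky one, the training data carries information about the future maximum.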

## Introduction to TensorFlow

TensorFlow is one of the most popular frameworks for machine learning and neural networks. Its backend is written in C++, but it is usually controlled through Python. TensorFlow represents computations as graphs: users define mathematical operations as elements of a graph of data, variables, and operators. Since neural networks are, in fact, graphs of data and mathematical operations, TensorFlow is well suited to them. The example below shows a graph that adds two numbers:

Image 2

The picture above shows two numbers that are to be added. They are stored in the variables a and b. The values travel through the graph and arrive at the node represented by the square, where the addition takes place. The result is written to another variable, c. The variables can be thought of as placeholders: any numbers fed into a and b are added, and the result is written to c.

This is exactly how TensorFlow works: the user declares an abstract representation of the model through placeholders and variables. The placeholders are then filled with real data, and the calculations take place. The example above is described by the following TensorFlow code:

# Import TensorFlow (1.x API; under TensorFlow 2.x these calls live in tf.compat.v1)
import tensorflow as tf
# Defining a and b as placeholders
a = tf.placeholder(dtype=tf.int8)
b = tf.placeholder(dtype=tf.int8)
# Defining the addition
c = tf.add(a, b)
# Graph initialization
graph = tf.Session()
# Run graph
graph.run(c, feed_dict={a: 5, b: 4})

After the TensorFlow library is imported, two placeholders are defined using tf.placeholder(). They correspond to the two blue circles on the left side of the image above. The addition operation is then defined using tf.add(). The result of the operation is c = 9. With the placeholders in place, the graph can be executed for any integer values of a and b (within the range of the int8 dtype). This example is extremely simple, and real neural networks are much more complicated, but it illustrates the principles of the framework.

## Placeholders

As stated above, it all starts with placeholders. The model needs two of them: X holds the network's input (the stock prices of all S&P 500 constituents at time T = t) and Y holds the network's output (the value of the S&P 500 index at time T = t + 1).

The placeholder shapes are [None, n_stocks] for X and [None] for Y: the input is a 2-D matrix and the output is a 1-D vector. It is important to understand what shapes of input and output data the neural network needs and to organize them accordingly.

#Placeholder

X = tf.placeholder(dtype=tf.float32, shape=[None, n_stocks])
Y = tf.placeholder(dtype=tf.float32, shape=[None])

The None argument means that at this point we do not yet know the number of observations that will pass through the neural network graph during each run, so it remains flexible. Later, the batch_size variable will be defined, which controls the number of observations during the training run.

## Variables

In addition to placeholders, variables are another cornerstone of TensorFlow. While placeholders store the input and target data in the graph, variables act as flexible containers within the graph that are allowed to change during graph execution. Weights and biases are represented as variables so that they can be adapted during training. Variables must be initialized before training starts.

The model consists of four hidden layers. The first contains 1,024 neurons, slightly more than twice the size of the input. Each subsequent hidden layer is half the size of the previous one: 512, 256, and 128 neurons. Reducing the number of neurons at each layer compresses the information that the network identified in the previous layers. Other architectures and neuron configurations are possible, but this tutorial uses this model:

# Model architecture parameters

n_stocks = 500
n_neurons_1 = 1024
n_neurons_2 = 512
n_neurons_3 = 256
n_neurons_4 = 128
n_target = 1
# Level 1: Variables for hidden weights and biases
W_hidden_1 = tf.Variable (weight_initializer ([n_stocks, n_neurons_1]))
bias_hidden_1 = tf.Variable (bias_initializer ([n_neurons_1]))
# Level 2: Variables for hidden weights and biases
W_hidden_2 = tf.Variable (weight_initializer ([n_neurons_1, n_neurons_2]))
bias_hidden_2 = tf.Variable (bias_initializer ([n_neurons_2]))
# Level 3: Variables for hidden weights and biases
W_hidden_3 = tf.Variable (weight_initializer ([n_neurons_2, n_neurons_3]))
bias_hidden_3 = tf.Variable (bias_initializer ([n_neurons_3]))
# Level 4: Variables for hidden weights and biases
W_hidden_4 = tf.Variable (weight_initializer ([n_neurons_3, n_neurons_4]))
bias_hidden_4 = tf.Variable (bias_initializer ([n_neurons_4]))
# Output level: Variables for output weights and biases
W_out = tf.Variable (weight_initializer ([n_neurons_4, n_target]))
bias_out = tf.Variable (bias_initializer ([n_target]))

It is important to understand the variable dimensions required between the input, hidden, and output layers. As a rule of thumb for multilayer perceptrons (MLPs), the second dimension of the previous layer's weight matrix is the first dimension of the current layer's weight matrix. It sounds complicated, but the bottom line is that each layer passes its output as input to the next layer. The bias dimension equals the second dimension of the current layer's weight matrix, which corresponds to the number of neurons in that layer.

## Network architecture development

After the required weight and bias variables are defined, the network topology, that is, the architecture of the network, needs to be specified. Placeholders (data) and variables (weights and biases) are combined into a system of sequential matrix multiplications. The hidden layers are additionally transformed by activation functions. These functions are important elements of the network architecture because they introduce non-linearity into the system. There are dozens of activation functions, and one of the most common is the rectified linear unit (ReLU), which this guide uses:

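# Hidden levels (the names hidden_1 to hidden_4 are assumed here)
hidden_1 = tf.nn.relu(tf.add(tf.matmul(X, W_hidden_1), bias_hidden_1))
hidden_2 = tf.nn.relu(tf.add(tf.matmul(hidden_1, W_hidden_2), bias_hidden_2))
hidden_3 = tf.nn.relu(tf.add(tf.matmul(hidden_2, W_hidden_3), bias_hidden_3))
hidden_4 = tf.nn.relu(tf.add(tf.matmul(hidden_3, W_hidden_4), bias_hidden_4))
# Output level (must be transposed)
out = tf.transpose(tf.add(tf.matmul(hidden_4, W_out), bias_out))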

The image below illustrates the network architecture. The model consists of three main blocks: the input layer, the hidden layers, and the output layer. This architecture is called a feed-forward network, which means that batches of data flow strictly from left to right. In other architectures, such as recurrent neural networks, data can also flow in other directions inside the network.

Image 3
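To make the data flow concrete, here is a minimal NumPy sketch of the same feed-forward pass, using the layer sizes from this tutorial. The small uniform random weights and the helper name `forward` are assumptions for illustration; this is not the TensorFlow graph itself.

```python
import numpy as np

rng = np.random.default_rng(0)
layer_sizes = [500, 1024, 512, 256, 128, 1]   # input, four hidden layers, output

# One weight matrix and one bias vector per layer transition
weights = [rng.uniform(-0.05, 0.05, size=(n_in, n_out))
           for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n_out) for n_out in layer_sizes[1:]]

def forward(x):
    """Pass a batch of shape (batch, 500) through the network."""
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = np.maximum(0.0, h @ W + b)         # ReLU on each hidden layer
    out = h @ weights[-1] + biases[-1]         # linear output layer, shape (batch, 1)
    return out.T                               # transposed to shape (1, batch)

batch = rng.uniform(0.0, 1.0, size=(4, 500))
pred = forward(batch)
print([W.shape for W in weights])
print(pred.shape)
```

Each weight matrix has the previous layer's size as its first dimension and the current layer's size as its second, so the output of one layer fits the input of the next.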

## Cost function

The cost function of the network produces a measure of the deviation between the network's predictions and the actually observed training targets. For regression problems, the mean squared error (MSE) is commonly used. It computes the average squared deviation between predictions and targets, but in general any differentiable function can be used to measure the deviation between them.

# Cost function

mse = tf.reduce_mean(tf.squared_difference(out, Y))

However, the MSE has certain properties that are advantageous for the general optimization problem to be solved.
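As a small worked example, the MSE of three made-up predictions can be computed with NumPy. The sketch also hints at why the output layer is transposed: with a column-shaped output, broadcasting against a 1-D target vector would compare every prediction with every target. The numbers are illustrative only.

```python
import numpy as np

y_true = np.array([0.2, 0.4, 0.6])      # targets, shape (3,)
out = np.array([[0.1, 0.4, 0.9]])       # predictions, shape (1, 3), as after transposition

# Mean of the squared differences: (0.01 + 0.00 + 0.09) / 3
mse = np.mean((out - y_true) ** 2)
print(mse)

# Without the transposition the shapes broadcast to a 3 x 3 matrix of differences
out_col = out.T                         # shape (3, 1)
wrong = (out_col - y_true) ** 2
print(wrong.shape)
```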

## Optimizer

The optimizer takes care of the calculations needed to adapt the network's weight and bias variables during training. These calculations produce the so-called gradients, which indicate the direction in which the weights and biases must change to minimize the cost function. Developing stable and fast optimizers is a major field of neural network research.

# Optimizer
opt = tf.train.AdamOptimizer().minimize(mse)

In this case, one of the most common machine learning optimizers, the Adam optimizer, is used. Adam stands for Adaptive Moment Estimation and can be seen as a combination of two other popular optimizers, AdaGrad and RMSProp.
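For intuition, the Adam update rule can be written out in a few lines of NumPy. This is a sketch of the published update rule applied to a toy one-parameter problem, not TensorFlow's implementation. The function name `adam_step`, the learning rate of 0.1, and the quadratic cost (w - 3)^2 are assumptions chosen for illustration.

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for parameter w given its gradient (t counts from 1)."""
    m = beta1 * m + (1 - beta1) * grad          # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2     # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)                # bias correction for the zero start
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

# Minimize the toy cost (w - 3)^2, whose gradient is 2 * (w - 3)
w, m, v = 0.0, 0.0, 0.0
history = []
for t in range(1, 501):
    grad = 2.0 * (w - 3.0)
    w, m, v = adam_step(w, grad, m, v, t)
    history.append(w)

print(history[0], history[-1])
```

The very first step has a size close to the learning rate regardless of the gradient's magnitude, because the bias-corrected moments cancel. This is one reason Adam is relatively insensitive to the scale of the cost function.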

## Initializers

Initializers are used to initialize the network's variables before training. Since neural networks are trained using numerical optimization techniques, the starting point of the optimization problem is one of the most important factors in finding a good solution. TensorFlow offers various initializers, each of which takes a different approach. This tutorial uses tf.variance_scaling_initializer(), which implements one of the standard initialization strategies.

# Initializers

sigma = 1
weight_initializer = tf.variance_scaling_initializer(mode="fan_avg", distribution="uniform", scale=sigma)
bias_initializer = tf.zeros_initializer()

Note: TensorFlow allows different initialization functions to be defined for different variables within the graph. However, in most cases, a single unified initialization is sufficient.
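As I read the TensorFlow documentation, with mode="fan_avg" and distribution="uniform" the initializer draws weights uniformly from [-limit, limit], where limit = sqrt(3 * scale / n) and n is the average of the number of input and output units. The NumPy function below is a sketch of that rule under this assumption. The name `variance_scaling_uniform` is hypothetical.

```python
import numpy as np

def variance_scaling_uniform(shape, scale=1.0, rng=None):
    """Draw a weight matrix uniformly from [-limit, limit] using the fan_avg rule."""
    rng = np.random.default_rng() if rng is None else rng
    fan_in, fan_out = shape
    fan_avg = (fan_in + fan_out) / 2.0
    limit = np.sqrt(3.0 * scale / fan_avg)
    return rng.uniform(-limit, limit, size=shape)

W = variance_scaling_uniform((500, 1024), scale=1.0, rng=np.random.default_rng(0))
print(W.shape)
print(W.var(), 1.0 / 762.0)   # the sample variance is close to scale / fan_avg
```

A uniform distribution on [-limit, limit] has variance limit^2 / 3, so this choice of limit gives the weights a variance of scale / fan_avg and keeps the size of the activations roughly stable from layer to layer.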

## Fitting the neural network

Finally, the model needs to be trained, which is usually done by mini-batch training. During mini-batch training, random samples of batch_size observations are drawn from the training dataset and fed into the network. The training dataset is divided into n / batch_size batches, where n is the number of training observations, and the batches are fed to the network sequentially. At this point, the placeholders X and Y come into play: they store the input and target data and present them to the network.

A sampled batch of X flows through the network until it reaches the output layer. There, TensorFlow compares the model's predictions with the actually observed targets Y for the current batch. TensorFlow then performs an optimization step and updates the network parameters. After the weights and biases have been updated, the process is repeated for the next batch. The procedure continues until all batches have been presented to the network. One full pass over all batches is called an “epoch”.

The network training stops when the maximum number of epochs is reached or when another predefined stopping criterion is triggered.
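The batch arithmetic can be checked with the numbers from this tutorial: 80% of the 41,266 observations are used for training, and batch_size is 256. The sketch below uses only NumPy and index arrays. It illustrates the slicing pattern, not the training itself.

```python
import numpy as np

n_train = int(np.floor(0.8 * 41266))     # number of training observations
batch_size = 256
n_batches = n_train // batch_size        # full batches per epoch

rng = np.random.default_rng(0)
order = rng.permutation(n_train)         # shuffled row indices for one epoch

seen = 0
for i in range(n_batches):
    batch_idx = order[i * batch_size:(i + 1) * batch_size]
    seen += len(batch_idx)

print(n_train, n_batches, n_train - seen)   # 33012 128 244
```

Because of the integer division, the last 244 shuffled observations are skipped in each epoch. Since the data is reshuffled every epoch, different observations are left out each time.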

# Create session
net = tf.Session()
# Running the initializer
net.run(tf.global_variables_initializer())
# Setting up an interactive chart
import matplotlib.pyplot as plt
plt.ion()
fig = plt.figure()
ax1 = fig.add_subplot(111)
line1, = ax1.plot(y_test)
line2, = ax1.plot(y_test * 0.5)
plt.show()

# The number of epochs and the size of the data chunk
epochs = 10
batch_size = 256
for e in range(epochs):
    # Shuffling data for training
    shuffle_indices = np.random.permutation(np.arange(len(y_train)))
    X_train = X_train[shuffle_indices]
    y_train = y_train[shuffle_indices]

    # Learning by mini-batch
    for i in range(0, len(y_train) // batch_size):
        start = i * batch_size
        batch_x = X_train[start:start + batch_size]
        batch_y = y_train[start:start + batch_size]
        # Run optimizer with batch
        net.run(opt, feed_dict={X: batch_x, Y: batch_y})

        # Show progress every fifth batch
        if np.mod(i, 5) == 0:
            # Prediction
            pred = net.run(out, feed_dict={X: X_test})
            line2.set_ydata(pred)
            plt.title('Epoch ' + str(e) + ', Batch ' + str(i))
            file_name = 'img' + str(e) + '_batch_' + str(i) + '.jpg'
            plt.savefig(file_name)
            plt.pause(0.01)

# Output the final MSE function after training

mse_final = net.run(mse, feed_dict={X: X_test, Y: y_test})
print(mse_final)

During training, the network's predictions on the test set were evaluated every fifth batch and visualized. In addition, the images were saved to disk, and later a video animation of the training process was created from them.

As you can see, the neural network quickly adapts to the basic shape of the time series and then continues to look for finer patterns in the data. After 10 epochs, the predictions are very close to the test data. The final MSE is 0.00078 (a very small value because the targets are scaled). The mean absolute percentage error (MAPE) of the forecast on the test set is 5.31%, which is a very good result. It is important to understand that this is only a fit to the test data, not an out-of-sample result in a real-world scenario.

Image 4

Scatter plot between predicted and real prices of the S&P
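For reference, the mean absolute percentage error mentioned above is the average of |actual - predicted| / |actual|, expressed in percent. The article does not show how the 5.31% figure was computed, so the helper `mape` and the three made-up prices below are only an illustration of the metric.

```python
import numpy as np

def mape(y_true, y_pred):
    """Mean absolute percentage error in percent (y_true must not contain zeros)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return 100.0 * np.mean(np.abs((y_true - y_pred) / y_true))

# The individual errors are 10%, 5%, and 0%, so the mean is about 5%
error = mape([100.0, 200.0, 400.0], [110.0, 190.0, 400.0])
print(error)
```

Because the metric divides by the actual values, it is better computed on unscaled prices. Min-max scaled targets can be zero or close to zero, which inflates the percentage.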

## Conclusion

This result can be improved further in many ways, from the design of the layers and neurons to the choice of other initialization and activation schemes. In addition, other types of deep learning models, such as recurrent neural networks, can be used, which may also lead to better results.

References

Image 1 – https://www.programmersought.com/images/67/c6bc6001c81422682c8e76284365ef73.JPEG

Image 2 – https://www.programmersought.com/images/67/c6bc6001c81422682c8e76284365ef73.JPEG

Image 3 – https://www.programmersought.com/images/67/c6bc6001c81422682c8e76284365ef73.JPEG

Image 4 – https://www.programmersought.com/images/67/c6bc6001c81422682c8e76284365ef73.JPEG