Predicting Stock Prices using Reinforcement Learning (with Python Code!)

Ekta Shah 23 Dec, 2020
7 min read

This article was published as a part of the Data Science Blogathon.


The share price of HDFC Bank is going up. It’s on an increasing trend. People are selling in higher numbers and making some instant money.

These are sentences we hear about the stock market on a regular basis nowadays. You can replace HDFC with any other stock that thrived during a tumultuous 2020 and the narrative remains pretty similar.

The stock market is an interesting medium to earn and invest money. It is also a lucrative option that can feed greed and lead to drastic decisions, largely because of the volatile nature of the market. It is a gamble that can lead to either a profit or a loss. No model predicts stock prices reliably, because price movement is strongly influenced by the balance of supply and demand.


In this article, we will try to address that problem with reinforcement learning. We will go through the reinforcement learning techniques that have been used for stock market prediction.


Techniques We Can Use for Predicting Stock Prices

Since this is a prediction of continuous values, any kind of regression technique can be used:

  • Linear regression predicts continuous values
  • Time series models are designed for time-ordered data
  • ARIMA is one such time series model, used for forecasting future values
  • LSTM (Long Short-Term Memory) is a neural network technique that has also been used for stock price prediction. LSTMs are powerful and are known for retaining information over long sequences
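
As a point of comparison for the regression approaches listed above, here is a minimal sketch of a linear regression baseline. The price values are made up for illustration, and fitting a straight line against the day index is an assumption about how such a baseline might be set up, not something taken from the stock example in this article.

```python
import numpy as np

# Made-up closing prices for five consecutive days (illustrative only).
prices = np.array([100.0, 102.0, 104.0, 106.0, 108.0])
days = np.arange(len(prices))

# Fit price = slope * day + intercept by least squares.
slope, intercept = np.polyfit(days, prices, deg=1)

# Extrapolate one day past the end of the series.
next_day_forecast = slope * len(prices) + intercept
print(float(next_day_forecast))
```

Because this toy series is perfectly linear, the forecast comes out at approximately 110. Real price series are far noisier, which is one reason a straight-line fit is only a baseline.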

However, there is another technique that can be used for stock price prediction: reinforcement learning.



What is Reinforcement Learning?

Reinforcement learning is a third type of machine learning, alongside supervised and unsupervised learning. It is an agent-based learning system in which the agent takes actions in an environment with the goal of maximizing the reward. Unlike supervised learning, reinforcement learning does not require labeled data.

Reinforcement learning can work with relatively little historical data. It makes use of a value function, which estimates the expected reward of an action under the policy the agent is following.

Reinforcement learning is modeled as a Markov Decision Process (MDP):

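  • An environment E and a set of agent states S

  • A set of actions A available to the agent

  • P_a(s, s') = Pr(s_{t+1} = s' | s_t = s, a_t = a) is the probability that taking action a in state s at time t leads to state s' at time t+1

  • R_a(s, s') is the immediate reward received after moving from state s to state s' through action a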


How can we predict stock market prices using reinforcement learning?

The concept of reinforcement learning can be applied to price prediction for a specific stock because it rests on the same fundamentals: it needs relatively little historical data and works as an agent-based system that seeks higher returns based on the current environment. We will walk through an example for a single stock that follows the reinforcement learning model. It makes use of Q-learning, which is explained further on.

The steps for designing a reinforcement learning model are:

  • Import the libraries
  • Create the agent that will make all decisions
  • Define basic functions for formatting values, the sigmoid function, reading the data file, etc.
  • Train the agent
  • Evaluate the agent's performance


Define the Reinforcement Learning Environment

MDP for Stock Price Prediction:

  • Agent – An agent A that works in environment E
  • Action – Buy / Sell / Hold
  • States – Data values (a window of recent day-to-day price changes)
  • Rewards – Profit / Loss


The Role of Q-Learning

Q-learning is a model-free reinforcement learning algorithm that learns the quality of actions, telling an agent which action to take under which circumstances. It finds an optimal policy in the sense of maximizing the expected value of the total reward over all successive steps, starting from the current state.
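
To make the idea concrete, here is a minimal sketch of tabular Q-learning on a toy problem. The corridor environment, the function name q_learning, and the values chosen for alpha, gamma, and epsilon are all assumptions made for illustration. The stock example in this article does not use a table; it approximates the same quantities with a neural network.

```python
import random

def q_learning(n_states=5, episodes=500, alpha=0.1, gamma=0.95, epsilon=0.1, seed=0):
    """Tabular Q-learning on a toy corridor.

    States are 0..n_states-1. Action 0 moves left, action 1 moves right.
    Reaching the right-most state gives a reward of 1 and ends the episode.
    """
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]  # q[state][action]
    goal = n_states - 1
    for _ in range(episodes):
        state = 0
        while state != goal:
            # epsilon-greedy action selection (ties go to "right")
            if rng.random() < epsilon:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state = max(state - 1, 0) if action == 0 else state + 1
            done = next_state == goal
            reward = 1.0 if done else 0.0
            # Q-learning target: reward plus discounted best value of the next state
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q

q_table = q_learning()
for state, values in enumerate(q_table):
    print(state, [round(v, 3) for v in values])
```

After training, the value of moving right is higher than the value of moving left in every non-terminal state, so the greedy policy walks straight to the goal. The line that computes target is the part that carries over to the stock agent.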


Obtaining Data

  1. Go to Yahoo Finance

  2. Type in the company's name, e.g. HDFC Bank

  3. Select the time period, e.g. 5 years

  4. Click on Download to download the CSV file
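
Downloaded price files can contain rows with missing values, which show up as the text null. The sketch below shows one way to read the closing prices while skipping such rows. The function name load_close_prices and the sample rows are hypothetical, and the column layout (Date, Open, High, Low, Close, Adj Close, Volume) is an assumption about the downloaded file, so check the header of your own CSV.

```python
import csv
import io

def load_close_prices(csv_file):
    """Return the 'Close' column as floats, skipping missing or 'null' values."""
    prices = []
    for row in csv.DictReader(csv_file):
        value = (row.get("Close") or "").strip()
        if value == "" or value.lower() == "null":
            continue  # no usable closing price on this row
        prices.append(float(value))
    return prices

# In-memory sample standing in for a downloaded file (values are made up).
SAMPLE_CSV = (
    "Date,Open,High,Low,Close,Adj Close,Volume\n"
    "2020-01-01,100.0,101.0,99.0,100.5,100.5,1000\n"
    "2020-01-02,null,null,null,null,null,null\n"
    "2020-01-03,100.5,103.0,100.0,102.0,102.0,1200\n"
)

print(load_close_prices(io.StringIO(SAMPLE_CSV)))  # [100.5, 102.0]
```

With a real file, the same function can be called inside a with open(...) block, passing the open file object instead of the in-memory sample.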



Let’s Implement Our Model in Python

Importing Libraries

To build the reinforcement learning model, import the required Python libraries: Keras for the neural network layers and NumPy for some basic operations.

import keras
from keras.models import Sequential
from keras.models import load_model
from keras.layers import Dense
from keras.optimizers import Adam
import math
import numpy as np
import random
from collections import deque

Creating the Agent

The Agent code begins with some basic initializations for the various parameters. Hyperparameters such as gamma, epsilon, epsilon_min, and epsilon_decay are defined here. gamma is the discount factor applied to future rewards. epsilon is the probability of taking a random action instead of the predicted one. It starts at 1.0 and is multiplied by epsilon_decay after every replay until it falls to epsilon_min, so the agent explores heavily at first and then relies increasingly on what it has learned.

The agent builds a layered neural network that outputs a value for each of the three actions: buy, sell, or hold (called sit in the code). The act method picks the next action from the current state, either at random (with probability epsilon during training) or by taking the action with the highest predicted value. The memory is a deque that holds at most 1000 past experiences and drops the oldest ones once it is full. The expReplay method retrains the network on the most recent experiences in that memory and then decays epsilon.

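class Agent:

    def __init__(self, state_size, is_eval=False, model_name=""):
        self.state_size = state_size # normalized previous days
        self.action_size = 3 # sit, buy, sell
        self.memory = deque(maxlen=1000) # replay memory; oldest entries drop off when full
        self.inventory = [] # purchase prices of the shares currently held
        self.model_name = model_name
        self.is_eval = is_eval
        self.gamma = 0.95 # discount factor for future rewards
        self.epsilon = 1.0 # initial exploration rate
        self.epsilon_min = 0.01 # lower bound on the exploration rate
        self.epsilon_decay = 0.995 # multiplicative decay applied after each replay
        # load a saved model for evaluation, otherwise build a new one
        self.model = load_model(model_name) if is_eval else self._model()

    def _model(self):
        # fully connected network mapping a state to one value per action
        model = Sequential()
        model.add(Dense(units=64, input_dim=self.state_size, activation="relu"))
        model.add(Dense(units=32, activation="relu"))
        model.add(Dense(units=8, activation="relu"))
        model.add(Dense(self.action_size, activation="linear"))
        model.compile(loss="mse", optimizer=Adam(learning_rate=0.001))
        return model

    def act(self, state):
        # explore with probability epsilon while training, otherwise act greedily
        if not self.is_eval and random.random() <= self.epsilon:
            return random.randrange(self.action_size)
        options = self.model.predict(state, verbose=0)
        return np.argmax(options[0])

    def expReplay(self, batch_size):
        # collect the most recent experiences from memory
        mini_batch = []
        l = len(self.memory)
        for i in range(l - batch_size + 1, l):
            mini_batch.append(self.memory[i])
        for state, action, reward, next_state, done in mini_batch:
            target = reward
            if not done:
                # Q-learning target: reward plus discounted best future value
                target = reward + self.gamma * np.amax(self.model.predict(next_state, verbose=0)[0])
            target_f = self.model.predict(state, verbose=0)
            target_f[0][action] = target
            self.model.fit(state, target_f, epochs=1, verbose=0)
        # decay the exploration rate once per replay
        if self.epsilon > self.epsilon_min:
            self.epsilon *= self.epsilon_decay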
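
The exploration schedule used by the agent can be examined on its own. The sketch below counts how many decay steps it takes for epsilon to fall from its starting value to epsilon_min, using the same numbers as the agent. The helper name decay_steps_until_min is hypothetical and only serves this illustration.

```python
def decay_steps_until_min(epsilon=1.0, epsilon_min=0.01, epsilon_decay=0.995):
    """Count the multiplicative decay steps needed for epsilon to reach epsilon_min."""
    steps = 0
    while epsilon > epsilon_min:
        epsilon *= epsilon_decay
        steps += 1
    return steps

print(decay_steps_until_min())  # 919
```

With these settings, the agent stops decaying epsilon after 919 replay calls. A slower decay (a value closer to 1) keeps the agent exploring for longer, which may matter when the price history is short.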


Define Basic Functions

formatPrice() formats a value as a currency string. getStockDataVec() reads the closing prices from the CSV file into a Python list. sigmoid() squashes a value into the range 0 to 1. getState() returns the state at time t: the sigmoid of each day-to-day price change over the most recent window.

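def formatPrice(n):
    # format a signed amount as rupees with two decimals
    return ("-Rs." if n < 0 else "Rs.") + "{0:.2f}".format(abs(n))

def getStockDataVec(key):
    # read the closing prices (fifth column) from <key>.csv, skipping the header row
    vec = []
    lines = open(key + ".csv", "r").read().splitlines()
    for line in lines[1:]:
        close = line.split(",")[4]
        if close != "null": # skip rows with missing prices
            vec.append(float(close))
    return vec

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def getState(data, t, n):
    # state at time t: sigmoid of the n - 1 consecutive price differences ending at t
    d = t - n + 1
    block = data[d:t + 1] if d >= 0 else -d * [data[0]] + data[0:t + 1] # pad with t0
    res = []
    for i in range(n - 1):
        res.append(sigmoid(block[i + 1] - block[i]))
    return np.array([res])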
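
To see what a state looks like, the sketch below runs getState on a short, made-up price list. The two helper functions are repeated (in slightly condensed form) only so the example runs on its own. The prices and the window size of 3 are assumptions for illustration.

```python
import math
import numpy as np

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def getState(data, t, n):
    # same logic as the function above: sigmoid of consecutive price differences
    d = t - n + 1
    block = data[d:t + 1] if d >= 0 else -d * [data[0]] + data[0:t + 1]
    return np.array([[sigmoid(block[i + 1] - block[i]) for i in range(n - 1)]])

prices = [100.0, 102.0, 101.0, 105.0, 107.0]  # made-up closing prices
window_size = 3

# At t = 0 there is no history yet, so the window is padded with the first price.
first_state = getState(prices, 0, window_size + 1)
# At t = 4 the window covers the price changes -1, +4 and +2.
last_state = getState(prices, 4, window_size + 1)

print(first_state)
print(last_state)
```

The first state is all 0.5 because every padded difference is zero. In the last state, a falling price maps to a value below 0.5 and a rising price to a value above 0.5. Each state has shape (1, window_size), which matches the input the network expects.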

Training the Agent

Depending on the action chosen by the model, a buy adds the current price to the inventory, and a sell removes the oldest purchase and books the profit or loss. Training runs over multiple episodes, each a full pass over the price history, much like epochs in deep learning. The model is saved every ten episodes.

stock_name = input("Enter stock name (CSV file name without .csv): ")
window_size = int(input("Enter window size: "))
episode_count = int(input("Enter episode count: "))
agent = Agent(window_size)
data = getStockDataVec(stock_name)
l = len(data) - 1
batch_size = 32
for e in range(episode_count + 1):
    print("Episode " + str(e) + "/" + str(episode_count))
    state = getState(data, 0, window_size + 1)
    total_profit = 0
    agent.inventory = []
    for t in range(l):
        action = agent.act(state)
        # sit
        next_state = getState(data, t + 1, window_size + 1)
        reward = 0
        if action == 1: # buy
            agent.inventory.append(data[t])
            print("Buy: " + formatPrice(data[t]))
        elif action == 2 and len(agent.inventory) > 0: # sell
            bought_price = agent.inventory.pop(0)
            reward = max(data[t] - bought_price, 0)
            total_profit += data[t] - bought_price
            print("Sell: " + formatPrice(data[t]) + " | Profit: " + formatPrice(data[t] - bought_price))
        done = t == l - 1
        agent.memory.append((state, action, reward, next_state, done))
        state = next_state
        if done:
            print("Total Profit: " + formatPrice(total_profit))
        if len(agent.memory) > batch_size:
            agent.expReplay(batch_size)
    if e % 10 == 0:
        # save a checkpoint every 10 episodes (the file name here is illustrative)
        agent.model.save("model_ep" + str(e) + ".h5")

Training Output at the end of the first episode:

Total Profit: Rs.340.03


[Figure: training output diagram] The feedback, i.e. the reward, is given to the agent for further processing.
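
The reward rule from the training loop can be traced by hand on a short example. The sketch below replays a fixed sequence of actions with the same first-in, first-out inventory and the same reward and profit formulas. The function name simulate, the prices, and the action sequence are made up for illustration.

```python
def simulate(prices, actions):
    """Replay fixed actions (0 = sit, 1 = buy, 2 = sell) with a FIFO inventory."""
    inventory, rewards, total_profit = [], [], 0.0
    for price, action in zip(prices, actions):
        reward = 0.0
        if action == 1:
            inventory.append(price)
        elif action == 2 and inventory:
            bought_price = inventory.pop(0)  # sell the oldest purchase first
            reward = max(price - bought_price, 0.0)  # losses are not penalized
            total_profit += price - bought_price  # profit does count losses
        rewards.append(reward)
    return rewards, total_profit

# Buy at 10, buy at 12, sell at 9 (a loss of 1), sell at 15 (a gain of 3).
rewards, total_profit = simulate([10.0, 12.0, 9.0, 15.0], [1, 1, 2, 2])
print(rewards, total_profit)  # [0.0, 0.0, 0.0, 3.0] 2.0
```

Only a profitable sale produces a non-zero reward, while sitting, buying, and selling at a loss all give zero. The total profit still includes the loss, so the reward the agent learns from and the profit that is printed are not the same quantity.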


Evaluation of the model

Once the model has been trained, you can test it on new data to see the profit or loss it produces, and judge its credibility accordingly.

stock_name = input("Enter stock name (CSV file name without .csv): ")
model_name = input("Enter the file name of a saved model: ")
model = load_model(model_name)
window_size = model.input_shape[1] # the window size equals the model's input width
agent = Agent(window_size, True, model_name)
data = getStockDataVec(stock_name)
l = len(data) - 1
batch_size = 32
state = getState(data, 0, window_size + 1)
total_profit = 0
agent.inventory = []
for t in range(l):
    action = agent.act(state)
    # sit
    next_state = getState(data, t + 1, window_size + 1)
    reward = 0
    if action == 1: # buy
        agent.inventory.append(data[t])
        print("Buy: " + formatPrice(data[t]))
    elif action == 2 and len(agent.inventory) > 0: # sell
        bought_price = agent.inventory.pop(0)
        reward = max(data[t] - bought_price, 0)
        total_profit += data[t] - bought_price
        print("Sell: " + formatPrice(data[t]) + " | Profit: " + formatPrice(data[t] - bought_price))
    done = t == l - 1
    agent.memory.append((state, action, reward, next_state, done))
    state = next_state
    if done:
        print(stock_name + " Total Profit: " + formatPrice(total_profit))


End Notes

Reinforcement learning can give positive results for stock prediction, as the profit reported during training suggests. Q-learning makes it possible to run many different experiments. More research in reinforcement learning will allow it to be applied to stock prediction with greater confidence.



Responses From Readers


Claudio 30 Oct, 2020

Hi there, very interested to know more, I am having troubles with the execution of the above code, do u have a more direct alternative to explain your code structure to deliver the outcome, I am trying to understand how the code structure works as I have an error about the name agent "is not defined" how can I get around that which is towards the end of your code for execution

Baby Bear 30 Oct, 2020

Hi, this is a very good introductory post. But can I ask for any related academic papers or blogs where details are disclosed? I want to know details since I am not very familiar with deep RL Many thanks.

Ciaran 10 Nov, 2020

Very good model, can you tell me what parameters you used during training for the window_size and Episode_count?

Sami 27 Dec, 2020

Hi, Thanks for your exciting code. In the Evaluation of the model (model_name = input()), What should we give to the input? I got this error when i enter "Sequential": OSError: SavedModel file does not exist at: Sequential/{saved_model.pbtxt|saved_model.pb}

Raj 14 Jan, 2021

I dont get where did you use Q learning in the implementation python code. Can you please point me out?

José 04 Feb, 2021

Hello, I don't have much experience with python, I would like to know if you have the sample files to download?

Luka Savic 09 Feb, 2021

Hey, great post! One question though - what is the dataset you are using with this algo? Yahoo Finance maybe?

BT 14 Feb, 2021

Ekta, this is a wonderful case study. I have taken the code and converted it to trade on live market prices while also making some updates. It's currently mock trading ETC crypto and doing quite well with a few adjustments. The only thing I would note is that your HTML above doesn't include "def class agent:" so that may cause some confusion for python newbies copy and pasting. Feel free to reach out if you'd like to see my changes!

Stephen Hobbs 26 Feb, 2021

Hi - I'd like to see if you are available for a consulting project. Thank you.

Jacky 03 Mar, 2021

Hi, thanks for the post. I am new to reinforement learning, so now trying to understand the codes and how it works. May I ask what is Window Size and Episode? adjusting these input would affect the profit rate? Cheers Jacky

Perera 15 Mar, 2021

HI, I get an error like below ValueError Traceback (most recent call last) in () 7 episode_count = int(episode_count) 8 agent = Agent(window_size) ----> 9 data = getStockDataVec(stock_name) 10 l = len(data) - 1 11 batch_size = 32 in getStockDataVec(key) 7 #print(line) 8 #print(float(line.split(",")[4])) ----> 9 vec.append(float(line.split(",")[4])) 10 #print(vec) 11 return vec ValueError: could not convert string to float: 'null'

Thomas 16 Mar, 2021

Hi, Very nice article! How did you make the graph that was used at the start of the article where the predicted stock prices and the real ones are compared? Thank you in advance

Leo 20 Mar, 2021

Hi, please can you explain me how can I add the data for this part ? stock_name = input("Enter Stock_name, Model_name") stock_name = input("Enter Stock_name, Model_name") Thank you

Venkat Nalluri 03 May, 2021

Hi, I want to learn reinforcement learning with the LSTM method. Could you please provide me with a link?

Zoya 09 Nov, 2021

Luca 22 Mar, 2022

Very nice article. Testing the code I received sometimes core dump exception from the expReplay function (the fit model call). I suspect some misallignement in memory allocation to to improper library version combination. May you give some suggestion to stabilize the runs? tks

Hassan 23 Apr, 2022

How we will get predicted values from the model ?

Fizz 19 May, 2022

Really great and helpful article. I have a confusion/Question......... . Why you have only calculated the reward for only "sell" action, but why you set the reward is zero for "sit" and "buy" action ? Please clear this confusion.

khalid 24 Oct, 2022

Hello, what values should I enter for: Enter stock_name, window_size, Episode_count?

Biên 23 Feb, 2023

Enter stock_name, window_size, Episode_count? I don't understand these 3 requirements, please be a bit specific about what I need to input these 3 inputs