*This article was published as a part of the Data Science Blogathon.*

## Introduction

Even though there are myriad complex methods and systems aimed at trying to forecast future stock prices, the simple method of linear regression does help to understand the past trend and is used by professionals as well as beginners to try and extrapolate the existing or past trend into the future. If the existing trend carries on into the future then you could have a potential winner.

A caveat needs to be added: Nothing can determine with any assurance that the future will turn out to be exactly like the past and so this method like other forecasting methods despite being fundamentally useful has its limitations.

With this in mind, let’s try and figure out the future stock prices of Infosys (NSE Symbol: INFY). Let’s recap the concept of linear regression, choose an arbitrary time frame, take the past data, apply the method, identify the past trend, and check the results.

**Understanding linear regression**

The simplest form of the regression equation with one dependent and one independent variable is defined by the formula

**y = m*x + c**

where y is the estimated dependent variable, m is the regression coefficient, or what is commonly called the slope, x is the independent variable and c is a constant. In simple words, y is the output when m, x, and c are used as inputs.

Linear regression does try to predict trends and future values. It essentially, though not with pin-point accuracy can answer questions like,

- What could be the price of Infosys in the next 3 months?
- What could be the price of Gold in the next 6 months?
- Or Where could the Market go if the existing trend continues into the future?

This is as far as future stock prices or the financial markets go. It of course is much more useful and extensively used outside of the financial markets too. However, for this article, we will examine its usefulness in forecasting the stock trend.

**Analyzing the past trend**

Taking the past prices of INFY from 09-Apr-2020 till 10-Jul-2020, a total of 60 trading days or roughly 3 calendar months and by following the below procedure, we will get the past 60-day trend of Infosys.

Take the past 60 days close price data in excel. I’m showing only 3 days of data for illustration purposes. (**Data source**– https://in.finance.yahoo.com/quote/INFY.NS/history?p=INFY.NS)

Symbol | Date | Close |

INFY |
16-Apr-20 |
623.85 |

INFY |
17-Apr-20 |
628.75 |

INFY |
20-Apr-20 |
653.3 |

Select the date and close columns for the 60 values, insert a scatter plot like below.

Select the quick layout as fx as shown below.

To get the linear trend as shown below.

Based on the past values, **excel has calculated the slope, m= 1.3312 which means on average the stock of Infosys has increased by 1.33 Rs. during this period**. The c value which is the constant is a minus 57849.

Let’s look at what the forecasted values look like if we assume this trend continues into the future. Take the next 60 days as the test data in order to compare the forecasts with the actual close date. This will start from 13-Jul-2020 and extend till 05-Oct-2020 (till recently).

Forecasted value, y = 1.3312*x – 57489

Apply the above formula to all the rows of the excel. Remember x is the date here and so you have to convert the result into a number to get the correct result like below.

Next, compare the actual close with the forecasted value and get the percentage difference between these values.

I would not get into the statistical interpretation of the forecasts and for now, only consider how much of the forecast is versus the actual value. i.e. actual close minus forecast divided by the forecast.

Once you do this assignment, you will find that the percentage difference ranged from a minimum of 3% (good forecast) on 14-Jul-2020 to a maximum of 19% on 31-Jul-2020 (not a good one) and tells you the nature of linear trend which is a proportionate change over a period of time.

**Limitations**

- The future is shown as a linear increase which is not true as if you observe the actual values you see a jump up and a fall down in the prices. There is a lot of variability in the actual data but a standard increase in the forecasted values.
- The line goes only one way which means if the past has been of an average increase over a period of time then the future will also show the average increase over the next period. In reality, there is always a danger of trend reversal which the linear forecast can never reveal.
- The values by and of itself differ from the actual by large percentages in excess of 10% which shows it is not such an accurate predictor.

**Usefulness**

- It can tell you on average, what has been the past trend, and like mentioned at the start, if the future trend happens to be the same as the past trend then your prediction will be quite accurate using the simple linear regression model which isn’t a lot of work
- Using the percent error, a range could be deduced and it can be expected the future value could lie within this range.
- Once can work out all the cases and become aware of the future values if the trend improves, worsens, or remains the same and so have an idea of the possible ranges.
- It is a quicker and easier method to apply using the most popular analysis tool which is excel and unlike highly complex processes that try harder at forecasting accurate values is much easier to interpret and explain to non-experts.

**Conclusion**

We knew from the past data the rate of the linear increase, we knew that we do not know whether the future is going to be better than the past or worse than the past or equal to the past.

We simply assumed that it will be equal to the past which is the standard assumption one makes when using the simple linear regression model. However, the actual future data in our example proved that the future was indeed better than the past and if it is so then who in this world would complain.

**Disclaimer:** This is not Investment Advice or a suggested approach to forecast future prices and this article is purely meant for educational purposes. The goal is not to predict future values but examine the usefulness of the linear regression method to extrapolate the past. Any investment or trade made using the technique discussed above should be at the concerned person’s risk.

I have an article where I have successfully used Piecewise Linear regression and optimization techniques for portfolio optimization .

I understand that you want to introduce linear regression, however:

We know that

0) Stock prices cannot be linear in time since prices are positive

1) The deviations from linear model are NOT normally distributed -which is a check every Data Scientist should be doing, but a lot of Data Smiths leave out- it is not even mathematically valid to do linear regression on a linear model of growth in stock prices

2) The long term trend in stock prices is exponential (either growth or decay, look at RocketFuel stock RFI while it was public)

And

3) In fact the deviations from exponential growth are log-normally distributed (Okay, that was lazy. More correctly, the deviations in log(price) space are better modeled as normally distributed.)

So a solution, which is almost as easy, is to take the log(price), model it that as linear growth in time (corresponding to exponential price growth) and do linear regression of the exponential model.