Polynomial Regression for Beginners

Raghav Agrawal 14 Jun, 2024


Introduction

In this article, we will study the polynomial regression model and implement it in Python on sample data. I assume you are already familiar with simple linear regression and multiple linear regression. If not, please read our previous article first to get a basic understanding of linear regression, because polynomial regression is built on the same concept as linear regression, with a few modifications that let it fit non-linear data more accurately.

Learning Objectives

Explore the concept of polynomial regression in machine learning.
Where and how to use polynomial regression.
Comparison of polynomial and simple linear regression.

This article was published as a part of the Data Science Blogathon.

Why Polynomial Regression?
How Does Polynomial Regression Handle Non-Linear Data?
Why Is Polynomial Regression Called Polynomial Linear Regression?
Linear Regression Vs. Polynomial Regression
Polynomial Regression With One Variable
Playing With a Polynomial Degree
Polynomial Regression With Multiple columns
Frequently Asked Questions

Why Polynomial Regression?

A simple linear regression algorithm works well only when the relationship between the variables is linear. If the data is non-linear, linear regression cannot draw a line that fits it well, and simple regression analysis fails. Consider the diagram below, which shows a non-linear relationship along with the linear regression fit: the line performs poorly and does not come close to the actual data. Polynomial regression was introduced to overcome this problem; it helps identify a curvilinear relationship between the independent and dependent variables.
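To make the failure concrete, here is a small sketch (an illustration, not the article's data) that fits a straight line to noise-free points on the parabola y = x², sampled symmetrically on [-3, 3]. Because the curve is symmetric, there is no linear trend for the line to capture: the fitted slope is essentially zero, and so is the R² score.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Hypothetical illustration: noise-free points on the parabola y = x^2
X = np.linspace(-3, 3, 61).reshape(-1, 1)
y = X[:, 0] ** 2

# Fit a straight line to curved data
line = LinearRegression().fit(X, y)
r2 = r2_score(y, line.predict(X))

# The symmetric curve leaves no linear trend, so the line is flat
print("slope:", line.coef_[0])
print("R^2:", r2)
```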

How Does Polynomial Regression Handle Non-Linear Data?

Polynomial regression is a form of linear regression in which, because the relationship between the dependent and independent variables is non-linear, we add polynomial terms to the linear regression model.

The relationship between the dependent variable and the independent variable is modeled as an nth-degree polynomial function. When the polynomial is of degree 2, it is called a quadratic model; when the degree of a polynomial is 3, it is called a cubic model, and so on.

Suppose we have a dataset where variable X represents the independent data and Y the dependent data. In the preprocessing stage, before feeding the data to a model, we convert the input variables into polynomial terms up to some degree.

For example, if my input value is 35 and the degree of the polynomial is 2, I compute 35⁰ = 1, 35¹ = 35, and 35² = 1225 and use these as features. This is what lets the model capture a non-linear relationship in the data.
The polynomial regression equation then looks like this:

$$y = a_0 + a_1 x_1 + a_2 x_1^2 + \dots + a_n x_1^n$$
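The 35 example can be reproduced with scikit-learn's `PolynomialFeatures`, the same class the implementation later in this article uses. In this sketch, `include_bias=True` adds the x⁰ column, so a single input of 35 expands to the three features 1, 35 and 1225, which the linear model then weights with a₀, a₁ and a₂.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# One sample with a single input feature whose value is 35
x = np.array([[35.0]])

# degree=2 with include_bias=True produces the columns x^0, x^1, x^2
poly = PolynomialFeatures(degree=2, include_bias=True)
features = poly.fit_transform(x)
print(features.tolist())  # [[1.0, 35.0, 1225.0]]
```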

The degree of the polynomial is a hyperparameter, and we need to choose it carefully: a high degree tends to overfit the data, while a low degree tends to underfit, so we need to find the optimal degree. Polynomial regression models are usually fitted with the method of least squares. Under the assumptions of the Gauss-Markov theorem, the least-squares estimator of the coefficients has the lowest variance among all linear unbiased estimators.
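As a sketch of what least squares does here (an illustration with NumPy's `np.linalg.lstsq`, not necessarily what any particular library does internally), we can build a design matrix whose columns are 1, x and x², then solve for the coefficients directly. The data is hypothetical and noise-free, so the solver should recover the generating coefficients a₀ = 2, a₁ = 0.9 and a₂ = 0.5.

```python
import numpy as np

# Hypothetical noise-free data generated from y = 2 + 0.9x + 0.5x^2
x = np.linspace(-3, 3, 50)
y = 2 + 0.9 * x + 0.5 * x**2

# Design matrix: one column per power of x (x^0, x^1, x^2)
degree = 2
A = np.vander(x, N=degree + 1, increasing=True)

# Least squares: choose coefficients a that minimise ||A @ a - y||^2
coeffs, residuals, rank, sing_vals = np.linalg.lstsq(A, y, rcond=None)
print("a0, a1, a2 =", coeffs)
```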

Why Is Polynomial Regression Called Polynomial Linear Regression?

If you look carefully at the polynomial regression equation, you can see that the values of x and y are already given; what we need to estimate are the coefficients, and every coefficient appears with power 1. The model is non-linear in x but linear in the coefficients, so it can be fitted exactly like multiple linear regression by treating x, x², …, xⁿ as separate features. That is why polynomial regression is also known as polynomial linear regression, and this is the whole idea behind the name.
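One way to check that polynomial regression really is linear regression on expanded features is to fit the same data twice: once with an ordinary `LinearRegression` on the columns x and x², and once with NumPy's `np.polyfit`, which fits a polynomial by least squares. The data below is hypothetical. Both are least-squares fits to the same data, so they should give the same coefficients up to floating-point error.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical noisy quadratic data
rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=100)
y = 0.5 * x**2 + 0.9 * x + 2 + rng.normal(size=100)

# 1) Ordinary *linear* regression on the expanded features x and x^2
X_poly = PolynomialFeatures(degree=2, include_bias=False).fit_transform(x.reshape(-1, 1))
lin = LinearRegression().fit(X_poly, y)

# 2) NumPy's dedicated polynomial fit (returns the highest power first)
a2, a1, a0 = np.polyfit(x, y, deg=2)

print("LinearRegression:", lin.intercept_, lin.coef_)
print("np.polyfit:      ", a0, [a1, a2])
```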

Linear Regression Vs. Polynomial Regression

Now that we know how polynomial regression works and how it helps build a model on non-linear data, let's compare both algorithms in practice and look at the results.

First, we will generate data from a quadratic equation of the form ax^2 + bx + c and apply simple linear regression to it to fit a linear equation. Then we will apply polynomial regression to the same data, which makes it easy to compare the practical performance of both algorithms.

Initially, we will work with only one input column and one output column. Once that is clear, we will move on to data with multiple input columns.

Polynomial Regression With One Variable

Let's get our hands dirty with a practical implementation.

Step 1: Import all the libraries

import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.metrics import r2_score

Step 2: Create and visualize the data

We add some random noise to the data so that it resembles real-world measurements instead of lying exactly on the curve.
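A minimal sketch of this step, under stated assumptions: the generating equation y = 0.5x² + 0.9x + 2 uses an assumed quadratic coefficient (0.5), sample size (200) and random seed, while the linear coefficient 0.9 and the intercept 2 match the values discussed in Step 5, and the input range [-3, 3] matches the plotting range used later. It defines the `X` and `y` arrays that the following steps expect.

```python
import numpy as np
import matplotlib.pyplot as plt

# Assumed generating equation: y = 0.5x^2 + 0.9x + 2 + Gaussian noise
rng = np.random.RandomState(42)  # fixed seed so results are reproducible
X = 6 * rng.rand(200, 1) - 3     # 200 inputs drawn uniformly from [-3, 3)
y = 0.5 * X**2 + 0.9 * X + 2 + rng.randn(200, 1)

# Visualize the raw data
plt.plot(X, y, "b.")
plt.xlabel("X")
plt.ylabel("Y")
plt.show()
```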

Step 3: Split the data into training and test sets

x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=2)

Step 4: Apply simple linear regression

Now we will analyze the predictions of a simple linear regression fit. The model performs poorly: a straight line cannot follow the curved pattern of the points.

lr = LinearRegression()
lr.fit(x_train, y_train)
y_pred = lr.predict(x_test)
print(r2_score(y_test, y_pred))

The R² score comes out at only around 15 to 20 percent, which is very low. If you plot the prediction line, it looks like the one we saw above: a straight line that cannot capture the non-linear pattern, so it is not a good fit.

plt.plot(x_train, lr.predict(x_train), color="r")
plt.plot(X, y, "b.")
plt.xlabel("X")
plt.ylabel("Y")
plt.show()

Step 5: Apply polynomial regression

Now we convert the input to polynomial terms using degree 2, because the equation we used to generate the data is quadratic (its highest power is 2). With real-world problems, we usually choose the degree by trial and error.

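# Apply polynomial regression with degree 2
# include_bias=True adds the constant column x^0 to the features
poly = PolynomialFeatures(degree=2, include_bias=True)
x_train_trans = poly.fit_transform(x_train)  # columns: x^0, x^1, x^2
x_test_trans = poly.transform(x_test)        # reuse the transformer fitted on training data
# Fit an ordinary linear regression on the expanded features
lr = LinearRegression()
lr.fit(x_train_trans, y_train)
y_pred = lr.predict(x_test_trans)
print(r2_score(y_test, y_pred))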

After converting the input to polynomial terms, we fit a linear regression, which now works as polynomial regression. If you print x_train and the transformed training data, you will see three polynomial terms for each sample (x⁰, x¹ and x²). The model now performs quite well. If you check the coefficients and intercept, the original linear coefficient was 0.9 and the model estimated 0.88, and the original intercept was 2 and the model estimated 1.9. These estimates are very close to the true values, so the model can be said to generalize well.

print(lr.coef_)
print(lr.intercept_)

If we visualize the predicted line across the training data points, we can see how well it identifies the non-linear relationship in data.

X_new = np.linspace(-3, 3, 200).reshape(200, 1)
X_new_poly = poly.transform(X_new)
y_new = lr.predict(X_new_poly)
plt.plot(X_new, y_new, "r-", linewidth=2, label="Predictions")
plt.plot(x_train, y_train, "b.",label='Training points')
plt.plot(x_test, y_test, "g.",label='Testing points')
plt.xlabel("X")
plt.ylabel("y")
plt.legend()
plt.show()

Playing With a Polynomial Degree

Now we will design a function that helps you find the right degree. It applies all the preprocessing steps from above, chains them in a pipeline to streamline the process, and plots the resulting prediction curve. All you need to pass is the degree, and it will build a model and plot the graph for that degree.

Source: Analytics Vidhya

from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline

def polynomial_regression(degree):
    # Evenly spaced inputs used to draw the fitted curve
    X_new = np.linspace(-3, 3, 100).reshape(100, 1)
    # Pipeline: polynomial expansion -> feature scaling -> linear regression
    model = Pipeline([
        ("poly_features", PolynomialFeatures(degree=degree, include_bias=False)),
        ("std_scaler", StandardScaler()),
        ("lin_reg", LinearRegression()),
    ])
    # Fit on the training split only, so the test points show how well the curve generalizes
    model.fit(x_train, y_train)
    y_newbig = model.predict(X_new)
    # Plot the prediction line together with the training and test points
    plt.plot(X_new, y_newbig, "r", label="Degree " + str(degree), linewidth=2)
    plt.plot(x_train, y_train, "b.", linewidth=3)
    plt.plot(x_test, y_test, "g.", linewidth=3)
    plt.legend(loc="upper left")
    plt.xlabel("X")
    plt.ylabel("y")
    plt.axis([-3, 3, 0, 10])
    plt.show()

When we pass high degrees such as 10, 15, or 20, the model starts to overfit: the prediction line gradually loses the overall shape of the data and bends toward individual training points, changing direction whenever the training points do.

polynomial_regression(25)

This is the problem with a high-degree polynomial that I wanted to show you in practice, and it is why we need to choose an optimal degree. I recommend trying different degrees and analyzing the results.
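Besides trying degrees by hand, a common way to pick the degree is k-fold cross-validation: fit each candidate degree on part of the data, score it on the held-out part, and keep the degree with the best average score. The sketch below applies this with scikit-learn's `cross_val_score` to hypothetical quadratic data; the degree range 1 to 10, the five folds and the data itself are assumptions for illustration. Degree 1 should score clearly worse than degree 2 here, while much higher degrees tend to gain little or lose ground.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Hypothetical data: quadratic trend plus Gaussian noise (fixed seed)
rng = np.random.RandomState(42)
X = 6 * rng.rand(200, 1) - 3
y = 0.5 * X[:, 0] ** 2 + 0.9 * X[:, 0] + 2 + rng.randn(200)

def cv_score(degree, X, y, folds=5):
    """Mean cross-validated R^2 of a polynomial model of the given degree."""
    model = make_pipeline(
        PolynomialFeatures(degree=degree, include_bias=False),
        StandardScaler(),
        LinearRegression(),
    )
    return cross_val_score(model, X, y, cv=folds, scoring="r2").mean()

# Score every candidate degree and keep the best one
scores = {d: cv_score(d, X, y) for d in range(1, 11)}
best_degree = max(scores, key=scores.get)

for d, s in scores.items():
    print(f"degree {d:2d}: mean CV R^2 = {s:.3f}")
print("best degree:", best_degree)
```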

Polynomial Regression With Multiple columns

We have seen polynomial regression with one variable. Most of the time, however, the input data will have many columns, so how do we apply polynomial regression and visualize the result in 3-D space? This can feel like a daunting task for beginners, so let's work through how to perform polynomial regression in 3-D space.

Step 1: Creating a dataset

I am using two input columns and one output column; the approach with more input columns is the same.

# 3D polynomial regression
x = 7 * np.random.rand(100, 1) - 2.8
y = 7 * np.random.rand(100, 1) - 2.8
z = x**2 + y**2 + 0.2*x + 0.2*y + 0.1*x*y +2 + np.random.randn(100, 1)

Let's visualize the data in 3-D space using a 3-D scatter plot from the Plotly library.

import plotly.express as px
df = px.data.iris()
fig = px.scatter_3d(df, x=x.ravel(), y=y.ravel(), z=z.ravel())
fig.show()

Step 2: Applying linear regression

First, let's estimate the results with simple linear regression for better understanding and comparison.

NumPy's meshgrid converts two 1-D coordinate vectors into a coordinate grid, which lets us evaluate the model over a surface in 3-D instead of along a line in 2-D.
NumPy's vstack stacks arrays vertically (row-wise). For 2-D arrays this is equivalent to concatenating along axis 0; 1-D inputs are first treated as single rows.
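A minimal sketch of this step, assuming a 10 × 10 grid (the later code reshapes predictions to (10, 10)) spanning the range of the data. The names `x_input`, `y_input` and `z_final` match those used in the plotting code below, and `final`, the stacked grid of (x, y) pairs, is the name used in Step 3. The data-generation lines repeat Step 1, with a seed added as an assumption so the block runs on its own. Because the model is linear in x and y, the predicted surface is a flat plane.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Data from Step 1 (seed added so the sketch is reproducible)
rng = np.random.RandomState(0)
x = 7 * rng.rand(100, 1) - 2.8
y = 7 * rng.rand(100, 1) - 2.8
z = x**2 + y**2 + 0.2*x + 0.2*y + 0.1*x*y + 2 + rng.randn(100, 1)

# Fit a plain linear model z ~ b0 + b1*x + b2*y
X_multi = np.hstack((x, y))  # shape (100, 2): one (x, y) pair per row
lr = LinearRegression().fit(X_multi, z)

# Build a 10 x 10 grid of (x, y) points covering the data
x_input = np.linspace(x.min(), x.max(), 10)
y_input = np.linspace(y.min(), y.max(), 10)
x_input, y_input = np.meshgrid(x_input, y_input)

# vstack the flattened grids into shape (2, 100), then transpose to (100, 2)
final = np.vstack((x_input.ravel(), y_input.ravel())).T

# Predict z on the grid and reshape back to the grid layout for plotting
z_final = lr.predict(final).reshape(10, 10)
```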

Let's visualize the linear regression predictions in 3-D space.


import plotly.graph_objects as go
fig = px.scatter_3d(df, x=x.ravel(), y=y.ravel(), z=z.ravel())
fig.add_trace(go.Surface(x = x_input, y = y_input, z =z_final ))
fig.show()

Step 3: Estimating results using polynomial regression python

Now we will transform inputs to polynomial terms and see the powers

# Stack x and y side by side: one (x, y) pair per row, shape (100, 2)
X_multi = np.hstack((x, y))
poly = PolynomialFeatures(degree=30)
X_multi_trans = poly.fit_transform(X_multi)
print("Input", poly.n_features_in_)
print("Output", poly.n_output_features_)
print("Powers\n", poly.powers_)

After running the above code, you will get the powers of both x and y: each row of powers_ gives the exponents of x and y for one generated feature, such as (0, 0) for x⁰y⁰, (1, 0) for x¹y⁰, and so on. Let's apply linear regression to these polynomial terms.

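lr = LinearRegression()
lr.fit(X_multi_trans, z)
# `final` holds the 10 x 10 grid of (x, y) points built in Step 2
X_test_multi = poly.transform(final)
# Predict on the grid (not on the training data) so the result matches the 10 x 10 layout
z_final = lr.predict(X_test_multi).reshape(10, 10)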

When we visualize the polynomial regression results, we can see how closely the fitted surface follows the data.

The plot looks good overall, but in some places the surface goes up and down, which means the model is overfitting the data there. Finding a degree that generalizes well takes some time, and you have to use trial and error.
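One way to quantify this overfitting is to compare training and test R² for a low and a high degree on the same data. In this sketch the degrees 2 and 10, the random seed, the split and the use of `StandardScaler` are assumptions for illustration; the data follows the generating equation from Step 1. The higher degree can only match or improve the training score, because its features include all of the degree-2 terms, but it is expected to score worse on the held-out test points.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Same generating equation as Step 1, with a fixed seed (assumption)
rng = np.random.RandomState(0)
x = 7 * rng.rand(100, 1) - 2.8
y = 7 * rng.rand(100, 1) - 2.8
z = (x**2 + y**2 + 0.2*x + 0.2*y + 0.1*x*y + 2 + rng.randn(100, 1)).ravel()
X_multi = np.hstack((x, y))

X_tr, X_te, z_tr, z_te = train_test_split(X_multi, z, test_size=0.2, random_state=2)

results = {}
for degree in (2, 10):
    model = make_pipeline(
        PolynomialFeatures(degree=degree, include_bias=False),
        StandardScaler(),
        LinearRegression(),
    )
    model.fit(X_tr, z_tr)
    # Store (train R^2, test R^2) for each degree
    results[degree] = (r2_score(z_tr, model.predict(X_tr)),
                       r2_score(z_te, model.predict(X_te)))
    print(f"degree {degree:2d}: train R^2 = {results[degree][0]:.3f}, "
          f"test R^2 = {results[degree][1]:.3f}")
```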

Conclusion

I hope you now understand the intuition and practical implementation behind the algorithm.

This tutorial showed that polynomial regression is a form of linear regression, specifically a special case of multiple linear regression in which the features are powers (and, with several inputs, products) of the original inputs. It models the relationship as an nth-degree polynomial. Polynomial regression is sensitive to outliers, so even one or two outliers can badly affect its performance.

Key Takeaways

A polynomial regression model can capture non-linear relationships between variables by fitting a curve rather than a straight line, which simple linear regression cannot do.
It is used when linear regression models may not adequately capture the complexity of the relationship.
It can be useful in various fields, such as finance, physics, engineering, and social sciences, where there may be nonlinear relationships between variables.

Frequently Asked Questions

Q1. What is the difference between linear and polynomial regression?

A. Linear regression models a relationship with a straight line, while polynomial regression uses a curve by including higher-degree terms.

Q2. When should I use polynomial regression?

A. Use polynomial regression when data shows a nonlinear relationship that a straight line cannot accurately model.

Q3. What is a real life example of polynomial regression?

A. A real-life example of polynomial regression is predicting the trajectory of a rocket, where the relationship between time and position is nonlinear.

Q4. What is the difference between logistic regression and polynomial regression?

A. Logistic regression predicts categorical outcomes, typically binary, while polynomial regression models continuous data with a polynomial equation.


I am a final-year undergraduate who loves to learn and write about technology. I am a passionate learner and a data science enthusiast. I have been learning and working in the data science field for the past 2 years, and I aspire to grow as a Big Data architect.
