Basic Introduction to Loss Functions

Pranshu Last Updated : 12 Oct, 2024


This article was published as a part of the Data Science Blogathon.

Introduction

A loss function is to machine learning what a guide is to a student. Just as a guide helps a student improve their performance, a loss function tells a model how far off its outputs are so that it can be adjusted for better accuracy.

The loss function serves as the basis of modern machine learning. To put it simply, a loss function indicates how inaccurate the model is at determining the relationship between x and y. Loss functions serve as a gauge for how well your model can forecast the desired result.

Almost any statistical model relies on a loss function, which provides the objective against which the model’s performance is evaluated. The parameters that the model learns are then found by minimizing the chosen loss function. In other words, a loss function computes the error (or simply “the loss”) between the output of our algorithm and the specified target value.
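As a minimal sketch of what minimizing a loss function to find parameters can look like, the snippet below fits a one-parameter linear model y = w * x by gradient descent on the mean squared error. The toy data, the learning rate, the number of steps, and the function name fit_slope are illustrative assumptions, not something taken from a particular library.

```python
import numpy as np

def fit_slope(x, y, lr=0.01, steps=500):
    """Fit y ~ w * x by gradient descent on the mean squared error."""
    w = 0.0  # initial guess for the single parameter
    for _ in range(steps):
        pred = w * x
        # derivative of mean((pred - y) ** 2) with respect to w
        grad = 2.0 * np.mean((pred - y) * x)
        w -= lr * grad  # step against the gradient to lower the loss
    return w

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.0, 6.0, 8.0])  # toy data generated with w = 2
w = fit_slope(x, y)
print(w)
```

On this toy data the learned w ends up very close to 2, the value used to generate y.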

The loss function assesses how well your machine learning algorithm models the given data set.

Source: deepnetts.com

What is Loss Function

A loss function determines how far the algorithm’s current output is from the desired output. It is a way of assessing how well our algorithm models the data. Loss functions fall into two broad categories: those for regression and those for classification.

The loss function in machine learning measures the difference between the model’s predicted output and the actual output for a single training example. In contrast, the cost function is the mean of the loss over all training examples.

Source:edcuba.com
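The distinction between a per-example loss and the cost over the whole training set can be sketched as follows. Squared error is used here only as an illustrative choice of loss, and the function names squared_loss and cost are assumptions made for this example.

```python
import numpy as np

def squared_loss(actual, predicted):
    """Loss for each training example separately (squared error here)."""
    return (predicted - actual) ** 2

def cost(actual, predicted):
    """Cost: the mean of the per-example losses over the whole set."""
    return squared_loss(actual, predicted).mean()

actual = np.array([3.0, 5.0, 2.0])
predicted = np.array([2.0, 5.0, 4.0])

per_example = squared_loss(actual, predicted)  # one value per example
total_cost = cost(actual, predicted)           # a single number
print(per_example, total_cost)
```

Here the per-example losses are 1, 0 and 4, and the cost is their mean, 5/3.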

Loss functions in neural networks help improve the model’s performance. They are typically used to quantify the penalty incurred by the model’s predictions, such as the deviation of a prediction from the ground-truth label.

Loss functions and metrics also differ slightly from one another. Loss functions provide information about how well our model is doing, but they may not be directly relevant or easy for humans to interpret. This is where metrics are useful. Metrics like accuracy make it much easier for people to understand how well a neural network performs, even though they may not be good choices for loss functions because they may not be differentiable.
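To illustrate the difference, the sketch below evaluates two hypothetical sets of predicted probabilities that reach the same accuracy but have different loss values. The helper names accuracy and log_loss, the 0.5 threshold, and the data are assumptions chosen for this example.

```python
import numpy as np

def accuracy(actual, probs, threshold=0.5):
    """Fraction of examples whose thresholded prediction matches the label."""
    return np.mean((probs >= threshold).astype(int) == actual)

def log_loss(actual, probs):
    """Mean binary cross-entropy of the predicted probabilities."""
    return -np.mean(actual * np.log(probs) + (1 - actual) * np.log(1 - probs))

actual = np.array([1, 0, 1, 0])
confident = np.array([0.9, 0.1, 0.9, 0.1])
hesitant = np.array([0.6, 0.4, 0.6, 0.4])

print(accuracy(actual, confident), accuracy(actual, hesitant))
print(log_loss(actual, confident), log_loss(actual, hesitant))
```

Both sets of predictions are 100% accurate, yet the hesitant predictions have a noticeably higher loss. The loss changes smoothly with the probabilities, while accuracy only changes when a prediction crosses the threshold.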

In terms of the problems we encounter in the real world, loss functions can be broadly divided into those for classification and those for regression. In classification problems, our task is to predict the probability of each class involved in the problem. The goal of regression, on the other hand, is to predict a continuous value from a given set of independent features supplied to the learning algorithm.

Several Regression Loss Functions

Regression involves predicting a specific, continuous value. Estimating home prices and forecasting stock prices are examples of regression because both aim to build models that predict real-valued quantities.

Mean Absolute Error

MAE calculates the average absolute difference between the actual and predicted values. It thus measures the average size of the errors in a set of predictions. While the absolute error is much more robust to outliers, the mean squared error is easier to optimize because it is differentiable everywhere. Outliers are values that differ significantly from the other observed data points.

If the predictions and the ground truth were identical, the MAE would be zero, which rarely happens in practice. Since you want to reduce the error in your predictions, this straightforward loss function is a natural choice to track in a regression problem.

MAE averages the absolute differences between the actual and predicted values. For a data point x_i with predicted value y_i, where n is the total number of data points in the set, the mean absolute error is defined as follows:

$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - x_i \right|$$

Source: Medium.com

Python Implementation

import numpy as np

def mean_absolute_error(act, pred):
    # average of the absolute differences between predictions and targets
    diff = pred - act
    abs_diff = np.absolute(diff)
    mean_diff = abs_diff.mean()
    return mean_diff

act = np.array([1.1, 2, 1.7])
pred = np.array([1, 1.7, 1.5])

print(mean_absolute_error(act, pred))

Mean Squared Error

MSE (also called L2 error) measures the average squared difference between the actual and model-predicted values. Its output is a single number that summarizes the error over the whole set of values. Our goal is to lower the MSE to increase the model’s accuracy.

The mean squared error is the average of the squared differences between the actual and predicted values. Because squaring penalizes a few large errors far more than many small ones, models trained with mean squared error tend to make fewer, or at least less severe, large errors than models trained with mean absolute error, at the cost of being more sensitive to outliers in the data.

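The mathematical formula, with x_i the actual value, y_i the predicted value, and n the number of data points, is defined below:

$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - x_i \right)^2$$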

Source: Medium.com

Python Implementation

import numpy as np

def mean_squared_error(act, pred):

   diff = pred - act
   differences_squared = diff ** 2
   mean_diff = differences_squared.mean()
   
   return mean_diff

act = np.array([1.1,2,1.7])
pred = np.array([1,1.7,1.5])

print(mean_squared_error(act,pred))
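The different reactions of MAE and MSE to outliers can be checked with a small experiment. The sketch below uses made-up data and short helper names mae and mse (assumptions for this example) to compare how much each measure grows when a single prediction is badly wrong.

```python
import numpy as np

act = np.array([1.0, 2.0, 3.0, 4.0])
pred_clean = np.array([1.5, 2.5, 3.5, 4.5])     # every prediction is off by 0.5
pred_outlier = np.array([1.5, 2.5, 3.5, 14.0])  # last prediction is off by 10

def mae(act, pred):
    """Mean absolute error."""
    return np.abs(pred - act).mean()

def mse(act, pred):
    """Mean squared error."""
    return ((pred - act) ** 2).mean()

# how many times larger each error becomes because of the single outlier
mae_ratio = mae(act, pred_outlier) / mae(act, pred_clean)
mse_ratio = mse(act, pred_outlier) / mse(act, pred_clean)
print(mae_ratio, mse_ratio)
```

With this data the single outlier multiplies the MAE by 5.75 but the MSE by 100.75, which is why MSE is considered more sensitive to outliers.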

Mean Bias Error

Mean Bias Error measures the average bias of the model. In short, bias is the systematic over- or underestimation of a quantity.

Mean Bias Error uses the signed, not the absolute, difference between the target and the predicted value, so positive and negative errors can cancel each other out.

Python Implementation

# Mean Bias Error
import numpy as np

def mbe(y, y_pred):
    # mean of the signed differences between targets and predictions
    return np.sum(y - y_pred) / np.size(y)
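Because Mean Bias Error keeps the sign of each difference, it reveals the direction of a systematic error but can hide errors that point in opposite directions. The following usage sketch, with made-up numbers, shows both cases; the function is repeated so the example is self-contained.

```python
import numpy as np

def mbe(y, y_pred):
    """Mean bias error: the mean of the signed differences y - y_pred."""
    return np.sum(y - y_pred) / np.size(y)

y = np.array([10.0, 20.0, 30.0])
over_and_under = np.array([12.0, 18.0, 30.0])  # errors of -2, +2 and 0 cancel
always_over = np.array([12.0, 22.0, 32.0])     # every prediction is 2 too high

print(mbe(y, over_and_under))  # 0.0
print(mbe(y, always_over))     # -2.0
```

With the sign convention y - y_pred used here, a negative result means the model overestimates on average.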

Mean Squared Logarithmic Error Loss (MSLE)

MSLE compares the logarithms of the actual and predicted values, so it effectively depends on their ratio rather than on their absolute difference. As a result, the error curve is asymmetric: underestimates are penalized more than overestimates of the same size. Only the relative (percentage) difference between the actual and predicted values matters to MSLE. It can be a suitable loss function when we want to predict continuous quantities such as house prices or bakery sales.

Source: towardsdatascience.com
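A minimal sketch of MSLE in the same style as the other implementations is shown below. It assumes the common definition that takes log(1 + value) of both the actual and predicted values before squaring the difference; the function name and the data are illustrative.

```python
import numpy as np

def mean_squared_log_error(act, pred):
    """Mean of squared differences between log(1 + act) and log(1 + pred)."""
    log_diff = np.log1p(act) - np.log1p(pred)
    return np.mean(log_diff ** 2)

act = np.array([100.0, 100.0])
under = np.array([50.0, 50.0])   # 50 below the target
over = np.array([150.0, 150.0])  # 50 above the target

print(mean_squared_log_error(act, under))
print(mean_squared_log_error(act, over))
```

Although both predictions miss the target by 50, the underestimate receives the larger loss, which is the asymmetry of the error curve.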

Some of the Loss Functions for Classification

The challenge in classification tasks is to determine a discrete class output. This entails sorting the records of a dataset into distinct classes based on various features, so that a brand-new record can be assigned to one of the classes.

Binary Cross-Entropy

This is the default loss function for binary classification problems. Cross-entropy loss measures the performance of a classification model whose output is a probability value between 0 and 1. The cross-entropy loss grows as the predicted probability diverges from the actual label.

Python Implementation

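from math import log

# calculate binary cross entropy
def binary_cross_entropy(actual, predicted):
    sum_score = 0.0
    for i in range(len(actual)):
        # log-probability given to the true class; 1e-15 avoids log(0)
        sum_score += actual[i] * log(1e-15 + predicted[i])
        sum_score += (1 - actual[i]) * log(1e-15 + 1 - predicted[i])
    mean_sum_score = 1.0 / len(actual) * sum_score
    return -mean_sum_score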
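As a hedged alternative to the loop, the same quantity can be computed with NumPy arrays. The sketch below clips the probabilities instead of adding a small constant, which is one common way to avoid log(0); the function name binary_cross_entropy_np and the example data are assumptions.

```python
import numpy as np

def binary_cross_entropy_np(actual, predicted, eps=1e-15):
    """Vectorised binary cross-entropy; eps keeps log() away from 0."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.clip(np.asarray(predicted, dtype=float), eps, 1 - eps)
    return -np.mean(actual * np.log(predicted)
                    + (1 - actual) * np.log(1 - predicted))

actual = [1, 0, 1]
good = [0.9, 0.1, 0.8]  # probabilities close to the labels
bad = [0.4, 0.6, 0.3]   # probabilities far from the labels

print(binary_cross_entropy_np(actual, good))
print(binary_cross_entropy_np(actual, bad))
```

The loss is small when the predicted probabilities are close to the labels and grows as they move away from them.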

Hinge loss

Hinge loss, originally developed for use with support vector machines, is an alternative to cross-entropy. It is intended for binary classification where the target values are in the set {-1, 1}. It assigns a larger error when the actual and predicted values differ in sign. On some problems it performs better than cross-entropy, though not always.

Python Implementation

# Hinge Loss
import numpy as np

def hinge(y, y_pred):
    # y holds labels in {-1, 1}; y_pred holds the model's raw scores
    l = 0
    size = np.size(y)
    for i in range(size):
        l = l + max(0, 1 - y[i] * y_pred[i])
    return l / size

Kullback Leibler Divergence Loss (KL Loss)

Kullback Leibler Divergence Loss measures how much one probability distribution differs from a reference distribution. When the Kullback Leibler Divergence Loss is 0, the two probability distributions are identical.
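For discrete distributions, the KL divergence of q from p is commonly written as the sum over all outcomes of p * log(p / q). A minimal sketch under that definition follows; it assumes both distributions are given as arrays of strictly positive probabilities, and the function name is illustrative.

```python
import numpy as np

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions with no zero entries."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return np.sum(p * np.log(p / q))

p = np.array([0.5, 0.5])
q = np.array([0.9, 0.1])

print(kl_divergence(p, p))  # 0.0, the distributions are identical
print(kl_divergence(p, q))
print(kl_divergence(q, p))
```

The last two values differ, which shows that KL divergence is not symmetric: swapping the two distributions changes the result.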

Squared hinge loss

Squared hinge loss is an extension of hinge loss that simply computes the square of the hinge loss score. Squaring smooths the surface of the error function and makes it numerically easier to work with. Like hinge loss, it seeks the classification boundary that gives the largest possible margin between data points of different classes.
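A minimal sketch of squared hinge loss, assuming labels in {-1, 1} and raw model scores as predictions, could look like this. The function name and the data are illustrative.

```python
import numpy as np

def squared_hinge(y, y_pred):
    """Mean of squared hinge terms; labels y are expected to be -1 or 1."""
    margins = np.maximum(0.0, 1.0 - y * y_pred)
    return np.mean(margins ** 2)

y = np.array([1, -1, 1])
y_pred = np.array([2.0, -0.5, -1.0])

print(squared_hinge(y, y_pred))
```

In this example the first prediction is correct with a margin above 1 and contributes nothing, the second is correct but inside the margin and contributes 0.25, and the third has the wrong sign and contributes 4, giving a mean of 4.25 / 3.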

Log Loss

Log loss assesses how accurately a model assigns probabilities to different outcomes, especially in classification tasks. For binary classification, it is the same quantity as binary cross-entropy.

Some of the Loss Functions for Multi-class Classification

Multi-class classification refers to predictive models in which each example is assigned to one of more than two classes.

Multi-class Cross-Entropy

In this case, the target values are integer class labels in the set {0, 1, 2, 3, ..., n}. The loss averages, over all examples, the difference between the actual and predicted probability distributions; the lower the score, the better, and a perfect score is 0.
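A minimal sketch of multi-class cross-entropy is shown below. It assumes the targets have been one-hot encoded and that each row of predictions is a probability distribution over the classes; the function name and the data are illustrative.

```python
import numpy as np

def categorical_cross_entropy(actual_one_hot, predicted, eps=1e-15):
    """Mean cross-entropy for one-hot targets and predicted class probabilities."""
    predicted = np.clip(predicted, eps, 1.0)
    # only the probability assigned to the true class contributes in each row
    per_example = -np.sum(actual_one_hot * np.log(predicted), axis=1)
    return per_example.mean()

actual_one_hot = np.array([[1, 0, 0],
                           [0, 1, 0],
                           [0, 0, 1]])
predicted = np.array([[0.7, 0.2, 0.1],
                      [0.1, 0.8, 0.1],
                      [0.2, 0.2, 0.6]])

loss = categorical_cross_entropy(actual_one_hot, predicted)
print(loss)
```

For this data the loss is the mean of -log(0.7), -log(0.8) and -log(0.6), roughly 0.364.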

Multi-class Sparse Cross-Entropy

Multi-class cross-entropy can become cumbersome when there are many classes, because every target must be one-hot encoded. Sparse cross-entropy resolves this by computing the same error directly from integer class labels, without one-hot encoding.
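The idea can be sketched as follows: instead of multiplying by a one-hot matrix, the integer label is used to look up the predicted probability of the true class. The function name and the data are assumptions for this example.

```python
import numpy as np

def sparse_categorical_cross_entropy(labels, predicted, eps=1e-15):
    """Cross-entropy from integer class labels, with no one-hot encoding."""
    predicted = np.clip(predicted, eps, 1.0)
    rows = np.arange(len(labels))
    # pick the predicted probability of the true class in each row
    true_class_probs = predicted[rows, labels]
    return -np.mean(np.log(true_class_probs))

labels = np.array([0, 1, 2])  # class indices instead of one-hot rows
predicted = np.array([[0.7, 0.2, 0.1],
                      [0.1, 0.8, 0.1],
                      [0.2, 0.2, 0.6]])

sparse_loss = sparse_categorical_cross_entropy(labels, predicted)
print(sparse_loss)
```

The result, roughly 0.364, is the same value that cross-entropy with one-hot targets gives for this data; only the representation of the targets differs.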

Conclusion

In this article, we studied loss functions at an introductory level. We saw how loss functions play an important role in modern machine learning problems. We also saw how the model’s performance depends on its loss function and how the loss function helps optimize the output.

We covered the following points in this article:

We studied the loss function and its significance in machine learning
We studied how the loss function works and its role in getting optimized output from a machine learning model
We studied various types of loss functions used in regression problems
We studied various types of loss functions used in classification problems, including multi-class classification problems
We covered Python implementations of some of the common loss functions used in both regression and classification problems
We studied mean squared error, its Python implementation, and its mathematical formula
In short, we covered common loss functions for regression as well as classification (binary and multi-class), along with their explanations and Python implementations

Frequently Asked Questions

Q1. What does a loss function show?

A loss function shows how well a machine learning model performs by measuring the difference between predicted and actual values. The goal is to minimize this difference during training for more accurate predictions.

Q2. How do you write a loss function?

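To create a loss function:
1. Define how your model produces its predictions.
2. Obtain the correct answers (the target values).
3. Define a mathematical rule for the difference between a prediction and its target.
4. Add up (or average) the differences over all examples.
5. Decide whether the total should be minimized (as for an error) or maximized (as for a score such as a likelihood).
6. Write it in code using a programming language like Python.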

Q3. How do you reduce the loss function?

1. Adjust how the model learns during training.
2. Give better rules for the model to follow.
3. Teach the model with more examples.
4. Improve the information given to the model.
5. Add special rules to avoid overconfidence.
6. Test different learning speeds.
7. Find the right balance for model complexity.
8. Stop training if it’s not improving.

I hope you liked my article. Please share your thoughts in the comments below.

My name is Pranshu Sharma, and I am a Data Science Enthusiast. Thank you so much for taking your precious time to read this blog. Feel free to point out any mistakes (I’m a learner, after all) and provide feedback or leave a comment.


The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.
