Classification in Machine Learning

Arnab M Last Updated : 07 Apr, 2025

Machine learning plays a key role in education and beyond by using algorithms that learn from data. These algorithms solve real-world problems by recognizing patterns and making decisions. One important task in this field is classification, where data points are sorted into categories. For example, an email system might classify messages as “spam” or “not spam.” Different types of classification challenges exist, each requiring specific methods. This article explains how classification works in machine learning and the techniques used to tackle these tasks.

This article was published as a part of the Data Science Blogathon

Understanding Classification in Machine Learning
- Types of Classification Model:
Binary Classification for Machine Learning
- Popular Algorithms for Binary Classification
Multi-Class Classification
- Example Implementation
Multi-Label Classification for Machine Learning
Imbalanced Classification for Machine Learning
Conclusion

Understanding Classification in Machine Learning

Classification refers to any predictive modelling problem in which a class label is predicted for a given example of input data. Some examples of classification problems are:

Classifying emails as spam or not
Classifying a handwritten character as one of the known characters or not
Classifying recent user behaviour as churn or not

Any classification model requires a training dataset with many examples of inputs and outputs from which to learn. The training data should be representative of the problem and contain enough examples of each class label for the model to learn it. Class labels are often string values, so they need to be encoded as integers before training, for example 0 for “not spam” and 1 for “spam.”
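
The label-encoding step described above can be sketched with scikit-learn's LabelEncoder. This is a minimal illustration rather than part of the original article: the six email labels are made-up data, and the integer assigned to each label depends on the sorted order of the label strings, not on any notion of normal or abnormal.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical string labels for six emails (illustrative data only).
labels = ["spam", "not spam", "not spam", "spam", "not spam", "spam"]

encoder = LabelEncoder()
encoded = encoder.fit_transform(labels)

# LabelEncoder sorts the distinct labels before numbering them,
# so here "not spam" becomes 0 and "spam" becomes 1.
mapping = {str(name): idx for idx, name in enumerate(encoder.classes_)}
print(mapping)
print(encoded.tolist())

# inverse_transform recovers the original strings from the integers.
decoded = encoder.inverse_transform(encoded).tolist()
print(decoded)
```

If a specific mapping is required, a plain dictionary such as {"not spam": 0, "spam": 1} applied to the labels works just as well.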

There is no general theory for choosing the best model. Instead, you are expected to experiment and discover which algorithm and configuration give the best performance on a specific task, comparing the candidate algorithms by their results. Classification accuracy, the fraction of examples whose predicted class label is correct, is a popular metric for evaluating a model. It is not always the best measure, but it is a good starting point for most classification tasks.

Instead of a class label, some models predict a probability of class membership for each input. In such cases, the ROC curve can be a helpful indicator of how well the model separates the classes.
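
As a rough sketch of the two evaluation styles mentioned above, the snippet below scores one model both by accuracy on predicted labels and by the area under the ROC curve on predicted probabilities. The choice of LogisticRegression, the 70/30 split and the random seeds are assumptions made for illustration only.

```python
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic two-class data; sizes and seeds are arbitrary choices.
X, y = make_blobs(n_samples=1000, centers=2, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1, stratify=y
)

model = LogisticRegression()
model.fit(X_train, y_train)

# Accuracy compares the predicted class labels with the true labels.
accuracy = accuracy_score(y_test, model.predict(X_test))

# ROC AUC uses the predicted probability of the positive class (column 1).
probs = model.predict_proba(X_test)[:, 1]
auc = roc_auc_score(y_test, probs)

print(f"accuracy={accuracy:.3f}, roc_auc={auc:.3f}")
```

Accuracy depends on the decision threshold used to turn probabilities into labels, while ROC AUC summarizes performance across all thresholds.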

Types of Classification Model:

You might encounter four main types of classification tasks in your day-to-day challenges. Generally, these types of predictive models in machine learning include:

Binary classification
Multi-Label Classification
Multi-Class Classification
Imbalanced Classification

We will go over them one by one.

Binary Classification for Machine Learning

Binary classification refers to tasks that have exactly two class labels. Generally, one class represents the normal state and the other the abnormal state. The following examples illustrate this.

Email Spam detection: Normal State – Not Spam, Abnormal State – Spam
Churn prediction: Normal State – Not churned, Abnormal State – Churn
Conversion Prediction: Normal State – Bought an item, Abnormal State – Not bought an item

Similarly, in a medical test “No cancer detected” is the normal state and “Cancer detected” is the abnormal state. By convention, the normal state is assigned the class label 0 and the abnormal state the class label 1. A binary classification task is often modelled by predicting a Bernoulli probability distribution for each example. The Bernoulli distribution is a discrete probability distribution that covers the case where an event has a binary outcome of either 0 or 1. For classification, this means the model predicts the probability that an example belongs to class 1, the abnormal state.
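
To make the Bernoulli idea concrete, here is a small sketch in plain Python. The helper name bernoulli_pmf, the probability 0.2 and the 0.5 decision threshold are all hypothetical choices for illustration; they do not come from the article.

```python
def bernoulli_pmf(y: int, p: float) -> float:
    """Probability of outcome y (0 or 1) when P(y = 1) = p."""
    if y not in (0, 1):
        raise ValueError("y must be 0 or 1")
    return p if y == 1 else 1.0 - p


# Hypothetical model output: probability that an example is in class 1.
p_abnormal = 0.2

print(bernoulli_pmf(1, p_abnormal))  # probability of the abnormal state
print(bernoulli_pmf(0, p_abnormal))  # probability of the normal state

# A common (but not mandatory) rule: predict class 1 when p >= 0.5.
predicted_label = int(p_abnormal >= 0.5)
print(predicted_label)
```

The two probabilities always sum to 1, which is why a binary classifier only needs to output a single number per example.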

Popular Algorithms for Binary Classification

K-Nearest Neighbours
Logistic Regression
Support Vector Machine
Decision Trees
Naive Bayes

Some of these algorithms were designed specifically for binary classification and do not natively support more than two classes; examples include Support Vector Machines and Logistic Regression. Now we will create a dataset of our own and use it to illustrate binary classification. We will use the make_blobs() function of the scikit-learn module to generate a binary classification dataset. The example below generates a dataset of 5000 examples, each belonging to one of two classes and described by two input features.

Code :

from numpy import where
from collections import Counter
from sklearn.datasets import make_blobs
from matplotlib import pyplot
# generate a two-class dataset with two input features
X, y = make_blobs(n_samples=5000, centers=2, random_state=1)
# summarize the dataset shape
print(X.shape, y.shape)
# summarize the class distribution
counter = Counter(y)
print(counter)
# show the first 10 examples
for i in range(10):
    print(X[i], y[i])
# plot the examples, coloured by class label
for label, _ in counter.items():
    row_ix = where(y == label)[0]
    pyplot.scatter(X[row_ix, 0], X[row_ix, 1], label=str(label))
pyplot.legend()
pyplot.show()

The above example creates a dataset of 5000 samples and divides them into input ‘X’ and output ‘y’ elements. The class distribution shows that every instance belongs to either class 0 or class 1, with the samples split evenly between the two.

The first 10 examples in the dataset show numeric input values, and the target value is an integer that represents class membership.

Then a scatter plot is created for the input variables where the resultant points are colour coded based on the class value. We can easily see two distinct clusters which we can discriminate.

Classification Algorithms in Machine learning

Here are five common classification algorithms in machine learning:

Logistic Regression: Used for binary classification problems, logistic regression models the probability of a certain class using a logistic function.

Decision Trees: These hierarchical structures split the dataset into subsets based on the most significant features, creating a tree-like structure for classification.

Random Forest: A type of ensemble learning, random forests construct multiple decision trees during training and output the mode of the classes for classification tasks.

Support Vector Machines (SVM): SVM is a powerful algorithm used for classification tasks. It finds the hyperplane that best separates the classes in the feature space.

K-Nearest Neighbors (KNN): KNN classifies a data point based on the majority class of its k nearest neighbors in the feature space, making it a simple and intuitive classification algorithm.

These algorithms find wide use in various applications and form the foundation of many machine learning models. Researchers often assess them using evaluation metrics to gauge their performance on different datasets, while practitioners optimize their parameters based on the available data points.
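
One way to compare the five algorithms above on the same data is k-fold cross-validation. The sketch below is only an illustration: the synthetic dataset, the hyperparameters and the use of accuracy as the metric are assumptions, and the ranking it produces says nothing about how the algorithms behave on other datasets.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary dataset; all sizes and seeds are arbitrary choices.
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=5, random_state=1
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=1),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=1),
    "SVM": SVC(),
    "KNN": KNeighborsClassifier(n_neighbors=5),
}

scores = {}
for name, model in models.items():
    # Mean accuracy over 5 cross-validation folds.
    scores[name] = cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    print(f"{name}: {scores[name]:.3f}")
```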

Multi-Class Classification

These classification problems are not limited to two class labels; they can have any number of them. Some popular examples of multi-class classification are:

Plant Species Classification
Face Classification
Optical Character Recognition

Here there is no notion of normal and abnormal outcomes. Instead, each example is assigned to exactly one of a range of known classes. The number of class labels can be very large; for example, a face recognition system may have to predict which one of tens of thousands of faces a picture belongs to.

Predicting the next word in a sequence, as in a text translation model, can also be treated as multi-class classification. In this scenario, every word in the vocabulary is a possible class, so the number of classes can run into the hundreds of thousands or even millions.

Multi-class tasks are generally modelled with a categorical (Multinoulli) distribution rather than the Bernoulli distribution used for binary classification. In a categorical distribution an event can have one of several outcomes, so the model predicts the probability of the input belonging to each of the class labels.
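
A categorical distribution assigns one probability to each class, and the probabilities sum to 1. The article does not say how a model produces them; one common way, assumed here purely for illustration, is to pass raw class scores through a softmax function. The class names and scores below are hypothetical.

```python
import numpy as np


def softmax(scores: np.ndarray) -> np.ndarray:
    """Turn raw class scores into probabilities that sum to 1."""
    shifted = scores - np.max(scores)  # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()


# Hypothetical class names and raw model scores for one input.
class_names = ["class_a", "class_b", "class_c"]
raw_scores = np.array([2.0, 1.0, 0.1])

probs = softmax(raw_scores)
predicted = class_names[int(np.argmax(probs))]

print(probs.round(3))
print(predicted)
```

The predicted class is simply the one with the highest probability, which generalizes the single-threshold rule used in the binary case.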

The most common algorithms which are used for Multi-Class Classification are :

K-Nearest Neighbours
Naive Bayes
Decision trees
Gradient Boosting
Random Forest

Algorithms designed for binary classification can also be adapted to multi-class problems by fitting either one model per class against all other classes, known as one-vs-rest, or one model per pair of classes, known as one-vs-one.

One Vs Rest – Fit one binary model for each class versus all the other classes.

One Vs One – Fit one binary model for every pair of classes.
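
Both strategies are available as wrappers in scikit-learn's sklearn.multiclass module. The sketch below wraps LogisticRegression, which is an arbitrary choice of base model, and counts how many binary models each strategy fits on a four-class dataset.

```python
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier

# Four-class synthetic dataset; sizes and seed are arbitrary choices.
X, y = make_blobs(n_samples=1000, centers=4, random_state=1)

# One-vs-rest: one binary model per class.
ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, y)

# One-vs-one: one binary model per pair of classes.
ovo = OneVsOneClassifier(LogisticRegression(max_iter=1000)).fit(X, y)

n_ovr = len(ovr.estimators_)
n_ovo = len(ovo.estimators_)
print(n_ovr, n_ovo)

# Both wrappers still predict a single class label per example.
print(ovr.predict(X[:5]), ovo.predict(X[:5]))
```

With k classes, one-vs-rest fits k models and one-vs-one fits k(k-1)/2, so one-vs-one grows much faster as the number of classes increases.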

Example Implementation

We will again use the make_blobs() function of the scikit-learn module, this time to generate a multi-class classification dataset. The following code demonstrates it.

Code :

from numpy import where
from collections import Counter
from sklearn.datasets import make_blobs
from matplotlib import pyplot
X, y = make_blobs(n_samples=1000, centers=4, random_state=1)
print(X.shape, y.shape)
counter = Counter(y)
print(counter)
for i in range(10):
  print(X[i], y[i])
for label, _ in counter.items():
  row_ix = where(y == label)[0]
  pyplot.scatter(X[row_ix, 0], X[row_ix, 1], label=str(label))
pyplot.legend()
pyplot.show()

Output :

(1000, 2) (1000,)
Counter({1: 250, 2: 250, 0: 250, 3: 250})
[-10.45765533  -3.30899488] 1
[-5.90962043 -7.80717036] 2
[-1.00497975  4.35530142] 0
[-6.63784922 -4.52085249] 3
[-6.3466658  -8.89940182] 2
[-4.67047183 -3.35527602] 3
[-5.62742066 -1.70195987] 3
[-6.91064247 -2.83731201] 3
[-1.76490462  5.03668554] 0
[-8.70416288 -4.39234621] 1

Figure: Scatter plot of the multi-class classification dataset

Here we can see that the dataset contains four classes of 250 examples each, rather than just two.

Multi-Label Classification for Machine Learning

Multi-label classification refers to tasks in which two or more class labels can be assigned to each example. A basic example is photo classification, where a single photo can contain multiple objects, such as a dog and an apple. The main difference from the previous types is that the model predicts multiple labels for each example, not just one.

Standard binary and multi-class classification models cannot be used directly for multi-label classification; you need a modified version of the algorithm that can predict several labels at once. This makes the task more challenging than a simple yes-or-no decision. The common algorithms used here are:

Multi-label Random Forests
Multi-label Decision trees
Multi-label Gradient Boosting

Another approach is to fit a separate classification algorithm to predict each label. We will use the make_multilabel_classification() function from scikit-learn to generate a synthetic multi-label classification dataset. The following code creates a dataset of 1000 samples with three input features and four class labels and prints the first 10 examples.

Code:

from sklearn.datasets import make_multilabel_classification
X, y = make_multilabel_classification(n_samples=1000, n_features=3, n_classes=4, n_labels=4, random_state=1)
print(X.shape, y.shape)
for i in range(10):
	print(X[i], y[i])

Output :

(1000, 3) (1000, 4)
[ 8. 11. 13.] [1 1 0 1]
[ 5. 15. 21.] [1 1 0 1]
[15. 30. 14.] [1 0 0 0]
[ 3. 15. 40.] [0 1 0 0]
[ 7. 22. 14.] [1 0 0 1]
[12. 28. 15.] [1 0 0 0]
[ 7. 30. 24.] [1 1 0 1]
[15. 30. 14.] [1 1 1 1]
[10. 23. 21.] [1 1 1 1]
[10. 19. 16.] [1 1 0 1]
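
The one-classifier-per-label approach mentioned earlier in this section can be sketched with scikit-learn's MultiOutputClassifier, which fits an independent copy of a base model for every label column. The decision tree used as the base model here is an assumption chosen for illustration.

```python
from sklearn.datasets import make_multilabel_classification
from sklearn.multioutput import MultiOutputClassifier
from sklearn.tree import DecisionTreeClassifier

# Same synthetic multi-label dataset as in the example above.
X, y = make_multilabel_classification(
    n_samples=1000, n_features=3, n_classes=4, n_labels=4, random_state=1
)

# Fit one decision tree per label column.
model = MultiOutputClassifier(DecisionTreeClassifier(random_state=1))
model.fit(X, y)

# Each prediction is a row of four 0/1 values, one per label.
predictions = model.predict(X[:5])
n_models = len(model.estimators_)

print(predictions.shape)
print(n_models)
```

Because each label gets its own model, this approach ignores any correlation between labels, which is a trade-off to keep in mind.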

Imbalanced Classification for Machine Learning

Imbalanced classification refers to tasks where the examples are unequally distributed across the classes. Generally, imbalanced classification tasks are binary classification problems in which the majority of the training examples belong to the normal class and a minority belong to the abnormal class.

The most important examples of these use cases are :

Fraud Detection
Outlier Detection
Medical Diagnosis Test

Although these problems are modelled as binary classification tasks, they usually call for specialized techniques. One option is to change the composition of the training dataset by undersampling the majority class or oversampling the minority class. The most prominent examples are:

Random Undersampling
SMOTE Oversampling
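
Random undersampling can be sketched with NumPy alone: keep every minority example and a random subset of the majority class of the same size. The dataset parameters and the seed are assumptions for illustration. SMOTE is not shown here because it is typically used from the separate imbalanced-learn package rather than from scikit-learn itself.

```python
from collections import Counter

import numpy as np
from sklearn.datasets import make_classification

# Imbalanced synthetic dataset: roughly 99% class 0 and 1% class 1.
X, y = make_classification(
    n_samples=1000, n_features=2, n_informative=2, n_redundant=0,
    n_classes=2, n_clusters_per_class=1, weights=[0.99, 0.01], random_state=1
)

rng = np.random.default_rng(1)
majority_idx = np.where(y == 0)[0]
minority_idx = np.where(y == 1)[0]

# Keep a random subset of the majority class, the same size as the minority class.
kept_majority = rng.choice(majority_idx, size=len(minority_idx), replace=False)
keep = np.concatenate([kept_majority, minority_idx])

X_bal, y_bal = X[keep], y[keep]
counts = Counter(y_bal.tolist())
print(counts)
```

Undersampling discards most of the majority examples, so it is usually applied only to the training split, never to the data used for evaluation.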

Alternatively, specialized modelling algorithms, such as cost-sensitive machine learning models, pay more attention to the minority class when the model is fitted on the training dataset. Examples include:

Cost-Sensitive Logistic Regression
Cost-Sensitive Decision Trees
Cost-Sensitive Support Vector Machines
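
In scikit-learn, one simple route to a cost-sensitive model is the class_weight parameter, which scales the penalty for errors on each class. The sketch below uses class_weight="balanced" with LogisticRegression; treating this as the cost-sensitive logistic regression named above is an assumption, since the article does not specify an implementation.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.utils.class_weight import compute_class_weight

# Imbalanced synthetic dataset: roughly 99% class 0 and 1% class 1.
X, y = make_classification(
    n_samples=1000, n_features=2, n_informative=2, n_redundant=0,
    n_classes=2, n_clusters_per_class=1, weights=[0.99, 0.01], random_state=1
)

# "balanced" weights are n_samples / (n_classes * class_count),
# so the rare class receives the larger weight.
classes = np.unique(y)
weights = compute_class_weight(class_weight="balanced", classes=classes, y=y)
print(dict(zip(classes.tolist(), weights.round(2).tolist())))

# The same weighting is applied internally when fitting the model.
model = LogisticRegression(class_weight="balanced")
model.fit(X, y)
```

A dictionary such as {0: 1, 1: 50} can be passed instead when the real cost of each kind of error is known.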

After choosing the model, we need to assess it with a suitable score, such as Precision, Recall or F-Measure, since accuracy can be misleading when one class dominates. Next, we will create a dataset for an imbalanced classification problem. We will use the make_classification() function of scikit-learn to generate a fully synthetic, imbalanced binary classification dataset of 1000 samples.

Code :

from numpy import where
from collections import Counter
from sklearn.datasets import make_classification
from matplotlib import pyplot

# generate a two-class dataset with roughly 99% of the examples in class 0
X, y = make_classification(n_samples=1000, n_features=2, n_informative=2, n_redundant=0, n_classes=2, n_clusters_per_class=1, weights=[0.99,0.01], random_state=1)
# summarize the dataset shape
print(X.shape, y.shape)
# summarize the class distribution
counter = Counter(y)
print(counter)
# show the first 10 examples
for i in range(10):
    print(X[i], y[i])
# plot the examples, coloured by class label
for label, _ in counter.items():
    row_ix = where(y == label)[0]
    pyplot.scatter(X[row_ix, 0], X[row_ix, 1], label=str(label))
pyplot.legend()
pyplot.show()

Output :

(1000, 2) (1000,)
Counter({0: 983, 1: 17})
[0.86924745 1.18613612] 0
[1.55110839 1.81032905] 0
[1.29361936 1.01094607] 0
[1.11988947 1.63251786] 0
[1.04235568 1.12152929] 0
[1.18114858 0.92397607] 0
[1.1365562  1.17652556] 0
[0.46291729 0.72924998] 0
[0.18315826 1.07141766] 0
[0.32411648 0.53515376] 0

Figure: Scatter plot of the imbalanced classification dataset

Here we can see a severe class imbalance: 983 examples belong to class 0 and only 17 belong to class 1, so class 0 is the majority, as expected. Datasets like this are more difficult to model, but they reflect many practical use cases.
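
The Precision, Recall and F-Measure scores mentioned earlier in this section can be computed as in the sketch below. The logistic regression model, the 50/50 stratified split and the seeds are assumptions made for illustration; the actual scores depend on them and are not quoted here.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import train_test_split

# Same imbalanced synthetic dataset as in the example above.
X, y = make_classification(
    n_samples=1000, n_features=2, n_informative=2, n_redundant=0,
    n_classes=2, n_clusters_per_class=1, weights=[0.99, 0.01], random_state=1
)

# Stratify so that both splits contain some minority examples.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, stratify=y, random_state=1
)

model = LogisticRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

# Scores for the minority class (label 1); zero_division=0 avoids
# a warning if the model never predicts class 1.
precision, recall, f1, _ = precision_recall_fscore_support(
    y_test, y_pred, average="binary", zero_division=0
)
accuracy = accuracy_score(y_test, y_pred)

print(f"accuracy={accuracy:.3f}")
print(f"precision={precision:.3f}, recall={recall:.3f}, f1={f1:.3f}")
```

A model that always predicts class 0 would still score about 98% accuracy on this data, which is why the minority-class scores are the more informative ones.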

Conclusion

Thank you for reading to the end of the article. If you found it helpful, don’t forget to share it with your network. To read my other articles or to get in touch, feel free to connect with me on LinkedIn or GitHub.

Arnab M

I love to code and create new software for any purpose. I also love to play MMO and RTG games. Other hobbies include Exploring new places and restaurants and making new friends. Feel free to ping me on LinkedIn for any new ideas or same and if you need any help with any code too.
