Elbow Method for Finding the Optimal Number of Clusters in K-Means
Introduction
Clustering is an unsupervised machine-learning technique: the process of dividing a dataset into groups whose members share similar features. Commonly used clustering techniques include K-Means clustering, hierarchical clustering, density-based clustering, and model-based clustering. K-Means scales well even to large datasets, and in this article we will implement it, together with the Elbow Method for choosing the number of clusters, using the scikit-learn library in Python.
Learning Objectives
- Understand the K-Means algorithm.
- Understand and implement the Elbow Method for K-Means clustering.
Table of Contents
- What Is the Elbow Method in K-Means Clustering?
- K Means Clustering Using the Elbow Method
- Implementation of the Elbow Method
- Full Code
- Conclusion
- Frequently Asked Questions
What Is the Elbow Method in K-Means Clustering?
K-Means is the simplest and most commonly used iterative unsupervised learning algorithm. Unlike supervised learning, K-Means works on unlabeled data. Other unsupervised learning algorithms include PCA (Principal Component Analysis), K-Medoids, etc.
In K-Means, we randomly initialize K cluster centroids in the data (how to choose K using the Elbow Method is discussed later in this tutorial) and iteratively update these centroids until their positions no longer change. Let's go through the steps involved in K-Means clustering for a better understanding.
1. Select the number of clusters (K) for the dataset.
2. Select K centroids randomly from the dataset.
3. Use Euclidean distance (or Manhattan distance) as the metric to find each point's nearest centroid, and assign the point to that centroid's cluster, thus creating K clusters.
4. Compute the new centroid (mean) of each cluster thus formed.
5. Reassign all the data points based on these new centroids and repeat steps 3 and 4 until the positions of the centroids no longer change, i.e., the algorithm has converged. A minimal from-scratch sketch of these steps follows.
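To make these steps concrete, here is a minimal from-scratch sketch in NumPy (illustrative only; the function name kmeans_sketch and its arguments are our own, and the scikit-learn implementation used later in this article handles all of these details for us):

import numpy as np

def kmeans_sketch(X, k, n_iters=100, seed=42):
    # Steps 1-2: pick K initial centroids at random from the data points.
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Step 3: assign each point to its nearest centroid (Euclidean distance).
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 4: recompute each centroid as the mean of its assigned points
        # (for simplicity, we assume no cluster ends up empty).
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        # Step 5: stop once the centroids no longer move, i.e., convergence.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids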
Finding the optimal number of clusters is an important part of this algorithm, and a commonly used method for finding it is the Elbow Method.
K Means Clustering Using the Elbow Method
In the Elbow Method, we vary the number of clusters K, typically from 1 to 10, and for each value of K we calculate the WCSS (Within-Cluster Sum of Squares): the sum of squared distances between each point and the centroid of its cluster. WCSS is largest when K = 1 and decreases as the number of clusters increases. When we plot WCSS against K, the curve falls rapidly at first and then flattens out, moving almost parallel to the X-axis and creating an elbow shape. The K value at this bend is the optimal value of K, i.e., the optimal number of clusters.
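Formally, WCSS = Σⱼ Σ₍x ∈ Cⱼ₎ ‖x − μⱼ‖², where μⱼ is the centroid of cluster Cⱼ. As a quick sanity check, here is a minimal sketch (on synthetic data, since our real dataset is only introduced below) showing that a hand-computed WCSS matches the inertia_ attribute that scikit-learn's KMeans exposes after fitting:

import numpy as np
from sklearn.cluster import KMeans

X_demo = np.random.default_rng(42).random((200, 2))  # synthetic, illustrative data
km = KMeans(n_clusters = 3, init = 'k-means++', n_init = 10, random_state = 42).fit(X_demo)

# Sum of squared distances from each point to its own cluster's centroid.
wcss = sum(
    np.sum((X_demo[km.labels_ == j] - km.cluster_centers_[j]) ** 2)
    for j in range(3)
)
print(wcss, km.inertia_)  # the two values agree up to floating-point rounding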
Now let’s implement K-Means clustering using Python.
Implementation of the Elbow Method
Sample Dataset
The dataset we are using here is the Mall Customers data (Download here). It is unlabeled data containing the details of customers in a mall (features such as genre, age, annual income (k$), and spending score). Our aim is to cluster the customers based on two relevant features: annual income and spending score.
First of all, we have to import the essential libraries.
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import sklearn
Now let’s import the given dataset and slice the important features.
dataset = pd.read_csv('Mall_Customers.csv')
# Columns 3 and 4 hold 'Annual Income (k$)' and 'Spending Score (1-100)'.
X = dataset.iloc[:, [3, 4]].values
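Optionally, we can peek at the data to confirm the slice picked up the two intended columns (a minimal sketch; the column positions follow the dataset description above):

print(dataset.head())  # expected columns: customer id, genre, age, income, score
print(X[:5])           # first five rows of the two selected features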
Next, we have to find the optimal K value for clustering the data, using the Elbow Method.
from sklearn.cluster import KMeans

wcss = []
for i in range(1, 11):
    kmeans = KMeans(n_clusters = i, init = 'k-means++', random_state = 42)
    kmeans.fit(X)
    wcss.append(kmeans.inertia_)  # inertia_ is the WCSS for this K
The "init" argument specifies the method for initializing the centroids; 'k-means++' spreads the initial centroids apart, which usually converges faster and more reliably than purely random initialization. We have now calculated the WCSS value for each K value.
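As a minimal side-by-side sketch (illustrative only; both options are built into scikit-learn's KMeans), we could compare the final WCSS under the two initialization strategies:

from sklearn.cluster import KMeans

for method in ['k-means++', 'random']:
    km = KMeans(n_clusters = 5, init = method, n_init = 10, random_state = 42).fit(X)
    print(method, km.inertia_)  # lower inertia means tighter clusters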
Now we plot the WCSS against the K value:

plt.plot(range(1, 11), wcss)
plt.xlabel('Number of clusters')
plt.ylabel('WCSS')
plt.show()
The graph will be like this:
[Elbow plot: WCSS vs. number of clusters K]
The point at which the elbow shape appears is K = 5; that is, the optimal number of clusters is 5. Now let's train the model on the input data with 5 clusters.
kmeans = KMeans(n_clusters = 5, init = "k-means++", random_state = 42)
y_kmeans = kmeans.fit_predict(X)
y_kmeans will be:
array([3, 0, 3, 0, 3, 0, 3, 0, 3, 0, 3, 0, 3, 0, 3, 0, 3, 0, 3, 0, 3, 0,
3, 0, 3, 0, 3, 0, 3, 0, 3, 0, 3, 0, 3, 0, 3, 0, 3, 0, 3, 0, 3, 1,
3, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 4, 2, 1, 2, 4, 2, 4, 2,
1, 2, 4, 2, 4, 2, 4, 2, 4, 2, 1, 2, 4, 2, 4, 2, 4, 2, 4, 2, 4, 2,
4, 2, 4, 2, 4, 2, 4, 2, 4, 2, 4, 2, 4, 2, 4, 2, 4, 2, 4, 2, 4, 2,
4, 2, 4, 2, 4, 2, 4, 2, 4, 2, 4, 2, 4, 2, 4, 2, 4, 2, 4, 2, 4, 2,
4, 2])
y_kmeans gives the cluster label assigned to each row of X. Now, let's plot all the clusters using matplotlib.
plt.scatter(X[y_kmeans == 0, 0], X[y_kmeans == 0, 1], s = 60, c = 'red', label = 'Cluster1')
plt.scatter(X[y_kmeans == 1, 0], X[y_kmeans == 1, 1], s = 60, c = 'blue', label = 'Cluster2')
plt.scatter(X[y_kmeans == 2, 0], X[y_kmeans == 2, 1], s = 60, c = 'green', label = 'Cluster3')
plt.scatter(X[y_kmeans == 3, 0], X[y_kmeans == 3, 1], s = 60, c = 'violet', label = 'Cluster4')
plt.scatter(X[y_kmeans == 4, 0], X[y_kmeans == 4, 1], s = 60, c = 'yellow', label = 'Cluster5')
plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1], s = 100, c = 'black', label = 'Centroids')
plt.xlabel('Annual Income (k$)')
plt.ylabel('Spending Score (1-100)')
plt.legend()
plt.show()
Graph:
[Scatter plot: five customer clusters with centroids marked in black]
The scatter plot visualizes the 5 clusters in different colors, with the centroid of each cluster marked in black.
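As a side note, the elbow does not have to be read off the plot by eye. One option (an assumption on our part: this relies on the third-party kneed package, which is not part of scikit-learn and must be installed separately, e.g. pip install kneed) is to locate the knee of the WCSS curve programmatically:

from kneed import KneeLocator

# Find the elbow of the WCSS curve computed earlier.
kl = KneeLocator(list(range(1, 11)), wcss, curve = 'convex', direction = 'decreasing')
print(kl.elbow)  # should match the visually chosen K, i.e., 5 here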
Full Code
# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Importing the dataset
dataset = pd.read_csv('Mall_Customers.csv')
X = dataset.iloc[:, [3, 4]].values

# Using the elbow method to find the optimal number of clusters
from sklearn.cluster import KMeans
wcss = []
for i in range(1, 11):
    kmeans = KMeans(n_clusters = i, init = 'k-means++', random_state = 42)
    kmeans.fit(X)
    wcss.append(kmeans.inertia_)
plt.plot(range(1, 11), wcss)
plt.xlabel('Number of clusters')
plt.ylabel('WCSS')
plt.show()

# Training the K-Means model on the dataset
kmeans = KMeans(n_clusters = 5, init = 'k-means++', random_state = 42)
y_kmeans = kmeans.fit_predict(X)

# Visualising the clusters
plt.scatter(X[y_kmeans == 0, 0], X[y_kmeans == 0, 1], s = 60, c = 'red', label = 'Cluster1')
plt.scatter(X[y_kmeans == 1, 0], X[y_kmeans == 1, 1], s = 60, c = 'blue', label = 'Cluster2')
plt.scatter(X[y_kmeans == 2, 0], X[y_kmeans == 2, 1], s = 60, c = 'green', label = 'Cluster3')
plt.scatter(X[y_kmeans == 3, 0], X[y_kmeans == 3, 1], s = 60, c = 'violet', label = 'Cluster4')
plt.scatter(X[y_kmeans == 4, 0], X[y_kmeans == 4, 1], s = 60, c = 'yellow', label = 'Cluster5')
plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1], s = 100, c = 'black', label = 'Centroids')
plt.xlabel('Annual Income (k$)')
plt.ylabel('Spending Score (1-100)')
plt.legend()
plt.show()
Conclusion
In this article, we covered the basic concepts of the K-Means clustering algorithm in machine learning. We used the Elbow Method to find the optimal K value for clustering our sample dataset, and then used the matplotlib Python library to visualize the resulting clusters as a scatter plot. In upcoming articles, we will explore more ML algorithms.
Key Takeaways
- K-Means is a popular unsupervised machine-learning algorithm widely used by Data Scientists on unlabeled data.
- The Elbow Method is used to find the optimal value of K in the K-Means algorithm.
Frequently Asked Questions
Q1. What are the applications of the K-Means algorithm?
A. K-Means is widely used by Data Scientists to solve problems in different fields:
1. Anomaly detection: identifying outliers or anomalies in a dataset, as in fraud detection.
2. Customer segmentation: grouping customers into clusters based on their income, preferences, etc., which helps companies tailor their marketing strategy to each group.
3. Image segmentation: segmenting an image into regions based on color or texture similarity (a minimal sketch follows this list).
4. K-Means is also widely used for general cluster analysis.
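For instance, here is a minimal color-quantization sketch of point 3 (using a synthetic random image so the example is self-contained; for a real photo, one would load its RGB pixel array instead):

import numpy as np
from sklearn.cluster import KMeans

img = np.random.default_rng(0).random((64, 64, 3))  # stand-in for a real RGB image
pixels = img.reshape(-1, 3)                         # one row per pixel

km = KMeans(n_clusters = 4, n_init = 10, random_state = 42).fit(pixels)
# Replace every pixel with the mean color of its cluster to segment the image.
segmented = km.cluster_centers_[km.labels_].reshape(img.shape)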
Q2. What is the K-Means clustering algorithm?
A. K-Means clustering is an unsupervised machine-learning technique: the process of dividing a dataset into clusters whose members share similar features.
Example: given a large customer dataset, we can create clusters based on aspects such as age and income, and target each cluster with a different marketing strategy.
Q3. Which methods can be used to find the optimal value of K?
A. Some methods used to find the optimal value of K are:
1. Elbow Method: plot the WCSS (Within-Cluster Sum of Squares) against different values of K and select the value of K at the elbow point of the graph, i.e., the point after which the WCSS stays nearly constant (the curve runs roughly parallel to the x-axis).
2. Silhouette method: calculate the silhouette coefficient of each data point, which measures how well the point fits its assigned cluster compared to the other clusters. The average silhouette coefficient is computed for different values of K, and the K with the highest average coefficient is chosen (see the sketch after this list).
3. Gap statistic method: compare the WCSS for different values of K with the expected sum of squares under data randomly generated from a uniform distribution; the optimal K is the one with the largest gap between the observed and expected values.
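For the silhouette method above, here is a minimal sketch with scikit-learn's silhouette_score, reusing the X defined earlier in this article (note that the silhouette is only defined for K ≥ 2, so the search starts at 2):

from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

scores = {}
for k in range(2, 11):  # silhouette requires at least 2 clusters
    labels = KMeans(n_clusters = k, init = 'k-means++', n_init = 10, random_state = 42).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key = scores.get)  # K with the highest average silhouette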