*This article was published as a part of the Data Science Blogathon.*

**Overview**

K-means clustering is a popular and powerful unsupervised machine learning technique that groups data points based on their similarity, or closeness, to one another. How exactly do we cluster them? Which methods does K-means use to form the clusters? This article answers these questions. Before we begin, take a close look at the clustering example below. It is easy to interpret, right? The data points are grouped into 3 clusters based on their similarity or closeness.

**Table Of Contents**

1. Introduction to K-Means

2. K-Means++ Algorithm

3. How To Choose K Value in K-Means?

4. Practical Considerations in K-Means

5. Cluster Tendency

**1. Introduction**

Let's understand K-means clustering with an everyday example. These days almost everybody watches web series or movies on Amazon Prime or Netflix. Have you noticed that whenever you open Netflix, the movies are grouped together by genre, e.g. crime, suspense, and so on? Netflix's genre grouping is an easy way to picture clustering. Let's look at the k-means clustering algorithm in more detail.

**Definition:** K-means groups data points based on their similarity or closeness to each other. In simple terms, the algorithm finds the data points whose values are similar to each other and assigns those points to the same cluster.

So how does the algorithm measure how similar two points are? It uses a 'distance measure', and in K-means that distance measure is the 'Euclidean distance': the square root of the sum of the squared differences between the coordinates of the two points.

Observations that are closer or more similar to each other have a low Euclidean distance and are therefore clustered together.
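To make the distance idea concrete, here is a minimal sketch in plain Python. The function name `euclidean_distance` and the three sample points are made up for illustration; they are not from any particular library or dataset.

```python
import math

def euclidean_distance(p, q):
    """Straight-line distance between two points with the same number of coordinates."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# Made-up 2-D points, purely for illustration
a, b, c = (1.0, 2.0), (2.0, 3.0), (8.0, 9.0)

print(euclidean_distance(a, b))  # small value: a and b are close together
print(euclidean_distance(a, c))  # large value: a and c are far apart
```

Points `a` and `b` would likely end up in the same cluster, while `c` would not.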

One more formula you need to know to understand K-means is the 'centroid': the mean of all the points in a cluster, computed coordinate by coordinate. The k-means algorithm uses the concept of a centroid to create 'k clusters'.
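As a small illustration, a centroid can be computed by averaging each coordinate separately. The helper name `centroid` and the sample cluster below are assumptions made for this sketch.

```python
def centroid(points):
    """Coordinate-wise mean of a non-empty list of equal-length points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

# A made-up cluster of three 2-D points
cluster = [(1.0, 2.0), (3.0, 4.0), (5.0, 6.0)]
print(centroid(cluster))  # (3.0, 4.0)
```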

So now you are ready to understand the steps in the k-means clustering algorithm.

__Steps in K-Means__:

Step 1: Choose a value for k, for example k = 2.

Step 2: Initialize the k centroids randomly.

Step 3: Calculate the Euclidean distance from each data point to every centroid and assign each point to its closest centroid, forming k clusters.

Step 4: Compute the centroid of each cluster and update the centroids.

Step 5: Repeat steps 3 and 4 until the centroids stop changing.

Each time the clusters are formed, the centroids are updated; the updated centroid is the center (mean) of all the points that fall in that cluster. This process continues until the centroids no longer change, i.e. the solution converges.
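The steps above can be sketched in a few lines of plain Python. This is only an illustrative version, not a production implementation: the function name `kmeans`, its parameters (`max_iter`, `seed`), the choice to start from randomly picked data points, and the sample data are all assumptions made for the example. In practice you would more likely reach for a library such as scikit-learn.

```python
import math
import random

def kmeans(points, k, max_iter=100, seed=0):
    """Plain k-means on a list of equal-length tuples (illustrative only)."""
    rng = random.Random(seed)
    # Step 2: pick k distinct data points as the initial centroids
    centroids = rng.sample(points, k)
    for _ in range(max_iter):
        # Step 3: assign each point to its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Step 4: move each centroid to the mean of its cluster
        # (simplification: an empty cluster keeps its old centroid)
        new_centroids = [
            tuple(sum(c) / len(members) for c in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        # Step 5: stop once the centroids no longer change
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters

# Made-up data: two visibly separate groups
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, clusters = kmeans(points, k=2)  # Step 1: k = 2
for center, members in zip(centroids, clusters):
    print(center, members)
```

Because the starting centroids are random, different seeds can lead to different results on harder data, which is exactly the issue the next section is about.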

You can play around with the K-means algorithm at the link below. Try it.

https://stanford.edu/class/engr108/visualizations/kmeans/kmeans.html

So what next? How do you choose the initial centroids randomly?

**2. K-Means++ Algorithm:**
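As a rough sketch of how K-Means++ seeding is commonly described: the first centroid is a data point picked uniformly at random, and each further centroid is a data point picked with probability proportional to its squared distance from the nearest centroid chosen so far, so far-away points are more likely to be selected. The function name `kmeans_pp_init`, its parameters, and the sample data are assumptions made for this illustration.

```python
import math
import random

def kmeans_pp_init(points, k, seed=0):
    """Choose k starting centroids with k-means++ style seeding (illustrative only)."""
    rng = random.Random(seed)
    # First centroid: one data point picked uniformly at random
    centroids = [rng.choice(points)]
    while len(centroids) < k:
        # Squared distance from each point to its nearest already-chosen centroid
        d2 = [min(math.dist(p, c) ** 2 for c in centroids) for p in points]
        # Next centroid: sampled with probability proportional to that squared distance
        # (assumes the points are not all identical, otherwise every weight is zero)
        centroids.append(rng.choices(points, weights=d2, k=1)[0])
    return centroids

# Made-up data: two visibly separate groups
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
print(kmeans_pp_init(points, k=2))
```

The chosen points would then replace the purely random initialization in step 2 of K-means; the rest of the algorithm stays the same.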


**3. How To Choose K Value In K-Means:**

__1. Elbow method__

__Steps:__

Step 1: Run the clustering algorithm for different values of k, for example k = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10].

Step 2: For each k, calculate the within-cluster sum of squares (WCSS).

Step 3: Plot the curve of WCSS against the number of clusters.

Step 4: The location of the bend (the "elbow") in the plot is generally considered an indicator of the approximate number of clusters.
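The elbow steps can be sketched as follows. This is a minimal illustration under several assumptions: the compact `kmeans` helper, the `wcss` helper, the sample data, and the range of k values are all made up for the example, and the WCSS values are printed as a table instead of being plotted.

```python
import math
import random

def kmeans(points, k, max_iter=100, seed=0):
    """Compact k-means; returns (centroids, clusters). Illustrative only."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[min(range(k), key=lambda i: math.dist(p, centroids[i]))].append(p)
        updated = [
            tuple(sum(c) / len(m) for c in zip(*m)) if m else centroids[i]
            for i, m in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids, clusters

def wcss(centroids, clusters):
    """Within-cluster sum of squares: squared distance of each point to its centroid."""
    return sum(math.dist(p, c) ** 2 for c, m in zip(centroids, clusters) for p in m)

# Made-up data with three visible groups
points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8), (1, 8), (2, 9), (1, 9)]

# Steps 1 and 2: run k-means for several values of k and record the WCSS
scores = {k: wcss(*kmeans(points, k)) for k in range(1, 7)}

# Step 3: a text stand-in for the plot; step 4 is to look for where the drop flattens out
for k, score in scores.items():
    print(k, round(score, 2))
```

With a plotting library you would draw `scores` as a line chart and read the elbow off the curve; since the initialization is random, it can also help to repeat each k with a few different seeds.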


**4. Practical Considerations In K-Means:**

- Choosing the number of clusters (K) in advance.
- Standardization of data (scaling).
- Categorical data (can be handled with K-Modes).
- Impact of initial centroids and outliers.
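To illustrate the standardization point: because K-means relies on Euclidean distance, a feature with a large numeric range can dominate features with a small range. One common remedy is to rescale every column to mean 0 and standard deviation 1 before clustering. The sketch below assumes a hypothetical `standardize` helper and made-up customer data.

```python
import statistics

def standardize(rows):
    """Rescale each column to mean 0 and standard deviation 1 (z-scores)."""
    columns = list(zip(*rows))
    means = [statistics.mean(col) for col in columns]
    # Population standard deviation; a constant column would give 0 and is not handled here
    stdevs = [statistics.pstdev(col) for col in columns]
    return [
        tuple((value - m) / s for value, m, s in zip(row, means, stdevs))
        for row in rows
    ]

# Hypothetical customers: (age in years, yearly income)
customers = [(25, 30000), (40, 90000), (55, 60000)]
for row in standardize(customers):
    print(row)
```

Without scaling, the income column would decide almost all of the distance between two customers; after scaling, age and income contribute on a comparable footing.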

**5. Cluster Tendency:**

*The media shown in this article are not owned by Analytics Vidhya and are used at the Author’s discretion.*