Shivam Baldha — Updated On March 22nd, 2022

This article was published as a part of the Data Science Blogathon.


For this blog on an ML topic, I chose collaborative filtering as the problem statement. Recommendation systems rely on two main techniques, and this blog focuses on collaborative filtering and gives a very simple introduction to it.

In collaborative filtering, we start from a user-movie rating matrix and use those ratings to find users with similar tastes, or movies that attract similar ratings.

To learn more about recommendation systems and content-based filtering, click here. The two techniques are:

  • Collaborative filtering
  • Content-based filtering

Table of Contents

  • What are recommendation systems?
  • What Is Collaborative Filtering?
  • Type Of Filtering
  • User-User-Based Collaborative Filtering
  • How To Compute The Cosine Similarity
  • Item-Item-Based Collaborative Filtering
  • Conclusion

What are Recommendation Systems?

Recommendation systems predict the preferences or ratings that users would give to items. They are widely used for movies, news, advertising, music, and more.

Well-known examples of recommendation systems include YouTube, IMDb, Amazon, and Flipkart.

What is Collaborative Filtering?

Most recommendation systems use collaborative filtering to find similar patterns in user behaviour. The technique filters out the items a user is likely to enjoy on the basis of ratings or reactions from similar users.

For example, collaborative filtering can predict the rating a particular user would give a movie from that user's ratings of other movies and from other users' ratings of all movies. The concept is widely used to recommend movies, news, applications, and many other items.

Let's take an example to understand collaborative filtering better.

Assume user U1 likes movies m1, m2, and m4; user U2 likes movies m1, m3, and m4; and user U3 likes movie m1.

Our job is to recommend a new movie for user U3 to watch next.

All three users, U1, U2, and U3, liked movie m1, so they appear to share the same taste. Users U1 and U2 also both liked movie m4, so user U3 is likely to enjoy m4 as well, and m4 is the movie I recommend. This is the flow of logic.

The key idea of CF is that users who agreed in the past tend to agree in the future.
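The U1/U2/U3 example can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the helper name `recommend` is hypothetical, and "similarity" is reduced to sharing at least one liked movie with the target user.

```python
# Liked movies per user, taken from the example above.
likes = {
    "U1": {"m1", "m2", "m4"},
    "U2": {"m1", "m3", "m4"},
    "U3": {"m1"},
}

def recommend(target, likes):
    """Rank movies the target has not seen by how many like-minded users liked them."""
    scores = {}
    for user, movies in likes.items():
        # Skip the target and any user who shares no liked movie with the target.
        if user == target or not (movies & likes[target]):
            continue
        for movie in movies - likes[target]:
            scores[movie] = scores.get(movie, 0) + 1
    # Highest vote count first; ties are broken alphabetically.
    return sorted(scores, key=lambda m: (-scores[m], m))

print(recommend("U3", likes))  # ['m4', 'm2', 'm3']
```

Movie m4 comes first because both U1 and U2 liked it, which matches the reasoning in the example.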

Understanding Collaborative Filtering

Types of Filtering

There are two types of Collaborative Filtering available:

  • User-User-based similarity/Collaborative Filtering
  • Item-Item-based similarity/Collaborative Filtering

Item-item-based collaborative filtering is the more popular of the two.

User-User-Based Collaborative Filtering

User-user collaborative filtering is a recommendation method that looks for similar users based on the items they have already liked or positively interacted with. Let's take an example to understand it.

Assume we are given a matrix A indexed by user ID and item ID, where each entry is the rating a user gave a movie.
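As a rough sketch of how such a matrix A might be assembled, the snippet below turns a list of (user, item, rating) records into a nested list. The records are made up for illustration, and storing 0 for "not rated" is an assumption of this sketch.

```python
# Hypothetical (user_id, item_id, rating) records.
records = [
    ("u1", "m1", 5),
    ("u1", "m2", 3),
    ("u2", "m1", 4),
    ("u2", "m3", 2),
    ("u3", "m2", 1),
]

# Fix a row order for users and a column order for items.
users = sorted({user for user, _, _ in records})
items = sorted({item for _, item, _ in records})

# A[row][col] holds the rating; 0 means the user has not rated the item.
A = [[0] * len(items) for _ in users]
for user, item, rating in records:
    A[users.index(user)][items.index(item)] = rating

for user, row in zip(users, A):
    print(user, row)
```

Each row of A is then the rating vector of one user, which is what the similarity computation works on.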

User-User Based Collaborative Filtering

Source Wikipedia


To compute user-user similarity, we need a measure of similarity between two users; here we can use cosine similarity.

Cosine similarity measures the similarity between two vectors of an inner product space as the cosine of the angle between them.

Source Wikipedia
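In symbols, the cosine similarity of two vectors a and b is their dot product divided by the product of their lengths. A minimal pure-Python sketch of that definition follows; the function name `cosine` is just an illustrative choice, and the sketch assumes neither vector is all zeros.

```python
import math

def cosine(a, b):
    """Cosine of the angle between vectors a and b (assumes neither is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

print(cosine([2, 0], [5, 0]))  # same direction: 1.0
print(cosine([1, 0], [0, 1]))  # perpendicular: 0.0
print(cosine([3, 4], [4, 3]))  # 24 / (5 * 5) = 0.96
```

The value depends only on the direction of the vectors, not on their length, which is why two users who rate on different scales can still come out as similar.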


How to Compute the Cosine Similarity?

Let's work through an example to compute cosine similarity.

doc_of_food = 'this food is good but not a highly recommended by the foodies'

doc_of_election = "prime minister ND modi says Putin had no political interference is the election outcome."

doc_of_putin = "Post elections Vladimir Putin became President of Russia President Vladimir Putin had served as the Prime Minister earlier in his political career"

doc = [doc_of_food, doc_of_election, doc_of_putin]

I assume you know how to encode text as a vector. Either TfidfVectorizer() or CountVectorizer() can encode the sentences.

Here we use CountVectorizer to encode our documents.

# Scikit-learn
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity
import pandas as pd

# Create the document-term matrix.
# Pass stop_words='english' to CountVectorizer to drop common English words.
count_vectorizer = CountVectorizer()
sparse_matrix = count_vectorizer.fit_transform(doc)

# Optional: convert the sparse matrix to a pandas DataFrame to inspect the word frequencies.
doc_term_matrix = sparse_matrix.toarray()
df = pd.DataFrame(doc_term_matrix,
                  index=['doc_of_food', 'doc_of_election', 'doc_of_putin'])

# Compute the cosine similarity between every pair of documents
print(cosine_similarity(df, df))


[[1.         0.16116459 0.05698029]
 [0.16116459 1.         0.35355339]
 [0.05698029 0.35355339 1.        ]]

So, given a matrix, we can compute a cosine similarity matrix. Applied to the user rating matrix, each entry is the similarity between two users.

Sim_ij = similarity(user_i, user_j)
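As a minimal sketch of this step, assuming a small made-up rating matrix (the values are hypothetical, with 0 meaning "not rated"), the user-user similarity matrix can be built with NumPy by scaling each user's row to unit length and multiplying the result by its transpose:

```python
import numpy as np

# Hypothetical ratings: rows are users, columns are items, 0 = not rated.
A = np.array([
    [5, 4, 0, 1],
    [4, 5, 0, 0],
    [0, 0, 5, 4],
], dtype=float)

# Scale every row to length 1 (assumes no user has an all-zero row).
unit = A / np.linalg.norm(A, axis=1, keepdims=True)

# Entry [i, j] is the cosine similarity between user i and user j.
user_sim = unit @ unit.T
print(user_sim.round(2))
```

Users 0 and 1 rated the same items highly, so their similarity is close to 1, while user 2 has little in common with either of them.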


Collaborative Filtering
Source ineuron


Now that we have a similarity matrix, our task is to recommend new movies or items to a user.

Suppose you have to recommend new items to user 10. We look up user 10's row in the similarity matrix and find the five users with the highest similarity values.

Suppose the five users most similar to user 10 are users 9, 5, 8, 1, and 2. We go to the user-item matrix, collect every item that users 9, 5, 8, 1, and 2 have rated and that user 10 has not watched, combine them, and recommend those items to user 10.

User-user similarity has one small problem: user interests change over time, so the similarity values change too, and this affects the recommendations.

Another approach is the item-item-based similarity recommendation system.
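The lookup described above might be sketched as follows. The rating matrix is made up, there are only four users instead of the ten in the text, k is 2 rather than 5, and the function name `recommend_for_user` is hypothetical.

```python
import numpy as np

# Hypothetical user-item ratings (rows: users 0-3, columns: items 0-4, 0 = not rated).
ratings = np.array([
    [5, 4, 0, 0, 1],
    [4, 5, 3, 0, 0],
    [5, 0, 4, 2, 0],
    [0, 0, 5, 4, 5],
], dtype=float)

# Cosine similarity between every pair of users.
unit = ratings / np.linalg.norm(ratings, axis=1, keepdims=True)
user_sim = unit @ unit.T

def recommend_for_user(ratings, user_sim, user, k=2):
    """Return items rated by the k most similar users but not by `user`."""
    sims = user_sim[user].copy()
    sims[user] = -np.inf                       # exclude the user themselves
    neighbours = np.argsort(sims)[::-1][:k]    # indices of the k largest similarities
    rated_by_neighbours = (ratings[neighbours] > 0).any(axis=0)
    unseen = ratings[user] == 0
    return np.flatnonzero(rated_by_neighbours & unseen).tolist()

print(recommend_for_user(ratings, user_sim, user=0, k=2))  # [2, 3]
```

Here users 1 and 2 are the nearest neighbours of user 0, and items 2 and 3 are the ones they rated that user 0 has not seen. A fuller system would usually also rank these candidates, for example by the neighbours' ratings weighted by similarity.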

Item-Item Based Collaborative Filtering

This is also very simple and very similar in idea to user-user similarity. Let's dive into it.

Item-item similarity solves the problem that occurs in user-user-based similarity.

Here we build a similarity matrix of items (movies), where each entry is the similarity between two movies. To measure that similarity, we again use the cosine similarity between the two movies.

Similarity Matrix | Collaborative Filtering
Source ineuron


Sim_ij = similarity(item_i, item_j)
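One way to sketch the item-item matrix is to apply the same cosine computation to the columns of the rating matrix instead of its rows. The ratings below are made up, and the helper `cosine_matrix` is a hypothetical name.

```python
import numpy as np

# Hypothetical ratings: rows are users, columns are items, 0 = not rated.
ratings = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [0, 1, 5, 4],
], dtype=float)

def cosine_matrix(rows):
    """Pairwise cosine similarity between the rows of a 2-D array (no all-zero rows)."""
    unit = rows / np.linalg.norm(rows, axis=1, keepdims=True)
    return unit @ unit.T

# Transposing makes every item a row, so entry [i, j] compares item i with item j.
item_sim = cosine_matrix(ratings.T)
print(item_sim.round(2))
```

Items 0 and 1 were rated highly by the same users, so they come out as much more similar to each other than either is to item 2.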

So how do we recommend an item to the user?

Suppose we have to recommend new items to user10, and we know that user10 already likes items 7, 8, and 1. We go to the item-item similarity matrix and take the items most similar to items 7, 8, and 1 based on the similarity values.

Suppose the most similar items to item7 are {item9, item4, item10}, the most similar items to item8 are {item19, item4, item10}, and the most similar items to item1 are {item9, item14, item10}.

Now we combine these sets. Their union is {item9, item4, item10, item19, item14}, and we recommend all of these items to user10.

Item-item-based filtering is the more popular approach because item similarities stay far more stable over time than the user similarities used in user-user-based filtering.
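The combination step for user10 can be sketched with plain Python sets. The neighbour sets are copied from the example; ranking the candidates by how many liked items point to them is an extra assumption of this sketch, not something the example requires.

```python
from collections import Counter

# Most similar items for each item user10 already likes (from the example above).
similar_items = {
    "item7": {"item9", "item4", "item10"},
    "item8": {"item19", "item4", "item10"},
    "item1": {"item9", "item14", "item10"},
}
liked = ["item7", "item8", "item1"]

# Union of all neighbour sets, minus anything the user already likes.
candidates = set().union(*(similar_items[i] for i in liked)) - set(liked)

# Optional ranking: items suggested by more of the liked items come first.
votes = Counter(item for i in liked for item in similar_items[i])
ranked = sorted(candidates, key=lambda item: (-votes[item], item))
print(ranked)  # ['item10', 'item4', 'item9', 'item14', 'item19']
```

item10 ranks first because it appears in all three neighbour sets.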


Conclusion

This was a very basic introduction to collaborative filtering. We covered its two types, item-item-based filtering and user-user-based filtering. Item-item-based filtering is the more widely used of the two, because user interests may change over time while item ratings stay comparatively stable.

We also learned how to compute cosine similarity using an example with three different texts, and saw that the same measure can be used to find users with similar interests based on their ratings.

Finally, we learned how to compute a similarity matrix and how to use it to recommend new items.

Item-item-based similarity is a basic type of recommendation system.


Connect with me

You can share your valuable feedback with me on LinkedIn. Thanks for taking the time to read my article on Collaborative Filtering.

For any suggestions or article requests, you can email me; click here.


The media shown in this article is not owned by Analytics Vidhya and are used at the Author’s discretion. 


