Comprehensive Guide to build a Recommendation Engine from scratch (in Python)

Pulkit Sharma 29 Nov, 2023 • 25 min read

Introduction

In today’s world, every customer is faced with multiple choices. For example, If I’m looking for a book to read without any specific idea of what I want, there’s a wide range of possibilities how my search might pan out. I might waste a lot of time browsing around on the internet and trawling through various sites hoping to strike gold. I might look for recommendations from other people.

But if there was a site or app which could recommend me books based on what I have read previously, that would be a massive help. Instead of wasting time on various sites, I could just log in and voila! 10 recommended books tailored to my taste.

Comprehensive guide to building recommendation engines

This is what recommendation engines do and their power is being harnessed by most businesses these days. From Amazon to Netflix, Google to Goodreads, recommendation engines are one of the most widely used applications of machine learning techniques.

In this article, we will cover various types of recommendation engine algorithms and fundamentals of creating them in Python. We will also see the mathematics behind the workings of these algorithms. Finally, we will create our own recommendation engine using matrix factorization.

Project to build your Recommendation Engine

Problem Statement

Many online businesses rely on customer reviews and ratings. Explicit feedback is especially important in the entertainment and ecommerce industry where all customer engagements are impacted by these ratings. Netflix relies on such rating data to power its recommendation engine to provide the best movie and TV series recommendations that are personalized and most relevant to the user.

This practice problem challenges the participants to predict the ratings for jokes given by the users provided the ratings provided by the same users for another set of jokes. This dataset is taken from the famous jester online Joke Recommender system dataset.

Practice Now

Introduction
1. What are recommendation engines?
2. How does a recommendation engine work?
3. Case study in Python using the MovieLens Dataset
4. Building collaborative filtering model from scratch
5. Building a simple popularity and collaborative filtering model using Turicreate
6. Introduction to matrix factorization
7. Building a recommendation engine using matrix factorization
8. Evaluation metrics for recommendation engines
9. What else can be tried?
Conclusion
FAQs

1. What are recommendation engines?

Till recently, people generally tended to buy products recommended to them by their friends or the people they trust. This used to be the primary method of purchase when there was any doubt about the product. But with the advent of the digital age, that circle has expanded to include online sites that utilize some sort of recommendation engine.

A recommendation engine filters the data using different algorithms and recommends the most relevant items to users. It first captures the past behavior of a customer and based on that, recommends products which the users might be likely to buy.

If a completely new user visits an e-commerce site, that site will not have any past history of that user. So how does the site go about recommending products to the user in such a scenario? One possible solution could be to recommend the best selling products, i.e. the products which are high in demand. Another possible solution could be to recommend the products which would bring the maximum profit to the business.

If we can recommend a few items to a customer based on their needs and interests, it will create a positive impact on the user experience and lead to frequent visits. Hence, businesses nowadays are building smart and intelligent recommendation engines by studying the past behavior of their users.

Now that we have an intuition of recommendation engines, let’s now look at how they work.

2. How does a recommendation engine work?

Before we deep dive into this topic, first we’ll think of how we can recommend items to users:

We can recommend items to a user which are most popular among all the users
We can divide the users into multiple segments based on their preferences (user features) and recommend items to them based on the segment they belong to

Both of the above methods have their drawbacks. In the first case, the most popular items would be the same for each user so everybody will see the same recommendations. While in the second case, as the number of users increases, the number of features will also increase. So classifying the users into various segments will be a very difficult task.

The main problem here is that we are unable to tailor recommendations based on the specific interest of the users. It’s like Amazon is recommending you buy a laptop just because it’s been bought by the majority of the shoppers. But thankfully, Amazon (or any other big firm) does not recommend products using the above mentioned approach. They use some personalized methods which help them in recommending products more accurately.

Let’s now focus on how a recommendation engine works by going through the following steps.

2.1 Data collection

This is the first and most crucial step for building a recommendation engine. The data can be collected by two means: explicitly and implicitly. Explicit data is information that is provided intentionally, i.e. input from the users such as movie ratings. Implicit data is information that is not provided intentionally but gathered from available data streams like search history, clicks, order history, etc.

In the above image, Netflix is collecting the data explicitly in the form of ratings given by user to different movies.

Here the order history of a user is recorded by Amazon which is an example of implicit mode of data collection.

2.2 Data storage

The amount of data dictates how good the recommendations of the model can get. For example, in a movie recommendation system, the more ratings users give to movies, the better the recommendations get for other users. The type of data plays an important role in deciding the type of storage that has to be used. This type of storage could include a standard SQL database, a NoSQL database or some kind of object storage.

2.3 Filtering the data

After collecting and storing the data, we have to filter it so as to extract the relevant information required to make the final recommendations.

There are various algorithms that help us make the filtering process easier. In the next section, we will go through each algorithm in detail.

2.3.1 Content based filtering

This algorithm recommends products which are similar to the ones that a user has liked in the past.

For example, if a person has liked the movie “Inception”, then this algorithm will recommend movies that fall under the same genre. But how does the algorithm understand which genre to pick and recommend movies from?

Consider the example of Netflix. They save all the information related to each user in a vector form. This vector contains the past behavior of the user, i.e. the movies liked/disliked by the user and the ratings given by them. This vector is known as the profile vector. All the information related to movies is stored in another vector called the item vector. Item vector contains the details of each movie, like genre, cast, director, etc.

The content-based filtering algorithm finds the cosine of the angle between the profile vector and item vector, i.e. cosine similarity. Suppose A is the profile vector and B is the item vector, then the similarity between them can be calculated as:

Based on the cosine value, which ranges between -1 to 1, the movies are arranged in descending order and one of the two below approaches is used for recommendations:

Top-n approach: where the top n movies are recommended (Here n can be decided by the business)
Rating scale approach: Where a threshold is set and all the movies above that threshold are recommended

Other methods that can be used to calculate the similarity are:

Euclidean Distance: Similar items will lie in close proximity to each other if plotted in n-dimensional space. So, we can calculate the distance between items and based on that distance, recommend items to the user. The formula for the euclidean distance is given by:

Pearson’s Correlation: It tells us how much two items are correlated. Higher the correlation, more will be the similarity. Pearson’s correlation can be calculated using the following formula:

A major drawback of this algorithm is that it is limited to recommending items that are of the same type. It will never recommend products which the user has not bought or liked in the past. So if a user has watched or liked only action movies in the past, the system will recommend only action movies. It’s a very narrow way of building an engine.

To improve on this type of system, we need an algorithm that can recommend items not just based on the content, but the behavior of users as well.

2.3.2 Collaborative filtering

Let us understand this with an example. If person A likes 3 movies, say Interstellar, Inception and Predestination, and person B likes Inception, Predestination and The Prestige, then they have almost similar interests. We can say with some certainty that A should like The Prestige and B should like Interstellar. The collaborative filtering algorithm uses “User Behavior” for recommending items. This is one of the most commonly used algorithms in the industry as it is not dependent on any additional information. There are different types of collaborating filtering techniques and we shall look at them in detail below.

User-User collaborative filtering

This algorithm first finds the similarity score between users. Based on this similarity score, it then picks out the most similar users and recommends products which these similar users have liked or bought previously.

In terms of our movies example from earlier, this algorithm finds the similarity between each user based on the ratings they have previously given to different movies. The prediction of an item for a user u is calculated by computing the weighted sum of the user ratings given by other users to an item i.

The prediction Pu,i is given by:

Here,

Pu,i is the prediction of an item
Rv,i is the rating given by a user v to a movie i
Su,v is the similarity between users

Now, we have the ratings for users in profile vector and based on that we have to predict the ratings for other users. Following steps are followed to do so:

For predictions we need the similarity between the user u and v. We can make use of Pearson correlation.
First we find the items rated by both the users and based on the ratings, correlation between the users is calculated.
The predictions can be calculated using the similarity values. This algorithm, first of all calculates the similarity between each user and then based on each similarity calculates the predictions. Users having higher correlation will tend to be similar.
Based on these prediction values, recommendations are made. Let us understand it with an example:

Consider the user-movie rating matrix:

User/Movie	x1	x2	x3	x4	x5	Mean User Rating
A	4	1	–	4	–	3
B	–	4	–	2	3	3
C	–	1	–	4	4	3

Here we have a user movie rating matrix. To understand this in a more practical manner, let’s find the similarity between users (A, C) and (B, C) in the above table. Common movies rated by A/[ and C are movies x2 and x4 and by B and C are movies x2, x4 and x5.

The correlation between user A and C is more than the correlation between B and C. Hence users A and C have more similarity and the movies liked by user A will be recommended to user C and vice versa.

This algorithm is quite time consuming as it involves calculating the similarity for each user and then calculating prediction for each similarity score. One way of handling this problem is to select only a few users (neighbors) instead of all to make predictions, i.e. instead of making predictions for all similarity values, we choose only few similarity values. There are various ways to select the neighbors:

Select a threshold similarity and choose all the users above that value
Randomly select the users
Arrange the neighbors in descending order of their similarity value and choose top-N users
Use clustering for choosing neighbors

This algorithm is useful when the number of users is less. Its not effective when there are a large number of users as it will take a lot of time to compute the similarity between all user pairs. This leads us to item-item collaborative filtering, which is effective when the number of users is more than the items being recommended.

Item-Item collaborative filtering

In this algorithm, we compute the similarity between each pair of items.

So in our case we will find the similarity between each movie pair and based on that, we will recommend similar movies which are liked by the users in the past. This algorithm works similar to user-user collaborative filtering with just a little change – instead of taking the weighted sum of ratings of “user-neighbors”, we take the weighted sum of ratings of “item-neighbors”. The prediction is given by:

Now we will find the similarity between items.

Now, as we have the similarity between each movie and the ratings, predictions are made and based on those predictions, similar movies are recommended. Let us understand it with an example.

User/Movie	x1	x2	x3	x4	x5
A	4	1	2	4	4
B	2	4	4	2	1
C	–	1	–	3	4
Mean Item Rating	3	2	3	3	3

Here the mean item rating is the average of all the ratings given to a particular item (compare it with the table we saw in user-user filtering). Instead of finding the user-user similarity as we saw earlier, we find the item-item similarity.

To do this, first we need to find such users who have rated those items and based on the ratings, similarity between the items is calculated. Let us find the similarity between movies (x1, x4) and (x1, x5). Common users who have rated movies x1 and x4 are A and B while the users who have rated movies x1 and x5 are also A and B.

The similarity between movie x1 and x4 is more than the similarity between movie x1 and x5. So based on these similarity values, if any user searches for movie x1, they will be recommended movie x4 and vice versa. Before going further and implementing these concepts, there is a question which we must know the answer to – what will happen if a new user or a new item is added in the dataset? It is called a Cold Start. There can be two types of cold start:

Visitor Cold Start
Product Cold Start

Visitor Cold Start means that a new user is introduced in the dataset. Since there is no history of that user, the system does not know the preferences of that user. It becomes harder to recommend products to that user. So, how can we solve this problem? One basic approach could be to apply a popularity based strategy, i.e. recommend the most popular products. These can be determined by what has been popular recently overall or regionally. Once we know the preferences of the user, recommending products will be easier.

On the other hand, Product Cold Start means that a new product is launched in the market or added to the system. User action is most important to determine the value of any product. More the interaction a product receives, the easier it is for our model to recommend that product to the right user. We can make use of Content based filtering to solve this problem. The system first uses the content of the new product for recommendations and then eventually the user actions on that product.

Now let’s solidify our understanding of these concepts using a case study in Python. Get your machines ready because this is going to be fun!

3. Case study in Python using the MovieLens Dataset

We will work on the MovieLens dataset and build a model to recommend movies to the end users. This data has been collected by the GroupLens Research Project at the University of Minnesota. The dataset can be downloaded from here. This dataset consists of:

100,000 ratings (1-5) from 943 users on 1682 movies
Demographic information of the users (age, gender, occupation, etc.)

First, we’ll import our standard libraries and read the dataset in Python. Here is a live coding window to get you started. You can run the codes and get the output in this window itself:

This dataset contains attributes of 1682 movies. There are 24 columns out of which last 19 columns specify the genre of a particular movie. These are binary columns, i.e., a value of 1 denotes that the movie belongs to that genre, and 0 otherwise.

The dataset has already been divided into train and test by GroupLens where the test data has 10 ratings for each user, i.e. 9,430 rows in total. We will read both these files into our Python environment.

r_cols = ['user_id', 'movie_id', 'rating', 'unix_timestamp']
ratings_train = pd.read_csv('ml-100k/ua.base', sep='\t', names=r_cols, encoding='latin-1')
ratings_test = pd.read_csv('ml-100k/ua.test', sep='\t', names=r_cols, encoding='latin-1')
ratings_train.shape, ratings_test.shape

It’s finally time to build our recommend engine!

4. Building collaborative filtering model from scratch

We will recommend movies based on user-user similarity and item-item similarity. For that, first we need to calculate the number of unique users and movies.

n_users = ratings.user_id.unique().shape[0]
n_items = ratings.movie_id.unique().shape[0]

Now, we will create a user-item matrix which can be used to calculate the similarity between users and items.

data_matrix = np.zeros((n_users, n_items))
for line in ratings.itertuples():
    data_matrix[line[1]-1, line[2]-1] = line[3]

Now, we will calculate the similarity. We can use the pairwise_distance function from sklearn to calculate the cosine similarity.

from sklearn.metrics.pairwise import pairwise_distances 
user_similarity = pairwise_distances(data_matrix, metric='cosine')
item_similarity = pairwise_distances(data_matrix.T, metric='cosine')

This gives us the item-item and user-user similarity in an array form. The next step is to make predictions based on these similarities. Let’s define a function to do just that.

def predict(ratings, similarity, type='user'):
    if type == 'user':
        mean_user_rating = ratings.mean(axis=1)
        #We use np.newaxis so that mean_user_rating has same format as ratings
        ratings_diff = (ratings - mean_user_rating[:, np.newaxis])
        pred = mean_user_rating[:, np.newaxis] + similarity.dot(ratings_diff) / np.array([np.abs(similarity).sum(axis=1)]).T
    elif type == 'item':
        pred = ratings.dot(similarity) / np.array([np.abs(similarity).sum(axis=1)])
    return pred

Finally, we will make predictions based on user similarity and item similarity.

user_prediction = predict(data_matrix, user_similarity, type='user')
item_prediction = predict(data_matrix, item_similarity, type='item')

As it turns out, we also have a library which generates all these recommendations automatically. Let us now learn how to create a recommendation engine using turicreate in Python. To get familiar with turicreate and to install it on your machine, refer here.

5. Building a simple popularity and collaborative filtering model using Turicreate

After installing turicreate, first let’s import it and read the train and test dataset in our environment. Since we will be using turicreate, we will need to convert the dataset in SFrames.

import turicreate
train_data = turicreate.SFrame(ratings_train)
test_data = turicreate.Sframe(ratings_test)

We have user behavior as well as attributes of the users and movies, so we can make content based as well as collaborative filtering algorithms. We will start with a simple popularity model and then build a collaborative filtering model.

First we’ll build a model which will recommend movies based on the most popular choices, i.e., a model where all the users receive the same recommendation(s). We will use the turicreate recommender function popularity_recommender for this.

popularity_model = turicreate.popularity_recommender.create(train_data, user_id='user_id', item_id='movie_id', target='rating')

Various arguments which we have used are:

train_data: the SFrame which contains the required training data
user_id: the column name which represents each user ID
item_id: the column name which represents each item to be recommended (movie_id)
target: the column name representing scores/ratings given by the user

It’s prediction time! We will recommend the top 5 items for the first 5 users in our dataset.

popularity_recomm = popularity_model.recommend(users=[1,2,3,4,5],k=5)
popularity_recomm.print_rows(num_rows=25)

Note that the recommendations for all users are the same – 1467, 1201, 1189, 1122, 814. And they’re all in the same order! This confirms that all the recommended movies have an average rating of 5, i.e. all the users who watched the movie gave it a top rating. Thus our popularity system works as expected.

After building a popularity model, we will now build a collaborative filtering model. Let’s train the item similarity model and make top 5 recommendations for the first 5 users.

#Training the model
item_sim_model = turicreate.item_similarity_recommender.create(train_data, user_id='user_id', item_id='movie_id', target='rating', similarity_type='cosine')

#Making recommendations
item_sim_recomm = item_sim_model.recommend(users=[1,2,3,4,5],k=5)
item_sim_recomm.print_rows(num_rows=25)

Here we can see that the recommendations (movie_id) are different for each user. So personalization exists, i.e. for different users we have a different set of recommendations.

In this model, we do not have the ratings for each movie given by each user. We must find a way to predict all these missing ratings. For that, we have to find a set of features which can define how a user rates the movies. These are called latent features. We need to find a way to extract the most important latent features from the the existing features. Matrix factorization, covered in the next section, is one such technique which uses the lower dimension dense matrix and helps in extracting the important latent features.

6. Introduction to matrix factorization

Let’s understand matrix factorization with an example. Consider a user-movie ratings matrix (1-5) given by different users to different movies.

Here user_id is the unique ID of different users and each movie is also assigned a unique ID. A rating of 0.0 represents that the user has not rated that particular movie (1 is the lowest rating a user can give). We want to predict these missing ratings. Using matrix factorization, we can find some latent features that can determine how a user rates a movie. We decompose the matrix into constituent parts in such a way that the product of these parts generates the original matrix.

Let us assume that we have to find k latent features. So we can divide our rating matrix R(MxN) into P(MxK) and Q(NxK) such that P x QT (here QT is the transpose of Q matrix) approximates the R matrix:

, where:

M is the total number of users
N is the total number of movies
K is the total latent features
R is MxN user-movie rating matrix
P is MxK user-feature affinity matrix which represents the association between users and features
Q is NxK item-feature relevance matrix which represents the association between movies and features
Σ is KxK diagonal feature weight matrix which represents the essential weights of features

Choosing the latent features through matrix factorization removes the noise from the data. How? Well, it removes the feature(s) which does not determine how a user rates a movie. Now to get the rating rui for a movie qik rated by a user puk across all the latent features k, we can calculate the dot product of the 2 vectors and add them to get the ratings based on all the latent features.

This is how matrix factorization gives us the ratings for the movies which have not been rated by the users. But how can we add new data to our user-movie rating matrix, i.e. if a new user joins and rates a movie, how will we add this data to our pre-existing matrix?

Let me make it easier for you through the matrix factorization method. If a new user joins the system, there will be no change in the diagonal feature weight matrix Σ, as well as the item-feature relevance matrix Q. The only change will occur in the user-feature affinity matrix P. We can apply some matrix multiplication methods to do that.

We have,

Let’s multiply with Q on both sides.

Now, we have

So,

Simplifying it further, we can get the P matrix:

This is the updated user-feature affinity matrix. Similarly, if a new movie is added to the system, we can follow similar steps to get the updated item-feature relevance matrix Q.

Remember, we decomposed R matrix into P and Q. But how do we decide which P and Q matrix will approximate the R matrix? We can use the gradient descent algorithm for doing this. The objective here is to minimize the squared error between the actual rating and the one estimated using P and Q. The squared error is given by:

Here,

eui is the error
rui is the actual rating given by user u to the movie i
řui is the predicted rating by user u for the movie i

Our aim was to decide the p and q value in such a way that this error is minimized. We need to update the p and q values so as to get the optimized values of these matrices which will give the least error. Now we will define an update rule for puk and qki. The update rule in gradient descent is defined by the gradient of the error to be minimized.

As we now have the gradients, we can apply the update rule for puk and qki.

Here α is the learning rate which decides the size of each update. The above updates can be repeated until the error is minimized. Once that’s done, we get the optimal P and Q matrix which can be used to predict the ratings. Let us quickly recap how this algorithm works and then we will build the recommendation engine to predict the ratings for the unrated movies.

Below is how matrix factorization works for predicting ratings:

# for f = 1,2,....,k :
    # for rui ε R :
        # predict rui
        # update puk and qki

So based on each latent feature, all the missing ratings in the R matrix will be filled using the predicted rui value. Then puk and qki are updated using gradient descent and their optimal value is obtained. It can be visualized as shown below:

Now that we have understood the inner workings of this algorithm, we’ll take an example and see how a matrix is factorized into its constituents.

Consider a 2 X 3 matrix, A2X3 as shown below:

Here we have 2 users and their corresponding ratings for 3 movies. Now, we will decompose this matrix into sub parts, such that:

The eigenvalues of AAT will give us the P matrix and the eigenvalues of ATA will give us the Q matrix. Σ is the square root of the eigenvalues from AAT or ATA.

Calculate the eigenvalues for AAT.

So, the eigenvalues of AAT are 25, 9. Similarly, we can calculate the eigenvalues of ATA. These values will be 25, 9, 0. Now we have to calculate the corresponding eigenvectors for AAT and ATA.

For λ = 25, we have:

It can be row reduced to:

A unit-length vector in the kernel of that matrix is:

Similarly, for λ = 9 we have:

It can be row reduced to:

A unit-length vector in the kernel of that matrix is:

For the last eigenvector, we could find a unit vector perpendicular to q1 and q2. So,

Σ2X3 matrix is the square root of eigenvalues of AAT or ATA, i.e. 25 and 9.

Finally, we can compute P2X2 by the formula σpi = Aqi, or pi = 1/σ(Aqi). This gives:

So, the decomposed form of A matrix is given by:

Since we have the P and Q matrix, we can use the gradient descent approach to get their optimized versions. Let us build our recommendation engine using matrix factorization.

7. Building a recommendation engine using matrix factorization

Let us define a function to predict the ratings given by the user to all the movies which are not rated by him/her.

class MF():

    # Initializing the user-movie rating matrix, no. of latent features, alpha and beta.
    def __init__(self, R, K, alpha, beta, iterations):
        self.R = R
        self.num_users, self.num_items = R.shape
        self.K = K
        self.alpha = alpha
        self.beta = beta
        self.iterations = iterations

    # Initializing user-feature and movie-feature matrix 
    def train(self):
        self.P = np.random.normal(scale=1./self.K, size=(self.num_users, self.K))
        self.Q = np.random.normal(scale=1./self.K, size=(self.num_items, self.K))

        # Initializing the bias terms
        self.b_u = np.zeros(self.num_users)
        self.b_i = np.zeros(self.num_items)
        self.b = np.mean(self.R[np.where(self.R != 0)])

        # List of training samples
        self.samples = [
        (i, j, self.R[i, j])
        for i in range(self.num_users)
        for j in range(self.num_items)
        if self.R[i, j] > 0
        ]

        # Stochastic gradient descent for given number of iterations
        training_process = []
        for i in range(self.iterations):
        np.random.shuffle(self.samples)
        self.sgd()
        mse = self.mse()
        training_process.append((i, mse))
        if (i+1) % 20 == 0:
            print("Iteration: %d ; error = %.4f" % (i+1, mse))

        return training_process

    # Computing total mean squared error
    def mse(self):
        xs, ys = self.R.nonzero()
        predicted = self.full_matrix()
        error = 0
        for x, y in zip(xs, ys):
            error += pow(self.R[x, y] - predicted[x, y], 2)
        return np.sqrt(error)

    # Stochastic gradient descent to get optimized P and Q matrix
    def sgd(self):
        for i, j, r in self.samples:
            prediction = self.get_rating(i, j)
            e = (r - prediction)

            self.b_u[i] += self.alpha * (e - self.beta * self.b_u[i])
            self.b_i[j] += self.alpha * (e - self.beta * self.b_i[j])

            self.P[i, :] += self.alpha * (e * self.Q[j, :] - self.beta * self.P[i,:])
            self.Q[j, :] += self.alpha * (e * self.P[i, :] - self.beta * self.Q[j,:])

    # Ratings for user i and moive j
    def get_rating(self, i, j):
        prediction = self.b + self.b_u[i] + self.b_i[j] + self.P[i, :].dot(self.Q[j, :].T)
        return prediction

    # Full user-movie rating matrix
    def full_matrix(self):
        return mf.b + mf.b_u[:,np.newaxis] + mf.b_i[np.newaxis:,] + mf.P.dot(mf.Q.T)

Now we have a function that can predict the ratings. The input for this function are:

R – The user-movie rating matrix
K – Number of latent features
alpha – Learning rate for stochastic gradient descent
beta – Regularization parameter for bias
iterations – Number of iterations to perform stochastic gradient descent

We have to convert the user item ratings to matrix form. It can be done using the pivot function in python.

R= np.array(ratings.pivot(index = 'user_id', columns ='movie_id', values = 'rating').fillna(0))

fillna(0) will fill all the missing ratings with 0. Now we have the R matrix. We can initialize the number of latent features, but the number of these features must be less than or equal to the number of original features.

Now let us predict all the missing ratings. Let’s take K=20, alpha=0.001, beta=0.01 and iterations=100.

mf = MF(R, K=20, alpha=0.001, beta=0.01, iterations=100)
training_process = mf.train()
print()
print("P x Q:")
print(mf.full_matrix())
print()

This will give us the error value corresponding to every 20th iteration and finally the complete user-movie rating matrix. The output looks like this:

We have created our recommendation engine. Let’s focus on how to evaluate a recommendation engine in the next section.

8. Evaluation metrics for recommendation engines

For evaluating recommendation engines, we can use the following metrics

8.1 Recall:

What proportion of items that a user likes were actually recommended
It is given by:

Here tp represents the number of items recommended to a user that he/she likes and tp+fn represents the total items that a user likes
If a user likes 5 items and the recommendation engine decided to show 3 of them, then the recall will be 0.6
Larger the recall, better are the recommendations

8.2 Precision:

Out of all the recommended items, how many did the user actually like?
It is given by:

Here tp represents the number of items recommended to a user that he/she likes and tp+fp represents the total items recommended to a user
If 5 items were recommended to the user out of which he liked 4, then precision will be 0.8
Larger the precision, better the recommendations
But consider this case: If we simply recommend all the items, they will definitely cover the items which the user likes. So we have 100% recall! But think about precision for a second. If we recommend say 1000 items and user likes only 10 of them, then precision is 0.1%. This is really low. So, our aim should be to maximize both precision and recall.

8.3 RMSE (Root Mean Squared Error):

It measures the error in the predicted ratings:

Here, Predicted is the rating predicted by the model and Actual is the original rating
If a user has given a rating of 5 to a movie and we predicted the rating as 4, then RMSE is 1
Lesser the RMSE value, better the recommendations

The above metrics tell us how accurate our recommendations are but they do not focus on the order of recommendations, i.e. they do not focus on which product to recommend first and what follows after that. We need some metric that also considers the order of the products recommended. So, let’s look at some of the ranking metrics:

8.4 Mean Reciprocal Rank:

Evaluates the list of recommendations

Suppose we have recommended 3 movies to a user, say A, B, C in the given order, but the user only liked movie C. As the rank of movie C is 3, the reciprocal rank will be 1/3
Larger the mean reciprocal rank, better the recommendations

8.5 MAP at k (Mean Average Precision at cutoff k):

Precision and Recall don’t care about ordering in the recommendations
Precision at cutoff k is the precision calculated by considering only the subset of your recommendations from rank 1 through k

Suppose we have made three recommendations [0, 1, 1]. Here 0 means the recommendation is not correct while 1 means that the recommendation is correct. Then the precision at k will be [0, 1/2, 2/3], and the average precision will be (1/3)*(0+1/2+2/3) = 0.38
Larger the mean average precision, more correct will be the recommendations

8.6 NDCG (Normalized Discounted Cumulative Gain):

The main difference between MAP and NDCG is that MAP assumes that an item is either of interest (or not), while NDCG gives the relevance score
Let us understand it with an example: suppose out of 10 movies – A to J, we can recommend the first five movies, i.e. A, B, C, D and E while we must not recommend the other 5 movies, i.e., F, G, H, I and J. The recommendation was [A,B,C,D]. So the NDCG in this case will be 1 as the recommended products are relevant for the user
Higher the NDCG value, better the recommendations

9. What else can be tried?

Up to this point we have learnt what is a recommendation engine, its different types and their workings. Both content-based filtering and collaborative filtering algorithms have their strengths and weaknesses.

In some domains, generating a useful description of the content can be very difficult. A content-based filtering model will not select items if the user’s previous behavior does not provide evidence for this. Additional techniques have to be used so that the system can make suggestions outside the scope of what the user has already shown an interest in.

A collaborative filtering model doesn’t have these shortcomings. Because there is no need for a description of the items being recommended, the system can deal with any kind of information. Furthermore, it can recommend products which the user has not shown an interest in previously. But, collaborative filtering cannot provide recommendations for new items if there are no user ratings upon which to base a prediction. Even if users start rating the item, it will take some time before the item has received enough ratings in order to make accurate recommendations.

A system that combines content-based filtering and collaborative filtering could potentially take advantage from both the representation of the content as well as the similarities among users. One approach to combine collaborative and content-based filtering is to make predictions based on a weighted average of the content-based recommendations and the collaborative recommendations. Various means of doing so are:

Combining item scores
- In this approach, we combine the ratings obtained from both the filtering methods. The simplest way is to take the average of the ratings
- Suppose one method suggested a rating of 4 for a movie while the other suggested a rating of 5 for the same movie. So the final recommendation will be the average of both ratings, i.e. 4.5
- We can assign different weights to different methods as well
Combining item ranks:
- Suppose collaborative filtering recommended 5 movies A, B, C, D and E in the following order: A, B, C, D, E while content based filtering recommended them in the following order: B, D, A, C, E
- The rank for the movies will be:

Collaborative filtering

Movie	Rank
A	1
B	0.8
C	0.6
D	0.4
E	0.2

Content Based Filtering:

Movie	Rank
B	1
D	0.8
A	0.6
C	0.4
E	0.2

So, a hybrid recommender engine will combine these ranks and make final recommendations based on the combined rankings. The combined rank will be:

Movie	New Rank
A	1+0.6 = 1.6
B	0.8+1 = 1.8
C	0.6+0.4 = 1
D	0.4+0.8 = 1.2
E	0.2+0.2 = 0.4

The recommendations will be made based on these rankings. So, the final recommendations will look like this: B, A, D, C, E.

In this way, two or more techniques can be combined to build a hybrid recommendation engine and to improve their overall recommendation accuracy and power.

Conclusion

This was a very comprehensive article on recommendation engines. This tutorial should be good enough to get you started with this topic. We not only covered basic recommendation techniques but also saw how to implement some of the more advanced techniques available in the industry today.

We also covered some key facts associated with each technique. As somebody who wants to learn how to make a recommendation engine, I’d advise you to learn the techniques discussed in this tutorial and later implement them in your models.

Did you find this article useful? Share your opinions / views in the comments section below!

FAQs

Q1. Which algorithm is used for the recommendation system?

Recommendation systems use algorithms like collaborative filtering, content-based filtering, and hybrids to suggest items based on user preferences and item features.

Q2. Why are recommendation systems useful?

Recommendation systems are handy because they suggest things you like based on what you’ve enjoyed. They make finding cool stuff easy and keep you happy with personalized recommendations.

Pulkit Sharma 29 Nov 2023

My research interests lies in the field of Machine Learning and Deep Learning. Possess an enthusiasm for learning new skills and technologies.

Intermediate Machine Learning Project Python Recommendation

Responses From Readers

srinivas 22 Jun, 2018

hi , Thank you for the post. It's nice. Only one question i have why we are subtracting - 1 in the below code data_matrix[line[1]-1, line[2]-1] = line[3].

Show 1 reply

Pulkit Sharma 22 Jun, 2018

Hi srinivas, The starting index of any matrix or dataframe in Python is 0. So, we have subtracted 1 from these index to make the starting index as 0.

JNV 23 Jun, 2018

Very Detailed Article.. Thanks for Kind information

Show 1 reply

Pulkit Sharma 26 Jun, 2018

Hi JNV, Glad you found it useful.

Shan 24 Jun, 2018

Hi, Nice informative article. Please share why we are doing differently in the two scenarios: ratings_diff = (ratings - mean_user_rating[:, np.newaxis]) pred = mean_user_rating[:, np.newaxis] + similarity.dot(ratings_diff) / np.array([np.abs(similarity).sum(axis=1)]).T elif type == 'item': pred = ratings.dot(similarity) / np.array([np.abs(similarity).sum(axis=1)])

Show 1 reply

Pulkit Sharma 25 Jun, 2018

Hi Shan, Here we are making a recommendation engine based on user similarity and item similarity. So for each type of similarity we have one separate scenario, i.e. one for type='user' and another for type='item'. Hope this helps!!

Gianni 26 Jun, 2018

Great job, thank you !

Show 1 reply

Pulkit Sharma 26 Jun, 2018

Hi Gianni, Thanks for the feedback!!

Narayan 27 Jun, 2018

As per the definition of Explicit and Implicit data, these examples should be vice versa - 1. Netflix is collecting the data implicitly in the form of ratings given by user to different movies. - This is explicit data, here user is explicitly giving the rating for movies (Explicit data is information that is provided intentionally) 2. Order history of a user is recorded by Amazon which is an example of explicit mode of data collection - This is an example of implicit data collection (Implicit data is information that is not provided intentionally but gathered from available data streams like search history, clicks, order history, etc.)

Show 1 reply

Pulkit Sharma 27 Jun, 2018

Hi Narayan, Thanks for bringing this up. I have updated the same.

Shan 28 Jun, 2018

Hi Pulkit, Thanks for the reply. I was asking about difference in formula . pred = mean_user_rating[:, np.newaxis] + similarity.dot(ratings_diff) / np.array([np.abs(similarity).sum(axis=1)]).T elif type == ‘item’: pred = ratings.dot(similarity) / np.array([np.abs(similarity).sum(axis=1)])

Show 1 reply

Pulkit Sharma 28 Jun, 2018

Hi Shan, For user similarity, we have to take mean_user_rating while for item similarity we need mean_item_rating. So, the formula for each case is different.

VDK 03 Jul, 2018

if type == 'user': mean_user_rating = ratings.mean(axis=1) #We use np.newaxis so that mean_user_rating has same format as ratings ratings_diff = (ratings - mean_user_rating[:, np.newaxis]) pred = mean_user_rating[:, np.newaxis] + similarity.dot(ratings_diff) / np.array([np.abs(similarity).sum(axis=1)]).T elif type == 'item': pred = ratings.dot(similarity) / np.array([np.abs(similarity).sum(axis=1)]) return pred Can you please explain, what exactly is happening here. I am not able to comprehend it. Can you please help me visualize what the formulas are doing

Show 1 reply

Pulkit Sharma 03 Jul, 2018

Hi VDK, if type == ‘user’: Here we are checking whether the type of similarity is user or item mean_user_rating = ratings.mean(axis=1) Now we have calculated the mean user rating. axis=1 will take the mean of each column. ratings_diff = (ratings – mean_user_rating[:, np.newaxis]) We have normalized the ratings here by subtracting the mean user rating pred = mean_user_rating[:, np.newaxis] + similarity.dot(ratings_diff) / np.array([np.abs(similarity).sum(axis=1)]).T Now the predictions are made based on the similarity between users, i.e. similarity between the users based on the ratings given by them to same movies. Similarly, elif type == ‘item’: pred = ratings.dot(similarity) / np.array([np.abs(similarity).sum(axis=1)]) Here we have calculated the predictions based on the item similarity. return pred Finally, we have returned the predictions. Hope this helps!!

Eduardo 03 Jul, 2018

Great job! Thank you for the excelent article.

Show 1 reply

Pulkit Sharma 03 Jul, 2018

Hi Eduardo, Glad you found this article useful.

Shubham Goyal 27 Jul, 2018

"The eigenvalues of AAT will give us the P matrix and the eigenvalues of ATA will give us the Q matrix. Σ is the square root of the eigenvalues from AAT or ATA." I didn't understand how eigen values are used, Can you give some reference to read deeper in this topic? Also, when we are doing Collaborative filtering we are not using any other features available to us, viz demographics of the user, genres of the movie, timestamp etc, won't those affecct the predictions? I was also curious about to know that is turicreate and graphlabs are used in the industry or recommendation engines are created mannually?

Show 1 reply

Pulkit Sharma 30 Jul, 2018

Hi Shubham, You can refer this paper to learn more on how eigenvalues are used to get P and Q matrix. Regarding collaborative filtering, I have used just one feature to make the concept clear. You surely can use more and more features available to you to make the predictions more effective and personalized. Turicreate is one of the most commonly used tool in the industry. Along with this, many companies have made their own recommendation engines as well.

Lilya 05 Aug, 2018

Hi, thank you for your kind information, but I have one question that in section 4 "Building collaborative filtering model from scratch", the result "user_prediction" and "item_prediction" you obtained are just matrix about the similarity of user and item, so how can I predict the rating that a user will give for a item he didn't rate before?

Show 1 reply

Pulkit Sharma 06 Aug, 2018

Hi Lilya, From user_prediction, you will get similar users to a specific user. Now you want to predict the rating that he/she will give for a item he/she didn't rate before. So, first of all you will find the most similar user from the user_predictions. Now you will look at the ratings given to that item by the similar users. Now, you will consider that as the users are similar, they will rate the movies in a similar manner. So, you will predict the similar rating as given by the similar users to that particular item. Similarly you can make use of item_prediction to predict the rating based on similar items. Hope this helps!!

Pramodkumar Verma 10 Aug, 2018

Hi Pulkit, Nice informative article. I am facing an issue, when the value in the user similarity matrix is low then then we consider the user to be close to each other, Please confirm. But when we are looking at the predicted rating, higher the value higher is the chance of recommending the product. pred = mean_user_rating[:, np.newaxis] + similarity.dot(ratings_diff) / np.array([np.abs(similarity).sum(axis=1)]).T Here if similarity high(means value low), will reduce the value of predicted value. Ultimately the Prediction and similarity are moving in opposite direction. Please help us to understand. One more thing can we use cosine_distances instead of pairwise_similarity in the approach, so the direction issue is resolved?? If you can provide your contact number that will be great. Regards. Pramod

Show 1 reply

Pulkit Sharma 10 Aug, 2018

Hi Pramod, The similarity value indicates how similar two users/items are. So, if the similarity value is more, the similarity between the users/items is more. We are using cosine metric to calculate the pairwise distance. Feel free to contact me at [email protected]

Shera 15 Aug, 2018

Hey detailed, thank you so much for the article. It is very helpful and of really good quality. Really appreciate your efforts.

Show 1 reply

Pulkit Sharma 16 Aug, 2018

Hi Shera, I am glad you found it useful.

Lava kumar 02 Sep, 2018

Hi Pulkit, Very nice information. One small doubt how to recommend the movie_id to new user eg: Cold start Eg: if new user_id entered whose user_id='1234' So, how we will recommend the movie_id's to that user Regards, Lavakumar

Show 1 reply

Pulkit Sharma 03 Sep, 2018

Hi, One basic approach could be to apply a popularity based strategy, i.e. recommend the most popular movies. These can be determined by what has been popular recently overall or regionally. Once we know the preferences of the user, further recommending movies will be easier.

RAJVIJAY 02 Sep, 2018

Excellent article Pulkit. So the prediction matrices give us the likelihood of each movies being recommended to each user? What info do you think does this heatmap convey ? https://ibb.co/m9w3FK

Show 1 reply

Pulkit Sharma 03 Sep, 2018

Hi, Prediction matrices, based on the similarity between users and items, gives us the chances of recommending items to users. As per my inference, these heat maps tells us the probability of recommending each movie to a user. More the probability, higher the chances of recommending.

Smriti S 03 Sep, 2018

I am mid-way in the article and realised that Turicreate package can't be directly installed on Windows 10. Is there an alternative to turicreate? I know that CreateML is similar but doesn't work cross-platform.

Show 1 reply

Pulkit Sharma 03 Sep, 2018

Hi Smriti, You can refer this link.

Ishaan Jain 13 Sep, 2018

Hey Pulkit, thanks for this nice informative article. I just wanted to know the purpose of iterating through ratings after constructing a data matrix of users and items in the collaborative filtering model from scratch part(4th one). Hope you understood my question. Seeking early reply from your side. Thanks in advance.

Show 1 reply

Pulkit Sharma 13 Sep, 2018

Hi Ishaan, We want to calculate the similarity of a user with all of the other users as well as the similarity of an item with all of the other items. This we have to do for each user and each item. So, we have to iterate over all the users and items and for that I have used for loops. Hope this helps!

Ellie Lan 25 Sep, 2018

It seems like the code trains on the entire dataset. Shouldn't we only train on the training set and then test on the test set? I understand the code you provided doesn't evaluate performance so perhaps that's why you decided to train on the entire data matrix and not just the train portion.

Show 1 reply

Pulkit Sharma 26 Sep, 2018

Hi Ellie, In our case, there is no such training or testing data. We are training on the entire known ratings given by all the users and then recommending them other movies which they can watch.

Remote 26 Sep, 2018

Thank you for your article! Have you considered a Mircosoft Excel recommender system?

Vighnesh S 15 Oct, 2018

user_prediction and item_prediction. How should I interpret these results? They do not seem to have a fixed range.

Show 1 reply

Pulkit Sharma 15 Oct, 2018

Hi Vignesh, Higher these values, more the chances of recommendations. There is no fixed range, but we can take the maximum values to make recommendations.

Jing Li 17 Oct, 2018

Hi Thank you for this detailed blog. My be there is something wrong here. mf.b_i[np.newaxis:,] should be mf.b_i[np.newaxis, :]

Ruhin Shaikh 17 Oct, 2018

Hi Pulkit, Great explanation .When you print the the user_prediction you get an array of predictions? What to do if you want to predict for just one particular user?

Show 1 reply

Pulkit Sharma 17 Oct, 2018

Hi Ruhin, user_prediction have predictions for all the users. To print the predictions of one user, you can use user_prediction[0] that will print the predictions for the first user.

Tina 25 Oct, 2018

After I sorted the user_prediction to see which items are recommended according to each user, I saw that user_prediction suggests most of the items that user already rated (I mean the items whose similarities are highest are the ones users have already rated), so we need to filter out all of the items that user has rated in user_prediction before actually recommending?

Show 1 reply

Pulkit Sharma 25 Oct, 2018

Hi Tina, We get the predictions for all the items no matter they have rated them or not. So, finally while recommending we can filter out the items that the user has already rated and recommend the other items with more similarity.

Jingmiao Shen 07 Nov, 2018

Hey man, such a great blog about Recommendation Engine!!! Very good, "concept" + "Code", easy to follow!!! One question, I cannot install the "turicreate" in my laptop(Win10), It seems like this package is not supported in Windows?

Show 1 reply

Pulkit Sharma 12 Nov, 2018

Hi, I think Turicreate package can only be installed natively on Linux and macOS. The only way to use it on Windows is through the Windows Subsystem for Linux, and for that, you need to install Anaconda or Python on it.

sushant borade 17 Nov, 2018

how to test the model for one specific user?plzz help me with code

Show 1 reply

Pulkit Sharma 17 Nov, 2018

Hi sushant, You can use this code to get prediction for single user: popularity_recomm = popularity_model.recommend(users=[1],k=5) Here, you can give the ID of user for whom the recommendations are to be made in users variable and the number of recommendations that you want in k variable.

sushant 19 Nov, 2018

hiii i want to test the the model for one user in 4. Building collaborative filtering model from scratch the 1st one please provide the code alos

Show 1 reply

Pulkit Sharma 19 Nov, 2018

Hi Sushant, In this code: user_prediction = predict(data_matrix, user_similarity, type='user') item_prediction = predict(data_matrix, item_similarity, type='item') Currently, the data_matrix contains the details of all the users. If you want to predict only for single user, change the data_matrix such that it contains only the details of single user.

Mayur Eknath Narvekar 19 Nov, 2018

Hi Sir, I need to test the model for one specific user where you have used similarity concept 4. Building collaborative filtering model from scratch - Please provide the code

Show 1 reply

Pulkit Sharma 19 Nov, 2018

Hi Mayur, In this code: user_prediction = predict(data_matrix, user_similarity, type=’user’) item_prediction = predict(data_matrix, item_similarity, type=’item’) Currently, the data_matrix contains the details of all the users. If you want to predict only for single user, change the data_matrix such that it contains only the details of single user.

mayur 19 Nov, 2018

sir on finding user_prediction and item_prediction by taking data_matrix which is having only one user i get an array but how will i suggest the actual movie id to a user please also provide the code for the same

Show 1 reply

Pulkit Sharma 19 Nov, 2018

Hi mayur, Can you print the array which you get as user_prediction and item_prediction for single user and share the screenshot with me? That would help me to clarify your doubt in a better way.

kranthi 10 Dec, 2018

Hiii how could i proceed user - user and user-item similarity for recommendation engine

Show 1 reply

Pulkit Sharma 12 Dec, 2018

Hi Kranthi, These similarities tells you how similar the users and items are. So, for a particular user, you first check which are the most similar users and then based on that predict the items that are liked by the similar user. Similarly, we can use item-item similarity.

Pranesh Krishnamurthy 22 Dec, 2018

Hello! Thanks for the great article. Extending your example, we have information about items and users too right? So, why can't we use regression models to get the predictions for the ratings?

Show 1 reply

Pulkit Sharma 24 Dec, 2018

Hi, We can use regression models, but the interpretability of those models will not be good. These recommendation models are good at finding relation between users and items. You can always try different models and share your results here.

Bernardo 06 Jan, 2019

Comprehensive and well structured. Thank you for sharing Pulkit.

Show 1 reply

Pulkit Sharma 07 Jan, 2019

Thank you Bernardo!

John Robert 07 Jan, 2019

Hello Pulkit, I want to thank you for the article it really helped me to understand the concept well. Please, I will like you to explain the mse method in the MF class, I see you are calculating the square root of the sum of squares of errors, you didnt calculate the mean, that is divide it by the total number of users. Please, can you confirm if I am right or I missed something?

Show 1 reply

Pulkit Sharma 07 Jan, 2019

Hi John, In the code, I have used squared error instead of mean squared error. You can also try mean squared error and share your results.

B Suvarna 06 Feb, 2019

Thanks a lot for sharing such a nice article

Arnold Rosielle 17 Nov, 2022

Great piece - a nice intro to recommendation analysis. Thanks! I got as far as trying to import turicreate but it does not seem to be available for python3.9.

Comprehensive Guide to build a Recommendation Engine from scratch (in Python)

Introduction

Project to build your Recommendation Engine

Problem Statement

Table of contents

1. What are recommendation engines?

2. How does a recommendation engine work?

2.1 Data collection

2.2 Data storage

2.3 Filtering the data

2.3.1 Content based filtering

2.3.2 Collaborative filtering

User-User collaborative filtering

Item-Item collaborative filtering

3. Case study in Python using the MovieLens Dataset

4. Building collaborative filtering model from scratch

5. Building a simple popularity and collaborative filtering model using Turicreate

6. Introduction to matrix factorization

7. Building a recommendation engine using matrix factorization

8. Evaluation metrics for recommendation engines

9. What else can be tried?

Conclusion

FAQs

Frequently Asked Questions

Responses From Readers

Write for us

Machine Learning

Comprehensive Guide to build a Recommendation Engine from scratch (in Python)

Introduction

Project to build your Recommendation Engine

Problem Statement

Table of contents

1. What are recommendation engines?

2. How does a recommendation engine work?

2.1 Data collection

2.2 Data storage

2.3 Filtering the data

2.3.1 Content based filtering

2.3.2 Collaborative filtering

User-User collaborative filtering

Item-Item collaborative filtering

3. Case study in Python using the MovieLens Dataset

4. Building collaborative filtering model from scratch

5. Building a simple popularity and collaborative filtering model using Turicreate

6. Introduction to matrix factorization

7. Building a recommendation engine using matrix factorization

8. Evaluation metrics for recommendation engines

9. What else can be tried?

Conclusion

FAQs

Frequently Asked Questions

Responses From Readers

Write for us

Machine Learning

Basics of Machine Learning

Machine Learning Lifecycle

Importance of Stats and EDA

Understanding Data

Probability

Exploring Continuous Variable

Exploring Categorical Variables

Missing Values and Outliers

Central Limit theorem

Bivariate Analysis Introduction

Continuous - Continuous Variables

Continuous Categorical

Categorical Categorical

Multivariate Analysis

Different tasks in Machine Learning

Build Your First Predictive Model

Evaluation Metrics

Preprocessing Data

Linear Models

KNN

Selecting the Right Model

Feature Selection Techniques

Decision Tree

Feature Engineering

NaÃ¯ve Bayes

Multiclass and Multilabel

Basics of Ensemble Techniques

Advance Ensemble Techniques

Hyperparameter Tuning

Support Vector Machine

Advance Dimensionality Reduction

Unsupervised Machine Learning Methods

Recommendation Engines

Improving ML models

Working with Large Datasets

Interpretability of Machine Learning Models

Automated Machine Learning

Model Deployment

Deploying ML Models

Embedded Devices