In today’s world, every customer faces multiple choices, such as finding a book to read without a specific idea in mind, leading to time-consuming searches and reliance on recommendations from others. However, a recommendation engine could streamline this process by suggesting books based on previous reads, saving time and enhancing the user experience. Recommendation engines, widely used by businesses like Amazon, Netflix, Google, and Goodreads, leverage machine learning to provide personalized suggestions. This article explores various recommendation engine algorithms, the mathematics behind them, and demonstrates creating a recommendation engine using matrix factorization in Python.
Project to build your Recommendation EngineProblem StatementMany online businesses rely on customer reviews and ratings. Explicit feedback is especially important in the entertainment and ecommerce industry where all customer engagements are impacted by these ratings. Netflix relies on such rating data to power its recommendation engine to provide the best movie and TV series recommendations that are personalized and most relevant to the user. This practice problem challenges the participants to predict the ratings for jokes given by the users provided the ratings provided by the same users for another set of jokes. This dataset is taken from the famous jester online Joke Recommender system dataset. |
Till recently, people generally tended to buy products recommended to them by their friends or the people they trust. This used to be the primary method of purchase when there was any doubt about the product. But with the advent of the digital age, that circle has expanded to include online sites that utilize some sort of recommendation engine.
A recommendation engine filters the data using different algorithms and recommends the most relevant items to users. It first captures the past behavior of a customer and based on that, recommends products which the users might be likely to buy.
If a completely new user visits an e-commerce site, that site will not have any past history of that user. So how does the site go about recommending products to the user in such a scenario? One possible solution could be to recommend the best selling products, i.e. the products which are high in demand. Another possible solution could be to recommend the products which would bring the maximum profit to the business.
If we can recommend a few items to a customer based on their needs and interests, it will create a positive impact on the user experience and lead to frequent visits. Hence, businesses nowadays are building smart and intelligent recommendation engines by studying the past behavior of their users.
Now that we have an intuition of recommendation engines, let’s now look at how they work.
Before we deep dive into this topic, first we’ll think of how we can recommend items to users:
Both of the above methods have their drawbacks. In the first case, the most popular items would be the same for each user so everybody will see the same recommendations. While in the second case, as the number of users increases, the number of features will also increase. So classifying the users into various segments will be a very difficult task.
The main problem here is that we are unable to tailor recommendations based on the specific interest of the users. It’s like Amazon is recommending you buy a laptop just because it’s been bought by the majority of the shoppers. But thankfully, Amazon (or any other big firm) does not recommend products using the above mentioned approach. They use some personalized methods which help them in recommending products more accurately.
Let’s now focus on how a recommendation engine works by going through the following steps.
This is the first and most crucial step for building a recommendation engine. The data can be collected by two means: explicitly and implicitly. Explicit data is information that is provided intentionally, i.e. input from the users such as movie ratings. Implicit data is information that is not provided intentionally but gathered from available data streams like search history, clicks, order history, etc.
In the above image, Netflix is collecting the data explicitly in the form of ratings given by user to different movies.
Here the order history of a user is recorded by Amazon which is an example of implicit mode of data collection.
The amount of data dictates how good the recommendations of the model can get. For example, in a movie recommendation system, the more ratings users give to movies, the better the recommendations get for other users. The type of data plays an important role in deciding the type of storage that has to be used. This type of storage could include a standard SQL database, a NoSQL database or some kind of object storage.
After collecting and storing the data, we have to filter it so as to extract the relevant information required to make the final recommendations.
There are various algorithms that help us make the filtering process easier. In the next section, we will go through each algorithm in detail.
This algorithm recommends products which are similar to the ones that a user has liked in the past.
For example, if a person has liked the movie “Inception”, then this algorithm will recommend movies that fall under the same genre. But how does the algorithm understand which genre to pick and recommend movies from?
Recommendation engines save all information related to each user in a vector form known as the profile vector, which contains the user’s past behavior, including liked or disliked movies and given ratings. Information about movies is stored in another vector called the item vector, which includes details such as genre, cast, and director. The content-based filtering algorithm uses cosine similarity to find the cosine of the angle between the profile vector and the item vector. If A is the profile vector and B is the item vector, the similarity between them can be calculated as the cosine of the angle between these two vectors.
Based on the cosine value, which ranges between -1 to 1, the movies are arranged in descending order and one of the two below approaches is used for recommendations:
Other methods that can be used to calculate the similarity are:
The algorithm’s main flaw is its narrow recommendation of items of the same type, never recommending products the user hasn’t previously purchased or liked. To improve, an algorithm should consider user behavior in recommendation.
Let us understand this with an example. If person A likes 3 movies, say Interstellar, Inception and Predestination, and person B likes Inception, Predestination and The Prestige, then they have almost similar interests. We can say with some certainty that A should like The Prestige and B should like Interstellar. The collaborative filtering algorithm uses “User Behavior” for recommending items. This is one of the most commonly used algorithms in the industry as it is not dependent on any additional information. There are different types of collaborating filtering techniques and we shall look at them in detail below.
This algorithm first finds the similarity score between users. Based on this similarity score, it then picks out the most similar users and recommends products which these similar users have liked or bought previously.
In terms of our movies example from earlier, this algorithm finds the similarity between each user based on the ratings they have previously given to different movies. The prediction of an item for a user u is calculated by computing the weighted sum of the user ratings given by other users to an item i.
The prediction Pu,i is given by:
Here,
Now, we have the ratings for users in profile vector and based on that we have to predict the ratings for other users. Following steps are followed to do so:
User/Movie | x1 | x2 | x3 | x4 | x5 | Mean User Rating |
A | 4 | 1 | – | 4 | – | 3 |
B | – | 4 | – | 2 | 3 | 3 |
C | – | 1 | – | 4 | 4 | 3 |
Here we have a user movie rating matrix. To understand this in a more practical manner, let’s find the similarity between users (A, C) and (B, C) in the above table. Common movies rated by A/[ and C are movies x2 and x4 and by B and C are movies x2, x4 and x5.
The correlation between user A and C is more than the correlation between B and C. Hence users A and C have more similarity and the movies liked by user A will be recommended to user C and vice versa.
This algorithm is quite time consuming as it involves calculating the similarity for each user and then calculating prediction for each similarity score. One way of handling this problem is to select only a few users (neighbors) instead of all to make predictions, i.e. instead of making predictions for all similarity values, we choose only few similarity values. There are various ways to select the neighbors:
This algorithm is useful when the number of users is less. Its not effective when there are a large number of users as it will take a lot of time to compute the similarity between all user pairs. This leads us to item-item collaborative filtering, which is effective when the number of users is more than the items being recommended.
In this algorithm, we compute the similarity between each pair of items.
The algorithm aims to find similarity between movie pairs and recommend similar ones based on user-user collaborative filtering. It uses the weighted sum of ratings of “item-neighbors” instead of “user-neighbors” and provides predictions based on user-friendliness.
Now, as we have the similarity between each movie and the ratings, predictions are made and based on those predictions, similar movies are recommended. Let us understand it with an example.
User/Movie | x1 | x2 | x3 | x4 | x5 |
A | 4 | 1 | 2 | 4 | 4 |
B | 2 | 4 | 4 | 2 | 1 |
C | – | 1 | – | 3 | 4 |
Mean Item Rating | 3 | 2 | 3 | 3 | 3 |
The mean item rating is the average of all ratings given to a particular item, compared to the user-user filtering table. Instead of finding user-user similarity, item-item similarity is calculated. For example, comparing movies (x1, x4) and (x1, x5), common users who have rated these items are A and B, while those who have rated movies x1 and x5 are also A and B.
The similarity between movie x1 and x4 is more than the similarity between movie x1 and x5. So based on these similarity values, if any user searches for movie x1, they will be recommended movie x4 and vice versa. Before going further and implementing these concepts, there is a question which we must know the answer to – what will happen if a new user or a new item is added in the dataset? It is called a Cold Start.
Visitor Cold Start occurs when a new user is introduced to the dataset without any history, making it difficult to recommend products. A popular-based strategy can be applied to recommend the most popular products, based on recent popularity or regional trends. Product Cold Start occurs when a new product is launched or added to the system, and user action is crucial for determining its value. Content-based filtering can be used to solve this problem, using the content of the new product for recommendations and user actions on it.
We will work on the MovieLens dataset and build a model to recommend movies to the end users. This data has been collected by the GroupLens Research Project at the University of Minnesota. The dataset can be downloaded from here. This dataset consists of:
First, we’ll import our standard libraries and read the dataset in Python. Here is a live coding window to get you started. You can run the codes and get the output in this window itself:
This dataset contains attributes of 1682 movies. There are 24 columns out of which last 19 columns specify the genre of a particular movie. These are binary columns, i.e., a value of 1 denotes that the movie belongs to that genre, and 0 otherwise.
The dataset has already been divided into train and test by GroupLens where the test data has 10 ratings for each user, i.e. 9,430 rows in total. We will read both these files into our Python environment.
r_cols = ['user_id', 'movie_id', 'rating', 'unix_timestamp']
ratings_train = pd.read_csv('ml-100k/ua.base', sep='\t', names=r_cols, encoding='latin-1')
ratings_test = pd.read_csv('ml-100k/ua.test', sep='\t', names=r_cols, encoding='latin-1')
ratings_train.shape, ratings_test.shape
It’s finally time to build our recommend engine!
We will recommend movies based on user-user similarity and item-item similarity. For that, first we need to calculate the number of unique users and movies.
n_users = ratings.user_id.unique().shape[0]
n_items = ratings.movie_id.unique().shape[0]
Now, we will create a user-item matrix which can be used to calculate the similarity between users and items.
data_matrix = np.zeros((n_users, n_items))
for line in ratings.itertuples():
data_matrix[line[1]-1, line[2]-1] = line[3]
Now, we will calculate the similarity. We can use the pairwise_distance function from sklearn to calculate the cosine similarity.
from sklearn.metrics.pairwise import pairwise_distances
user_similarity = pairwise_distances(data_matrix, metric='cosine')
item_similarity = pairwise_distances(data_matrix.T, metric='cosine')
This gives us the item-item and user-user similarity in an array form. The next step is to make predictions based on these similarities. Let’s define a function to do just that.
def predict(ratings, similarity, type='user'):
if type == 'user':
mean_user_rating = ratings.mean(axis=1)
#We use np.newaxis so that mean_user_rating has same format as ratings
ratings_diff = (ratings - mean_user_rating[:, np.newaxis])
pred = mean_user_rating[:, np.newaxis] + similarity.dot(ratings_diff) / np.array([np.abs(similarity).sum(axis=1)]).T
elif type == 'item':
pred = ratings.dot(similarity) / np.array([np.abs(similarity).sum(axis=1)])
return pred
Finally, we will make predictions based on user similarity and item similarity.
user_prediction = predict(data_matrix, user_similarity, type='user')
item_prediction = predict(data_matrix, item_similarity, type='item')
As it turns out, we also have a library which generates all these recommendations automatically. Let us now learn how to create a recommendation engine using turicreate in Python. To get familiar with turicreate and to install it on your machine, refer here.
After installing turicreate, first let’s import it and read the train and test dataset in our environment. Since we will be using turicreate, we will need to convert the dataset in SFrames.
import turicreate
train_data = turicreate.SFrame(ratings_train)
test_data = turicreate.Sframe(ratings_test)
We have user behavior as well as attributes of the users and movies, so we can make content based as well as collaborative filtering algorithms. We will start with a simple popularity model and then build a collaborative filtering model.
First we’ll build a model which will recommend movies based on the most popular choices, i.e., a model where all the users receive the same recommendation(s). We will use the turicreate recommender function popularity_recommender for this.
popularity_model = turicreate.popularity_recommender.create(train_data, user_id='user_id', item_id='movie_id', target='rating')
It’s prediction time! We will recommend the top 5 items for the first 5 users in our dataset.
popularity_recomm = popularity_model.recommend(users=[1,2,3,4,5],k=5)
popularity_recomm.print_rows(num_rows=25)
Note that the recommendations for all users are the same – 1467, 1201, 1189, 1122, 814. And they’re all in the same order! This confirms that all the recommended movies have an average rating of 5, i.e. all the users who watched the movie gave it a top rating. Thus our popularity system works as expected.
After building a popularity model, we will now build a collaborative filtering model. Let’s train the item similarity model and make top 5 recommendations for the first 5 users.
#Training the model
item_sim_model = turicreate.item_similarity_recommender.create(train_data, user_id='user_id', item_id='movie_id', target='rating', similarity_type='cosine')
#Making recommendations
item_sim_recomm = item_sim_model.recommend(users=[1,2,3,4,5],k=5)
item_sim_recomm.print_rows(num_rows=25)
Here we can see that the recommendations (movie_id) are different for each user. So personalization exists, i.e. for different users we have a different set of recommendations.
In this model, we do not have the ratings for each movie given by each user. We must find a way to predict all these missing ratings. For that, we have to find a set of features which can define how a user rates the movies. These are called latent features. We need to find a way to extract the most important latent features from the the existing features. Matrix factorization, covered in the next section, is one such technique which uses the lower dimension dense matrix and helps in extracting the important latent features.
Let’s understand matrix factorization with an example. Consider a user-movie ratings matrix (1-5) given by different users to different movies.
Here user_id is the unique ID of different users and each movie is also assigned a unique ID. A rating of 0.0 represents that the user has not rated that particular movie (1 is the lowest rating a user can give). We want to predict these missing ratings. Using matrix factorization, we can find some latent features that can determine how a user rates a movie. We decompose the matrix into constituent parts in such a way that the product of these parts generates the original matrix.
Let us assume that we have to find k latent features. So we can divide our rating matrix R(MxN) into P(MxK) and Q(NxK) such that P x QT (here QT is the transpose of Q matrix) approximates the R matrix:
Choosing the latent features through matrix factorization removes the noise from the data. How? Well, it removes the feature(s) which does not determine how a user rates a movie. Now to get the rating rui for a movie qik rated by a user puk across all the latent features k, we can calculate the dot product of the 2 vectors and add them to get the ratings based on all the latent features.
This is how matrix factorization gives us the ratings for the movies which have not been rated by the users. But how can we add new data to our user-movie rating matrix, i.e. if a new user joins and rates a movie, how will we add this data to our pre-existing matrix?
Let me make it easier for you through the matrix factorization method. If a new user joins the system, there will be no change in the diagonal feature weight matrix Σ, as well as the item-feature relevance matrix Q. The only change will occur in the user-feature affinity matrix P. We can apply some matrix multiplication methods to do that.
We have,
Let’s multiply with Q on both sides.
Now, we have
So,
Simplifying it further, we can get the P matrix:
This is the updated user-feature affinity matrix. Similarly, if a new movie is added to the system, we can follow similar steps to get the updated item-feature relevance matrix Q.
Remember, we decomposed R matrix into P and Q. But how do we decide which P and Q matrix will approximate the R matrix? We can use the gradient descent algorithm for doing this. The objective here is to minimize the squared error between the actual rating and the one estimated using P and Q. The squared error is given by:
Our aim was to decide the p and q value in such a way that this error is minimized. We need to update the p and q values so as to get the optimized values of these matrices which will give the least error. Now we will define an update rule for puk and qki. The update rule in gradient descent is defined by the gradient of the error to be minimized.
As we now have the gradients, we can apply the update rule for puk and qki.
Here α is the learning rate which decides the size of each update. The above updates can be repeated until the error is minimized. Once that’s done, we get the optimal P and Q matrix which can be used to predict the ratings. Let us quickly recap how this algorithm works and then we will build the recommendation engine to predict the ratings for the unrated movies.
# for f = 1,2,....,k :
# for rui ε R :
# predict rui
# update puk and qki
So based on each latent feature, all the missing ratings in the R matrix will be filled using the predicted rui value. Then puk and qki are updated using gradient descent and their optimal value is obtained. It can be visualized as shown below:
Now that we have understood the inner workings of this algorithm, we’ll take an example and see how a matrix is factorized into its constituents.
Consider a 2 X 3 matrix, A2X3 as shown below:
Here we have 2 users and their corresponding ratings for 3 movies. Now, we will decompose this matrix into sub parts, such that:
The eigenvalues of AAT will give us the P matrix and the eigenvalues of ATA will give us the Q matrix. Σ is the square root of the eigenvalues from AAT or ATA.
So, the eigenvalues of AAT are 25, 9. Similarly, we can calculate the eigenvalues of ATA. These values will be 25, 9, 0. Now we have to calculate the corresponding eigenvectors for AAT and ATA.
For λ = 25, we have:
It can be row reduced to:
A unit-length vector in the kernel of that matrix is:
Similarly, for λ = 9 we have:
It can be row reduced to:
A unit-length vector in the kernel of that matrix is:
For the last eigenvector, we could find a unit vector perpendicular to q1 and q2. So,
Σ2X3 matrix is the square root of eigenvalues of AAT or ATA, i.e. 25 and 9.
Finally, we can compute P2X2 by the formula σpi = Aqi, or pi = 1/σ(Aqi). This gives:
So, the decomposed form of A matrix is given by:
Since we have the P and Q matrix, we can use the gradient descent approach to get their optimized versions. Let us build our recommendation engine using matrix factorization.
Let us define a function to predict the ratings given by the user to all the movies which are not rated by him/her.
class MF():
# Initializing the user-movie rating matrix, no. of latent features, alpha and beta.
def __init__(self, R, K, alpha, beta, iterations):
self.R = R
self.num_users, self.num_items = R.shape
self.K = K
self.alpha = alpha
self.beta = beta
self.iterations = iterations
# Initializing user-feature and movie-feature matrix
def train(self):
self.P = np.random.normal(scale=1./self.K, size=(self.num_users, self.K))
self.Q = np.random.normal(scale=1./self.K, size=(self.num_items, self.K))
# Initializing the bias terms
self.b_u = np.zeros(self.num_users)
self.b_i = np.zeros(self.num_items)
self.b = np.mean(self.R[np.where(self.R != 0)])
# List of training samples
self.samples = [
(i, j, self.R[i, j])
for i in range(self.num_users)
for j in range(self.num_items)
if self.R[i, j] > 0
]
# Stochastic gradient descent for given number of iterations
training_process = []
for i in range(self.iterations):
np.random.shuffle(self.samples)
self.sgd()
mse = self.mse()
training_process.append((i, mse))
if (i+1) % 20 == 0:
print("Iteration: %d ; error = %.4f" % (i+1, mse))
return training_process
# Computing total mean squared error
def mse(self):
xs, ys = self.R.nonzero()
predicted = self.full_matrix()
error = 0
for x, y in zip(xs, ys):
error += pow(self.R[x, y] - predicted[x, y], 2)
return np.sqrt(error)
# Stochastic gradient descent to get optimized P and Q matrix
def sgd(self):
for i, j, r in self.samples:
prediction = self.get_rating(i, j)
e = (r - prediction)
self.b_u[i] += self.alpha * (e - self.beta * self.b_u[i])
self.b_i[j] += self.alpha * (e - self.beta * self.b_i[j])
self.P[i, :] += self.alpha * (e * self.Q[j, :] - self.beta * self.P[i,:])
self.Q[j, :] += self.alpha * (e * self.P[i, :] - self.beta * self.Q[j,:])
# Ratings for user i and moive j
def get_rating(self, i, j):
prediction = self.b + self.b_u[i] + self.b_i[j] + self.P[i, :].dot(self.Q[j, :].T)
return prediction
# Full user-movie rating matrix
def full_matrix(self):
return mf.b + mf.b_u[:,np.newaxis] + mf.b_i[np.newaxis:,] + mf.P.dot(mf.Q.T)
Now we have a function that can predict the ratings. The input for this function are:
We have to convert the user item ratings to matrix form. It can be done using the pivot function in python.
R= np.array(ratings.pivot(index = 'user_id', columns ='movie_id', values = 'rating').fillna(0))
fillna(0) will fill all the missing ratings with 0. Now we have the R matrix. We can initialize the number of latent features, but the number of these features must be less than or equal to the number of original features.
Now let us predict all the missing ratings. Let’s take K=20, alpha=0.001, beta=0.01 and iterations=100.
mf = MF(R, K=20, alpha=0.001, beta=0.01, iterations=100)
training_process = mf.train()
print()
print("P x Q:")
print(mf.full_matrix())
print()
This will give us the error value corresponding to every 20th iteration and finally the complete user-movie rating matrix. The output looks like this:
We have created our recommendation engine. Let’s focus on how to evaluate a recommendation engine in the next section.
For evaluating recommendation engines, we can use the following metrics
The above metrics tell us how accurate our recommendations are but they do not focus on the order of recommendations, i.e. they do not focus on which product to recommend first and what follows after that. We need some metric that also considers the order of the products recommended. So, let’s look at some of the ranking metrics:
A recommendation engine is a tool that uses content-based and collaborative filtering algorithms to find and recommend content. Both types have their strengths and weaknesses, but content-based filters may not select items based on user behavior, necessitating the use of additional techniques to make suggestions beyond what the user has already shown interest in.
A collaborative filtering model is versatile and can handle various types of information without requiring a description of the items being recommended. It can also recommend products not previously shown interest to users. However, it cannot provide new item recommendations without user ratings, and it may take time for accurate recommendations.
A system that combines content-based and collaborative filtering can leverage content representation and user similarities. Predictions can be made using a weighted average of both, using various methods.
Movie | Rank |
A | 1 |
B | 0.8 |
C | 0.6 |
D | 0.4 |
E | 0.2 |
Movie | Rank |
B | 1 |
D | 0.8 |
A | 0.6 |
C | 0.4 |
E | 0.2 |
So, a hybrid recommender engine will combine these ranks and make final recommendations based on the combined rankings. The combined rank will be:
Movie | New Rank |
A | 1+0.6 = 1.6 |
B | 0.8+1 = 1.8 |
C | 0.6+0.4 = 1 |
D | 0.4+0.8 = 1.2 |
E | 0.2+0.2 = 0.4 |
The recommendations will be made based on these rankings. So, the final recommendations will look like this: B, A, D, C, E.
In this way, two or more techniques can be combined to build a hybrid recommendation engine and to improve their overall recommendation accuracy and power.
Recommendation engines, powered by artificial intelligence and advanced machine learning models, play a crucial role in modern data science, offering personalized suggestions that significantly enhance user experience. These systems, encompassing various types of recommendation systems such as collaborative filtering approaches and content-based recommender systems, address the complex challenge of predicting user preferences. However, issues like the cold start problem and handling user dislikes remain critical hurdles. Techniques like embeddings and sophisticated algorithms continue to evolve, striving to refine the accuracy and efficiency of recommendations. The ongoing advancements in this field promise to further revolutionize how we discover and interact with content, making the exploration process more intuitive and personalized.
Recommendation systems use algorithms like collaborative filtering, content-based filtering, and hybrids to suggest items based on user preferences and item features.
Recommendation systems are handy because they suggest things you like based on what you’ve enjoyed. They make finding cool stuff easy and keep you happy with personalized recommendations.