Quick Guide to Build a Recommendation Engine in Python & R

Aarshay Jain 29 Jul, 2022 • 12 min read

Overview

Deep dive into the concept of recommendation engine in python
Building a recommendation system in python using the graphlab library
Explanation of the different types of recommendation engines

Introduction

This could help you in building your first project!

Whether you are a fresher or an experienced professional in data science, voluntary projects always strengthen your candidature. My sole reason for writing this article is to get you started with recommendation systems so that you can build one. If you struggle to find open data, write to me in the comments.

Recommendation engines are nothing but an automated form of a “shop counter guy”. You ask him for a product. Not only does he show you that product, he also shows related ones which you could buy. Such salespeople are well trained in cross-selling and up-selling. So are our recommendation engines.

These engines recommend personalized content based on past behavior. This brings customer delight and gives users a reason to keep returning to the website.

In this post, I will cover the fundamentals of creating a recommendation system using GraphLab in Python. We will get some intuition into how recommendations work and create a basic popularity model and a collaborative filtering model.

Project to Build your Recommendation Engine

Problem Statement

Many online businesses rely on customer reviews and ratings. Explicit feedback is especially important in the entertainment and e-commerce industries, where customer engagement is strongly influenced by these ratings. Netflix relies on such rating data to power its recommendation engine and provide movie and TV series recommendations that are personalized and relevant to each user.

This practice problem challenges participants to predict the ratings that users give to jokes, given the ratings the same users gave to another set of jokes. The data is taken from the well-known Jester online joke recommender system.

Topics Covered

Type of Recommendation Engines
The MovieLens DataSet
A simple popularity model
A Collaborative Filtering Model
Evaluating Recommendation Engines

Before moving forward, I would like to extend my sincere gratitude to Coursera’s Machine Learning Specialization by the University of Washington. This course has been instrumental in my understanding of the concepts, and this post is an illustration of what I learned from it.

1. Type of Recommendation Engines

Before taking a look at the different types of recommendation engines, let’s take a step back and see if we can make some intuitive recommendations. Consider the following cases:

Case 1: Recommend the most popular items

A simple approach could be to recommend the items which are liked by the most users. This is a blazing fast but crude approach, and it has a major drawback: there is no personalization involved.

The most popular items are the same for every user, since popularity is defined over the entire user pool, so everybody sees the same results. It is like a website recommending that you buy a microwave just because other users liked it, without caring whether you are interested in buying one.

Surprisingly, such an approach still works in places like news portals. Whenever you log in to, say, bbcnews, you’ll see a column of “Popular News” which is subdivided into sections, and the most read articles of each section are displayed. This approach can work in this case because:

There is a division by section, so users can look at the section that interests them.
At any time there are only a few hot topics, and there is a high chance that a user wants to read the news which is being read by most others.

Case 2: Using a classifier to make recommendation

We already know lots of classification algorithms. Let’s see how we can use the same technique to make recommendations. Classifiers are parametric solutions, so we just need to define some parameters (features) of the user and the item. The outcome can be 1 if the user likes the item or 0 otherwise. This might work out in some cases because of the following advantages:

Incorporates personalization
It can work even if the user’s past history is short or not available

But it has some major drawbacks as well, because of which it is not used much in practice:

The features might not actually be available, or even if they are, they may not be sufficient to make a good classifier
As the number of users and items grows, building a good classifier becomes much harder

Case 3: Recommendation Algorithms

Now let’s come to the special class of algorithms which are tailor-made for solving the recommendation problem. There are typically two types of algorithms – Content Based and Collaborative Filtering. You should refer to our previous article to get a complete sense of how they work. I’ll give a short recap here.

Content based algorithms:
- Idea: If you like an item then you will also like a “similar” item
- Based on similarity of the items being recommended
- It generally works well when it’s easy to determine the context/properties of each item, for instance when we are recommending items of the same kind, as in movie or song recommendation.
Collaborative filtering algorithms:
- Idea: If person A likes items 1, 2, 3 and person B likes items 2, 3, 4, then they have similar interests, so A should like item 4 and B should like item 1.
- This algorithm is based entirely on past behavior and not on the context. This makes it one of the most commonly used algorithms, as it does not depend on any additional information.
- For instance: product recommendations by e-commerce players like Amazon and merchant recommendations by banks like American Express.
- Further, there are several types of collaborative filtering algorithms:
  1. User-User Collaborative filtering: Here we find look-alike customers (based on similarity) and offer products which the first customer’s look-alikes have chosen in the past. This algorithm is very effective but takes a lot of time and resources, because it requires computing similarity information for every pair of customers. Therefore, for platforms with a large customer base, this algorithm is hard to implement without a very strong parallelizable system.
  2. Item-Item Collaborative filtering: It is quite similar to the previous algorithm, but instead of finding customer look-alikes, we try to find item look-alikes. Once we have the item look-alike matrix, we can easily recommend similar items to a customer who has purchased any item from the store. This algorithm is far less resource-consuming than user-user collaborative filtering. For a new customer, it takes far less time than user-user collaborative filtering because we don’t need all the similarity scores between customers. And with a fixed number of products, the product-product look-alike matrix is fixed over time.
  3. Other simpler algorithms: There are other approaches like market basket analysis, which generally do not have as much predictive power as the algorithms described above.
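
To make the collaborative filtering idea above concrete, here is a minimal sketch of the user-user variant in plain Python 3. The `likes` dictionary, the function names and the choice of Jaccard overlap as the similarity measure are illustrative assumptions, not part of GraphLab or of any particular production system.

```python
def jaccard(a, b):
    """Overlap between two sets of liked items (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def most_similar_users(target, likes, k=1):
    """Return up to k users whose liked items overlap most with the target's."""
    scored = [
        (jaccard(likes[target], items), user)
        for user, items in likes.items()
        if user != target
    ]
    # Highest similarity first; ties broken by user name for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [user for similarity, user in scored[:k] if similarity > 0]


def recommend_user_user(target, likes, k=1):
    """Recommend items liked by the k most similar users but not by the target."""
    recommendations = set()
    for user in most_similar_users(target, likes, k):
        recommendations |= likes[user] - likes[target]
    return sorted(recommendations)


# Hypothetical toy data mirroring the A/B example in the text.
likes = {"A": {1, 2, 3}, "B": {2, 3, 4}, "C": {7, 8}}

recs_for_a = recommend_user_user("A", likes)
recs_for_b = recommend_user_user("B", likes)
```

With this toy data, A's nearest neighbour is B, so A is offered item 4 and B is offered item 1. Note that every pair of users has to be scored, which is what makes this variant expensive on large platforms.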

2. The MovieLens DataSet

We will be using the MovieLens dataset for this purpose. It has been collected by the GroupLens Research Project at the University of Minnesota. MovieLens 100K dataset can be downloaded from here. It consists of:

100,000 ratings (1-5) from 943 users on 1682 movies.
Each user has rated at least 20 movies.
Simple demographic info for the users (age, gender, occupation, zip)
Genre information of movies

Let’s load this data into Python. There are many files in the ml-100k.zip file which we can use. Let’s load the three most important files to get a sense of the data. I also recommend reading the readme document, which gives a lot of information about the different files.
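
The loading step could look like the sketch below. It assumes pandas is installed and that the archive was extracted to a folder named ml-100k. The file names (u.user, u.data, u.item), the separators and the item column names follow the dataset's readme rather than anything shown in this post, so verify them against your copy. The helper names are hypothetical.

```python
import io

import pandas as pd

USER_COLS = ['user_id', 'age', 'sex', 'occupation', 'zip_code']
RATING_COLS = ['user_id', 'movie_id', 'rating', 'unix_timestamp']
# Assumed from the dataset readme: 5 descriptive columns followed by 19 genre flags.
GENRES = ['unknown', 'Action', 'Adventure', 'Animation', "Children's", 'Comedy',
          'Crime', 'Documentary', 'Drama', 'Fantasy', 'Film-Noir', 'Horror',
          'Musical', 'Mystery', 'Romance', 'Sci-Fi', 'Thriller', 'War', 'Western']
ITEM_COLS = ['movie_id', 'movie_title', 'release_date',
             'video_release_date', 'imdb_url'] + GENRES


def load_users(source):
    """Read the pipe-separated user file (path or file-like object)."""
    return pd.read_csv(source, sep='|', names=USER_COLS, encoding='latin-1')


def load_ratings(source):
    """Read the tab-separated ratings file (path or file-like object)."""
    return pd.read_csv(source, sep='\t', names=RATING_COLS, encoding='latin-1')


def load_items(source):
    """Read the pipe-separated movie file (path or file-like object)."""
    return pd.read_csv(source, sep='|', names=ITEM_COLS, encoding='latin-1')


# Tiny in-memory sample so the helpers can be tried without downloading anything.
sample_users = load_users(io.StringIO("1|24|M|technician|85711\n2|53|F|other|94043\n"))
sample_ratings = load_ratings(io.StringIO("196\t242\t3\t881250949\n"))
```

With the real files, the three tables used in the rest of this post would be obtained as users = load_users('ml-100k/u.user'), ratings = load_ratings('ml-100k/u.data') and items = load_items('ml-100k/u.item').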

Now let’s take a peek into the content of each file to understand them better.

Users

print(users.shape)
users.head()

This reconfirms that there are 943 users and we have 5 features for each namely their unique ID, age, gender, occupation and the zip code they are living in.

Ratings

print(ratings.shape)
ratings.head()

This confirms that there are 100K ratings for different user and movie combinations. Also notice that each rating has a timestamp associated with it.

Items

print(items.shape)
items.head()

This dataset contains attributes of the 1682 movies. There are 24 columns, of which the last 19 specify the genre: a value of 1 denotes that the movie belongs to that genre and 0 otherwise.

Now we have to divide the ratings data set into train and test data for building models. Luckily GroupLens provides pre-divided data wherein the test data has 10 ratings for each user, i.e. 9430 rows in total. Let’s load that:

import pandas as pd

r_cols = ['user_id', 'movie_id', 'rating', 'unix_timestamp']
ratings_base = pd.read_csv('ml-100k/ua.base', sep='\t', names=r_cols, encoding='latin-1')
ratings_test = pd.read_csv('ml-100k/ua.test', sep='\t', names=r_cols, encoding='latin-1')
ratings_base.shape, ratings_test.shape

Output: ((90570, 4), (9430, 4))
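
If you need to build such a split yourself, one simple option is to hold out a fixed number of ratings per user. The sketch below does this in plain Python 3 with a seeded random shuffle. Choosing the held-out ratings at random is an assumption made here for illustration, and the function name and toy data are hypothetical.

```python
import random
from collections import defaultdict


def holdout_per_user(rows, n_test=10, seed=0):
    """Split rows of (user_id, movie_id, rating, ...) so that each user
    contributes n_test rows to the test set and the rest to the train set."""
    rng = random.Random(seed)
    rows_by_user = defaultdict(list)
    for row in rows:
        rows_by_user[row[0]].append(row)

    train, test = [], []
    for user in sorted(rows_by_user):
        user_rows = list(rows_by_user[user])
        rng.shuffle(user_rows)
        # A user with fewer than n_test ratings would end up entirely in test.
        test.extend(user_rows[:n_test])
        train.extend(user_rows[n_test:])
    return train, test


# Hypothetical toy data: 3 users with 20 ratings each, ratings between 1 and 5.
toy_rows = [(user, movie, 1 + (user + movie) % 5)
            for user in range(1, 4) for movie in range(1, 21)]
toy_train, toy_test = holdout_per_user(toy_rows, n_test=10)
```

Because every MovieLens 100K user has rated at least 20 movies, holding out 10 ratings per user always leaves at least 10 for training.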

Since we’ll be using GraphLab, let’s convert these into SFrames.

import graphlab
train_data = graphlab.SFrame(ratings_base)
test_data = graphlab.SFrame(ratings_test)

We can use this data for training and testing. Note that we now have user behaviour as well as attributes of the users and movies, so we can build content based as well as collaborative filtering algorithms.

3. A Simple Popularity Model

Let’s start by building a popularity based model, i.e. one where all users get the same recommendations based on the most popular choices. We’ll use the graphlab recommender function popularity_recommender for this.

We can train a recommender as:

popularity_model = graphlab.popularity_recommender.create(train_data, user_id='user_id', item_id='movie_id', target='rating')

Arguments:

train_data: the SFrame which contains the required data
user_id: the column name which represents each user ID
item_id: the column name which represents each item to be recommended
target: the column name representing scores/ratings given by the user

Let’s use this model to make the top 5 recommendations for the first 5 users and see what comes out:

#Get recommendations for first 5 users and print them
#users = range(1,6) specifies user ID of first 5 users
#k=5 specifies top 5 recommendations to be given
popularity_recomm = popularity_model.recommend(users=range(1,6),k=5)
popularity_recomm.print_rows(num_rows=25)

Did you notice something? The recommendations for all users are the same – 1500, 1201, 1189, 1122, 814, in the same order. This can be verified by checking the movies with the highest mean ratings in our ratings_base data set:

ratings_base.groupby(by='movie_id')['rating'].mean().sort_values(ascending=False).head(20)

This confirms that all the recommended movies have an average rating of 5, i.e. all the users who watched the movie gave a top rating. Thus we can see that our popularity system works as expected. But is it good enough? We’ll analyze it in detail later.
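
To see why every user gets the same list, here is a rough plain-Python 3 sketch of a popularity recommender that ranks movies by mean rating. It only approximates the idea: how GraphLab breaks ties and treats already-rated movies is not shown in this post, so the tie-breaking by movie ID and the exclusion of seen movies below are assumptions, and the toy ratings are made up.

```python
from collections import defaultdict


def mean_ratings(ratings):
    """Map each movie to the mean of its ratings; rows are (user, movie, rating)."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for _user, movie, rating in ratings:
        totals[movie] += rating
        counts[movie] += 1
    return {movie: totals[movie] / counts[movie] for movie in totals}


def recommend_popular(ratings, user, k=5):
    """Top-k movies by mean rating that the user has not rated yet."""
    scores = mean_ratings(ratings)
    seen = {movie for u, movie, _rating in ratings if u == user}
    candidates = [movie for movie in scores if movie not in seen]
    # Highest mean first; ties broken by movie ID (an assumption).
    candidates.sort(key=lambda movie: (-scores[movie], movie))
    return candidates[:k]


# Hypothetical toy data: movie 30 has a single 5-star rating.
toy_ratings = [(1, 10, 5), (2, 10, 3), (1, 20, 4), (2, 20, 4), (3, 30, 5)]

recs_user_4 = recommend_popular(toy_ratings, user=4)
recs_user_5 = recommend_popular(toy_ratings, user=5)
```

Users 4 and 5 receive identical lists because the score depends only on the movie, never on the user. In this toy data movie 30 tops the list on the strength of a single rating, so it is worth checking how many ratings stand behind a perfect average.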

4. A Collaborative Filtering Model

Let’s start by understanding the basics of a collaborative filtering algorithm. The core idea works in 2 steps:

Find similar items by using a similarity metric
For a user, recommend the items most similar to the items (s)he already likes

To give you a high level overview, this is done by making an item-item matrix in which we keep a record of the pairs of items which were rated together.

In this case, an item is a movie. Once we have the matrix, we use it to determine the best recommendations for a user based on the movies they have already rated. Note that there are a few more things to take care of in an actual implementation, which would require a deeper mathematical treatment that I’ll skip for now.
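
The two steps above can be sketched in plain Python 3 as follows. This is a deliberately simplified illustration that scores items by raw co-occurrence counts. It is not GraphLab's implementation, and the function names and toy data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_item_item_matrix(rated_pairs):
    """Count, for every pair of items, how many users rated both of them."""
    items_by_user = defaultdict(set)
    for user, item in rated_pairs:
        items_by_user[user].add(item)

    co_counts = defaultdict(lambda: defaultdict(int))
    for items in items_by_user.values():
        for a, b in combinations(sorted(items), 2):
            co_counts[a][b] += 1
            co_counts[b][a] += 1
    return items_by_user, co_counts


def recommend_items(user, items_by_user, co_counts, k=5):
    """Score unseen items by how often they were rated together with seen ones."""
    seen = items_by_user.get(user, set())
    scores = defaultdict(int)
    for item in seen:
        for other, count in co_counts.get(item, {}).items():
            if other not in seen:
                scores[other] += count
    # Highest score first; ties broken by item ID for determinism.
    ranked = sorted(scores, key=lambda item: (-scores[item], item))
    return ranked[:k]


# Hypothetical toy data: (user, item) pairs.
toy_pairs = [("A", 1), ("A", 2), ("A", 3),
             ("B", 2), ("B", 3), ("B", 4),
             ("C", 3), ("C", 4), ("C", 5)]
items_by_user, co_counts = build_item_item_matrix(toy_pairs)
top_for_a = recommend_items("A", items_by_user, co_counts, k=2)
```

Here user A has rated items 1, 2 and 3. Item 4 was rated together with those items three times in total and item 5 once, so item 4 is ranked first. A real implementation would normalise these counts with a similarity metric instead of using them directly.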

I would just like to mention that there are 3 types of item similarity metrics supported by graphlab. These are:

Jaccard Similarity:
- Similarity is the number of users who have rated both item A and item B divided by the number of users who have rated either A or B
- It is typically used where we don’t have a numeric rating but just a boolean value like a product being bought or an ad being clicked
Cosine Similarity:
- Similarity is the cosine of the angle between the rating vectors of items A and B
- The closer the vectors, the smaller the angle and the larger the cosine
Pearson Similarity:
- Similarity is the Pearson correlation coefficient between the two vectors.
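
As a rough illustration of the three metrics, the sketch below implements textbook versions in plain Python 3. Each item is represented as a dictionary mapping user to rating. GraphLab's exact definitions (for example how missing ratings are handled) may differ, so treat the function names and conventions here as assumptions.

```python
import math


def jaccard_similarity(ratings_a, ratings_b):
    """Users who rated both items divided by users who rated either item."""
    users_a, users_b = set(ratings_a), set(ratings_b)
    union = users_a | users_b
    return len(users_a & users_b) / len(union) if union else 0.0


def cosine_similarity(ratings_a, ratings_b):
    """Cosine of the angle between the two rating vectors (missing = 0)."""
    common = set(ratings_a) & set(ratings_b)
    dot = sum(ratings_a[u] * ratings_b[u] for u in common)
    norm_a = math.sqrt(sum(r * r for r in ratings_a.values()))
    norm_b = math.sqrt(sum(r * r for r in ratings_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def pearson_similarity(ratings_a, ratings_b):
    """Pearson correlation over the users who rated both items."""
    common = sorted(set(ratings_a) & set(ratings_b))
    n = len(common)
    if n < 2:
        return 0.0
    mean_a = sum(ratings_a[u] for u in common) / n
    mean_b = sum(ratings_b[u] for u in common) / n
    cov = sum((ratings_a[u] - mean_a) * (ratings_b[u] - mean_b) for u in common)
    var_a = sum((ratings_a[u] - mean_a) ** 2 for u in common)
    var_b = sum((ratings_b[u] - mean_b) ** 2 for u in common)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


# Hypothetical ratings for two items, keyed by user.
item_a = {"u1": 5, "u2": 3, "u3": 4}
item_b = {"u1": 4, "u2": 2, "u4": 5}

jac = jaccard_similarity(item_a, item_b)
cos = cosine_similarity(item_a, item_b)
pear = pearson_similarity(item_a, item_b)
```

Jaccard ignores the rating values and looks only at who rated what, which is why it suits boolean data. Pearson centres each item's ratings on its mean before comparing them, so two items rated consistently one point apart still correlate perfectly.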

Let’s create a model based on item similarity as follows:

#Train Model
item_sim_model = graphlab.item_similarity_recommender.create(train_data, user_id='user_id', item_id='movie_id', target='rating', similarity_type='pearson')

#Make Recommendations:
item_sim_recomm = item_sim_model.recommend(users=range(1,6),k=5)
item_sim_recomm.print_rows(num_rows=25)

Here we can see that the recommendations are different for each user. So, personalization exists. But how good is this model? We need some means of evaluating a recommendation engine. Let’s focus on that in the next section.

5. Evaluating Recommendation Engines

For evaluating recommendation engines, we can use the concept of precision-recall. You are probably familiar with it from classification, and the idea is very similar. Let me define the two terms for recommendations.

Recall:
- What ratio of the items that a user likes were actually recommended?
- If a user likes, say, 5 items and the recommender showed 3 of them, then the recall is 0.6
Precision:
- Out of all the recommended items, how many did the user actually like?
- If 5 items were recommended to the user, of which they liked, say, 4, then the precision is 0.8

Now if we think about recall, how can we maximize it? If we simply recommend all the items, they will definitely cover the items which the user likes. So we have 100% recall! But think about precision for a second. If we recommend, say, 1000 items and the user likes only 10 of them, then precision is 1%. This is really low. Our aim is to maximize both precision and recall.

An ideal recommender system is one which recommends exactly the items which the user likes. In this case precision=recall=1. This is an optimal recommender and we should try to get as close to it as possible.
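
The definitions and worked numbers above can be checked with a few lines of plain Python 3. The function name and the toy item IDs are hypothetical, and this sketch is independent of how GraphLab computes its own precision-recall figures.

```python
def precision_recall_at_k(recommended, liked, k):
    """Precision and recall of the top-k recommendations against the liked set."""
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in liked)
    precision = hits / len(top_k) if top_k else 0.0
    recall = hits / len(liked) if liked else 0.0
    return precision, recall


# The user likes 5 items and 3 of them are shown: recall = 3/5.
p1, r1 = precision_recall_at_k([1, 2, 3], {1, 2, 3, 4, 5}, k=3)

# 5 items are recommended and the user likes 4 of them: precision = 4/5.
p2, r2 = precision_recall_at_k([1, 2, 3, 4, 99], {1, 2, 3, 4}, k=5)

# Recommending a whole catalogue of 1000 items to a user who likes 10 of them:
# recall is perfect but precision collapses to 10/1000.
catalogue = list(range(1000))
p3, r3 = precision_recall_at_k(catalogue, set(range(10)), k=1000)
```

The third case shows the trade-off described above: recommending everything pushes recall to 1 while precision drops to 0.01, i.e. 1%.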

Let’s compare both the models we have built till now based on precision-recall characteristics:

model_performance = graphlab.compare(test_data, [popularity_model, item_sim_model])
graphlab.show_comparison(model_performance,[popularity_model, item_sim_model])

Here we can make 2 very quick observations:

The item similarity model is definitely better than the popularity model (by at least 10x)
On an absolute level, even the item similarity model appears to have a poor performance. It is far from being a useful recommendation system.

Now let us learn to build a recommendation engine in R

Implementation in R

Step 1: Importing the data files

Step 2: Validating the imported data files

Output

#Validating user files

[1] 943 5

user_id age sex occupation zip_code
1 1 24 M technician 85711
2 2 53 F other 94043
3 3 23 M writer 32067
4 4 24 M technician 43537
5 5 33 F other 15213
6 6 42 M executive 98101

#Validating ratings files

[1] 100000 4

user_id movie_id rating unix_timestamp
1 196 242 3 881250949
2 186 302 3 891717742
3 22 377 1 878887116
4 244 51 2 880606923
5 166 346 1 886397596
6 298 474 4 884182806

#Validating items files

[1] 1682 24

Step 3: Loading the train and test dataset

Step 4: Validating the test and train dataset

Output

#Validating train files

[1] 90570 4

user_id movie_id rating unix_timestamp
1 1 1 5 874965758
2 1 2 3 876893171
3 1 3 4 878542960
4 1 4 3 876893119
5 1 5 3 889751712
6 1 6 5 887431973

#Validating test files

[1] 9430 4

user_id movie_id rating unix_timestamp
1 1 20 4 887431883
2 1 33 4 878542699
3 1 61 4 878542420
4 1 117 3 874965739
5 1 155 2 878542201
6 1 160 4 875072547

Step 5: Building a simple Popularity Model

The movies with the highest mean ratings in our data_train data set:

Output

 movie_id rating

814 5.000000
1122 5.000000
1189 5.000000
1201 5.000000
1293 5.000000
1467 5.000000
1500 5.000000
1536 5.000000
1599 5.000000
1656 5.000000
1449 4.714286
1398 4.500000
1463 4.500000
1594 4.500000
1642 4.500000
114 4.491525
408 4.480769
169 4.476636
318 4.475836
483 4.459821

All the recommended movies have an average rating of 5, i.e. all the users who watched the movie gave a top rating. Thus we can see that our popularity system works as expected.

Step 6: Building a collaborative filtering model

Let’s create a model based on item similarity as follows:

model1
Recommender of type ‘IBCF’ for ‘binaryRatingMatrix’ 
learned using 90570 users.

predicted1
Recommendations as ‘topNList’ with n = 10 for 9430 users. 

head(reccom_list,25)
user_id rating movie_id
1 0.5085274 10
1 0.5000725 5
1 0.5035601 3
1 0.5051816 3
1 0.5121527 4
1 0.5000000 8
1 0.5009827 2
1 0.5035601 12
2 0.5036726 3
2 0.5000000 3
2 0.5017065 2
2 0.5019525 5
2 0.5060604 11
2 0.5000000 12
2 0.5002561 7
2 0.5016215 8
3 0.5014799 5
3 0.5124669 4
3 0.5014799 5
3 0.5009827 9
3 0.5052220 1
3 0.5060604 11
3 0.5000000 6
3 0.5009737 1
4 0.5014341 6

Step 7: Evaluating Recommendation Engines

Let’s compare both the models we have built till now based on precision-recall characteristics:

Observations

The item similarity model is definitely better than the popularity model (by at least 10x)
On an absolute level, even the item similarity model appears to have a poor performance. It is far from being a useful recommendation system.

There is a lot of scope for improvement here. But I leave it up to you to figure out how to improve this further. I would like to give a couple of tips:

Try leveraging the additional context information which we have
Explore more sophisticated algorithms like matrix factorization

In the end, I would like to mention that along with GraphLab, you can also use some other open source python packages like the following:

Projects

Now, it’s time to take the plunge and actually play with some other real datasets. So are you ready to take on the challenge? Accelerate your journey and use recommendation engines to solve these Practice Problems:

Online Challenge: Build A Recommendation Engine – Recommend the next items customers are most likely to buy
Practice Problem: Recommendation Engine – Predict range of attempts a user will make to solve a given problem
Practice Problem: Is this joke funny? – Predict the rating given by users to different jokes

End Notes

In this article, we walked through the process of making a basic recommendation engine in Python using GraphLab. We started by understanding the fundamentals of recommendations. Then we went on to load the MovieLens 100K data set for the purpose of experimentation.

Subsequently we made a first model as a simple popularity model in which the most popular movies were recommended for each user. Since this lacked personalization, we made another model based on collaborative filtering and observed the impact of personalization.

Finally, we discussed precision-recall as evaluation metrics for recommendation systems and on comparison found the collaborative filtering model to be more than 10x better than the popularity model.

Did you like reading this article? Do share your experience / suggestions in the comments section below.

Aarshay graduated with an MS in Data Science from Columbia University in 2017 and is currently an ML Engineer at Spotify New York. He works at the intersection of applied research and engineering while designing ML solutions to move product metrics in the required direction. He specializes in designing ML system architecture, developing offline models and deploying them in production for both batch and real time prediction use cases.

Responses From Readers

Vinay Jain 02 Jun, 2016

I am completly new to the field of data science. I have started courses of Machine Learning. Can you please suggest me how to proceed or should I consider some other options. I plan to proceed further in the field of AI. Please suggest how should i carry on or begin my journey. ?

Aarshay Jain 02 Jun, 2016

Thanks for reaching out. I'm sorry but there is no fixed answer to your question and this thread is probably not the right place to answer. I recommend reading similar discussions on http://discuss.analyticsvidhya.com and you can start a new thread as well. You can also check out the learning paths on our website if you're interested in a particular tool. Hope this helps.

r4sn4 05 Jun, 2016

In this blog, data is already divided into train and test. But ,how to divide data into Train and Test? On what basis to make this decision?

Ericgits 02 Jun, 2016

Good article , very educative

Aarshay Jain 02 Jun, 2016

Thank you!

Hulisani 02 Jun, 2016

Thanks for sharing such an Amazing article, can I please have in pdf.

Aarshay Jain 02 Jun, 2016

Thanks Hulisani! Unfortunately, we don't have proper pdf formats. Generally what I do is print the page and save it as a pdf. It won't look that good but mostly works.

Naveed1228 02 Jun, 2016

Nice Article.How can we use this article on website using Flask or Django.I am newbie if you suggest me some resources.I will thankful to you.

Aarshay Jain 02 Jun, 2016

I think GraphLab has deployment facility which you can use. I'm not sure how exactly that works. Another option could be to make a RestFul API of your recommender and call it on the website.

Ron Williams 02 Jun, 2016

Hi, thanks for the article, which is, in itself very useful. I just wanted to highlight what I see as a limitation of 'recommender' systems, such as google's or YouTube's. Especially YT. The nub of my objection to them is that they either recommend things I've already seen, or very similar things. In a lot of cases I've basically already moved on from that, and really don't want more of the same. E.g. suppose I accidentally watch a 'Game of Thrones' clip; YT will ad nauseam present more of them, and most especially what irks me is that they present the exact same clip, but uploaded by a different user. grrr... Or, say I look at 'How to Create an Amazon Affiliate Site in 25.3 seconds and make SQUILLIONS' - yep, you guessed it, I get the ones on how to do that zact-same thing in 16.3 seconds and make $500, or $344,253.45 or ... And with google - say I've been looking for products for the abovementioned Affiliate site. Google will have picked up on that and will most irritatingly present me, for literally days with those products, which I have absolutely no intention of buying. Another annoying 'feature' is, suppose I've already bought a product. Google will assume I'm interested in such things and present me with opportunities to buy even more of the thing that I already have one of, and really, really don't need any more. I've got one. Thanks anyhow. Recommender systems have a looong way to go, to be actually useful as marketing tools, as opposed to irritants.

Aarshay Jain 02 Jun, 2016

Thanks for sharing your thoughts. I agree with you totally. But I think it's a good thing. We have untapped potential here, and this gives a perfect opportunity to explore this further and design better systems. I think one potential reason causing this could be that Google and say Amazon talk to each other only superficially. So google might get the info that a user is searching for a product but it might not have info about whether the product is already bought. This is just my assumption and I don't know how that system actually works. I will explore this further for sure!

basel 02 Jun, 2016

how can someone build a very big recommender system with like netflix , i mean what is the way for that?

Aarshay Jain 02 Jun, 2016

I believe GraphLab is a scalable tool. You can use it on large datasets as well. But I'm not sure whether it'll be possible on our PC or a GraphLab server is required. You can contact GraphLab directly for this.

james 02 Jun, 2016

I am a R person. Are there similar codes in R instead of Python ?

Aarshay Jain 02 Jun, 2016

Yes there are 2 more articles on recommendation engines based on R: - http://www.analyticsvidhya.com/blog/2016/03/exploring-building-banks-recommendation-system/ - http://www.analyticsvidhya.com/blog/2015/10/recommendation-engines/

Ash 06 Jun, 2016

Hey! I am developing a model similar to the one in this link: http://www.salemmarafi.com/code/collaborative-filtering-with-python/. How do you think I should evaluate this model? Any suggestions please?

Aarshay Jain 13 Jun, 2016

have you tried the precision-recall technique which I've explained here?

manish 13 Jun, 2016

I am having Trouble while importing 'graphlab', when I import and run I am getting the following output: --------------------------------------------------- File "", line 32, in import graphlab as gl ImportError: No module named 'graphlab' --------------------------------------------------- I have googled out things but am unable to find any specific solution on this, I am using Windows and using Anaconda - Spyder !!!

Aarshay Jain 13 Jun, 2016

You need to install graphlab first. The first year license is free.

Kaustubh Sakhalkar 14 Jun, 2016

Hi Aarshay excellent article! helped me immensely with a project I am working on. Keep them coming!

Aarshay Jain 14 Jun, 2016

Sure stay tuned!

Apurva 03 Nov, 2016

Good article Aarshay. I landed on this when exploring some other stuff.

Aarshay Jain 16 Dec, 2016

Thanks Apurva!

Nicolas Hug 19 Nov, 2016

Hi! Nice article! I saw you mentioned Crab as an alternative to GraphLab, but unfortunately crab insn't maintained anymore. Some other Python recommender system libraries are python-recsys (https://github.com/ocelma/python-recsys, in support mode I suppose), mrec (https://github.com/Mendeley/mrec) and RecSys (https://github.com/Niourf/RecSys -- I am the creator of RecSys).

Aarshay Jain 16 Dec, 2016

Hi Nicolas. Thanks for reaching out! I'll add this to the post.

Labeeb Ibrahim 07 Dec, 2016

graphlab doesn't work for Python3. Do you suggest a substitute?

Aarshay Jain 16 Dec, 2016

I suggested crab as an alternative. But see the comment from Nicolas Hug for further details.

Tejas 26 May, 2017

Hi Aarshay, I'm stuck right at the 4th Step. I'm unable to I'm getting repeatative "movie ids" as recommendation to visible 1st 5 users. Also there's a slight irregularity in result of Popularity Model, the first movie_id I get is 1599 whereas in your screenshots it's 1500, could be because someone must have updated Movielens Dataset, yet concerns me since the output of "Ratings_base" is word to word correct.

popularity_recomm.print_rows(num_rows=4)
+---------+----------+-------+------+
| user_id | movie_id | score | rank |
+---------+----------+-------+------+
| 1 | 1599 | 5.0 | 1 |
| 1 | 1201 | 5.0 | 2 |
| 1 | 1189 | 5.0 | 3 |
| 1 | 1122 | 5.0 | 4 |

print(ratings_base.groupby(by='movie_id')['rating'].mean().sort_values(ascending=False).head(20))
movie_id
1500 5.000000
1293 5.000000
1122 5.000000
1189 5.000000
1656 5.000000
1201 5.000000
1599 5.000000
814 5.000000
1467 5.000000
1536 5.000000
1449 4.714286
1642 4.500000
1463 4.500000
1594 4.500000
1398 4.500000
114 4.491525
408 4.480769
169 4.476636
318 4.475836
483 4.459821
Name: rating, dtype: float64

and last thing is, the foll. statement produces the recommendation of same series for every user.

item_sim_recomm.print_rows(num_rows=10)

your help is appreciated.

Sachin Gaikwad 22 Jun, 2017

Hi Aarshay, Sachin is here, I need to apply content based algorithm on movieLens dataset. I have facing following problem:- 1. How to create item profile: Which ML technique we should use to create item profile.

Felix Andrew 22 Jun, 2017

When I ran the example, I got exactly the same result for both Popularity Model and Collaborative Filtering (Similarity Recommender ('pearson')) . I did not modify the code or data in any way Do you know why?

Atul Varshney 29 Jun, 2017

Hey, i want to set my own user id such as 1205 in user_id column. How can i do that. Please Help me out. Thanks in advance

Sanjana 18 Sep, 2017

File "rs.py", line 1, in import pandas as pd ModuleNotFoundError: No module named 'pandas' I am getting this even after installing pandas using pip. Any solution? Great article:-)

Aman Middha 18 Sep, 2017

Is there any other method to work on a recommender system without using GraphLab? I want to use something that is not so expensive from a long term view.

Siddhant Saxena 03 Dec, 2017

Hey Aarshay excellent article!! I was wondering if there is a method in graphlab for User-User based similarity calculation like 'item_similarity_recommender.create' ? if not do you have suggestions to calculate user based similarity?

satya 05 Dec, 2017

Hi Aarshay, Very good and easy to understand article. What should i do if i have more than one selection criteria(in your example its only rating) ??

Asanka Dissanayake 13 Dec, 2017

If you can please give some data-set , which is suitable for the News Recommended System application. That will very help full to me.

Ankur 18 Dec, 2017

Hi Aarshay, thank you for this wonderful article. It will be really wonderful if you could please publish an article on ALS algorithm for Recommendation.

Ashwini Kumar 27 Dec, 2017

Hi Aarshay, many thanks on another well written article. I am working on a recommendation problem and trying to identify packages like GraphLab. From the discussion above, it is not very clear whether GraphLab or Crab is compatible with a Mac based Python3 environment. Could you please suggest current packages..preferably open source that we can use? Alternatively, are there any algos in Scikit-learn that can be used to model the recommendation problems?

Aditya Rathore 30 Dec, 2017

Could you please send me a link where I can get E-commerce data to build a recommendation system. The data must not be too large,

Kush Shrivastava 02 Jan, 2018

Thank You, Sir, for the article. I am a Ph.D. scholar and working on link prediction on a recommendation system. So can you please suggest me how should I proceed further. Thankyou in advance.

Gauri 10 Jan, 2018

Great article! In this example, we had limited number of attributes, i.e. 24 genres. What is your recommendation for situations where we have large number of attributes, and each attribute could have many values? e.g. An algorithm to recommend specific products based on the items purchased by the consumer on a website that sells bags. Every bag has multiple attributes viz. color, price, material, manufacturer, style, etc.

kumarmanishjha 19 Jan, 2018

I have just a question which might be a trivial one. How are content-based filtering and item-item collaborative filtering different? Basically in both the cases, we are trying to recommend similar items to the user. TIA

Lasitha Randunu 24 Jan, 2018

I'm currently doing my final year research project to implement a personalized recommender system. i'm still at the very beginning and love to have your advises in order to improve my project. i really enjoyed this article and i used this codes to at my interim presentation to explain the evaluations between above algorithms. graph api's graphical representation was very helpful. my question is what is this cutoff value and how it is relate to the performance evaluation? waiting for your reply. thanks in advance!

Deep Mendha 01 Feb, 2018

Hello, if I want to change the evaluation criteria from precision-recall to another metric, such as a confusion matrix or accuracy, how can I do it using GraphLab?

nithish 01 Feb, 2018

Great article. Can you share ideas about the best way to deploy a recommendation system at production level?

Jayakrishna 10 Feb, 2018

Hi, great article. I would like to know what other ways there are to build a recommendation engine. Is there any freeware for this, or is GraphLab the only way? Thanks in advance.

Tom 21 Feb, 2018

Hi, I am trying to build a personalized recommendation system for supermarkets using association mining. I will have a mobile app where customers can receive personalized notifications of adverts and a website where administrators can insert the adverts. I am using Weka and a supermarket product dataset. When I ran the association mining, I got a set of rules. However, I don't know what to do now, as every customer will have a different rule. The admin of the website will add an advert for a product. So how do I use the association mining for each customer who must receive a personalized advertising notification? E.g., suppose the admin inserts an advert about milk. How must I write the algorithm to make sure that only customers that satisfy a certain rule will buy the milk?

Tejas Sharma 16 Apr, 2018

A very helpful guide to recommendation systems! I wish to implement a recommendation system using collaborative filtering in Swift for iOS. Is there a way that I can combine Python and Swift, since machine learning APIs like the similarity metrics or GraphLab are easily available in Python? I want to develop a small-scale system that trains itself as new data comes in. What would be the best approach to this? Thanks for the help.
