Imagine you’re launching a brand-new search engine and need to decide which pages deserve top billing. You’d probably start by counting keywords, but that only tells you so much. Instead, what if you judged a page by how many other reputable sites link to it? That’s the core of PageRank! The web becomes a giant voting network, where each hyperlink casts a vote of confidence. A damping factor keeps this process from spiraling, simulating a “random surfer” who occasionally jumps to a fresh page. In this article, we’ll unpack how PageRank turns link patterns into a powerful ranking signal and walk through the algorithm that powers it.
PageRank is an algorithm developed by Google that ranks web pages in search results by measuring each page’s importance from the number and quality of links it receives. It treats every hyperlink as a “vote” of confidence, but it does not just count votes: each vote is weighted by the voter’s own PageRank, so links from important sites matter more. A damping factor (usually 0.85) models a “random surfer” who occasionally jumps to a random page, which guarantees the scores converge. By combining link quality and quantity, PageRank pushes genuinely authoritative pages to the top, avoiding the keyword-only bias.
A user enters the query “Harvard Business School”. The expectation is that the first link will be “http://www.harvard.edu/”. But what would a keyword-only algorithm do? It would look for pages that contain the word “Harvard” the maximum number of times, since “Business” and “School” are common words. Now, the keyword “Harvard” might not be repeated many times on Harvard’s own website, while sites like business school consultancies or articles about business schools might repeat it frequently. If the keyword-only algorithm were used on its own, those websites would be ranked much higher than the actual school’s website. The Google PageRank algorithm avoids this problem by taking into account not only the frequency of keywords but also the quality and quantity of links pointing to a website.
Imagine a web with only four pages that are linked to each other. Each box below represents a web page. The words in black and italics are the links between pages.
For instance, the web page “Tavish” has three outgoing links to the other three web pages. Now, let’s draw a simpler directed graph of this ecosystem.
Here is the basic intuition behind how Google ranks a page: the page with the maximum number of incoming links is the most important one. In the current example, the “Kunal Jain” page comes out as the most significant page.
The first step of the formulation is to build a proportion (transition) matrix. Each cell of this matrix holds the proportion of a page’s outflow going to another page. For instance, Tavish (TS) has 3 outgoing links, so each of those links gets a proportion of 1/3.
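To make this concrete, here is a minimal R sketch of building such a proportion matrix (R, since that is the language readers ask about in the comments below). The 4-page link structure here is an assumption for illustration, not the article’s actual figure: column j holds 1/(number of outgoing links of page j) for every page that j links to, so each column sums to 1.

```r
# Hypothetical 4-page web; the article's real links are in its figure,
# so this adjacency is assumed purely for illustration.
pages <- c("TS", "KJ", "P3", "P4")

# L[i, j] = 1 if page j links to page i.
L <- matrix(c(0, 1, 0, 0,   # links into TS
              1, 0, 1, 1,   # links into KJ
              1, 1, 0, 0,   # links into P3
              1, 0, 0, 0),  # links into P4
            nrow = 4, byrow = TRUE,
            dimnames = list(pages, pages))

# Divide each column by that page's number of outgoing links.
A <- sweep(L, 2, colSums(L), "/")
A["KJ", "TS"]   # 1/3, since TS has 3 outgoing links
```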
Now imagine a bot that endlessly follows these outgoing links at random. What fraction of its time would it spend on each page? This can be captured mathematically by the following equation:
A * X = X
Here, A is the proportion matrix built above, and
X is the vector of probabilities of the bot being on each of these pages (its long-run share of time per page).
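In practice, X can be found either by power iteration (repeatedly applying A until the probabilities stop changing) or by noting that X is the eigenvector of A for eigenvalue 1. A minimal R sketch, reusing the matrix A from the snippet above:

```r
# Power iteration: start the bot everywhere with equal probability and
# repeatedly follow the outgoing-link proportions.
X <- rep(1 / 4, 4)
for (i in 1:100) X <- A %*% X
round(as.vector(X), 3)   # KJ gets the highest share with these assumed links

# Closed-form alternative: the eigenvector of A for eigenvalue 1,
# rescaled so the probabilities sum to 1.
ev <- eigen(A)
X_eig <- Re(ev$vectors[, which.min(abs(ev$values - 1))])
X_eig / sum(X_eig)
```

The exact numbers differ from the article’s figure because the link structure above is assumed, but the ranking logic is identical.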
Solving this system (the article’s worked figure gives X = [0.40, 0.12, 0.24, 0.24]) shows that Kunal Jain’s page is the most important in this small universe, which goes in the same direction as our intuition.
Now, imagine a scenario with only 2 web pages, A and B: A links to B, but B has no outgoing links. If you try solving the matrix equation in this case, the only solution is the zero vector. That is unreasonable: B receives a link and so looks more important than A, yet the algorithm assigns both the same (zero) importance. To solve this problem, the concept of teleportation was introduced: we add a constant probability alpha of landing on each page, compensating for users who jump from one webpage to another without following any link. The equation is modified to the following:
(1-alpha) * A * X + alpha * b = X
Here, b is a constant column vector (uniform across pages), and alpha is the proportion of teleportation. The most common value for alpha is 0.15, which corresponds to the damping factor of 0.85 mentioned earlier (damping factor = 1 - alpha), though it can vary by use case.
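Here is a minimal R sketch of the teleported iteration. One detail is an assumption on my part: b is taken as the uniform vector with entries 1/n so that X stays a probability distribution (the article describes b as a constant unit column; dividing by the number of pages is the standard convention).

```r
# PageRank with teleportation: X = (1 - alpha) * A %*% X + alpha * b.
pagerank <- function(A, alpha = 0.15, tol = 1e-10, max_iter = 1000) {
  n <- ncol(A)
  b <- rep(1 / n, n)   # uniform teleportation target (assumed convention)
  X <- rep(1 / n, n)   # start from a uniform distribution
  for (i in seq_len(max_iter)) {
    X_new <- (1 - alpha) * as.vector(A %*% X) + alpha * b
    if (max(abs(X_new - X)) < tol) break
    X <- X_new
  }
  X
}

# The two-page dead end from above: A links to B, B links nowhere,
# so column B of the matrix is all zeros.
A2 <- matrix(c(0, 0,
               1, 0), nrow = 2, byrow = TRUE)
pagerank(A2)   # B now outscores A; without teleportation both would be 0
```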
Here are some uses of PageRank:
1. Ranking web pages in search engine results.
2. Social network analysis, such as identifying influential users.
3. Recommendation systems, where items are ranked by their link structure.
4. Biology, for example ranking genes or proteins in interaction networks.
5. Citation analysis, ranking papers by who cites them.
We’ve seen how PageRank works its magic by treating the web like a vast network of streets, passing “importance” from one page to another through links, dampening and redistributing scores until everything settles into a clear ranking. We broke down the core formula and talked through iteration and the damping factor, the techniques that keep the ranking system honest. While its origins lie in search engines, PageRank’s real power shows up wherever connections matter, from social network analysis and recommendation systems to biology and beyond, making it a versatile tool for understanding any linked system.
Q1. What is PageRank?
A. PageRank is a Google algorithm that measures the importance of web pages based on the number and quality of links pointing to them. It assigns a numerical weighting to each page, with higher scores indicating greater importance. This weighting helps Google rank web pages in search results.
Q2. What is a good PageRank score?
A. PageRank, a Google algorithm, measures web page importance based on link quantity and quality. Scores range from 0 to 10, with higher scores indicating greater importance. Aim for a PageRank of 5 or higher to improve search rankings and increase website traffic.
Q3. How can I increase my website’s PageRank?
A. Here are a few ways of doing that:
1. Create high-quality, relevant content.
2. Build backlinks from reputable websites.
3. Optimize your website for search engines (SEO).
4. Promote your website on social media and other channels.
5. Ensure technical SEO compliance.
This is very interesting. I am a beginner at R programming, but I’d like to know how you could calculate X from the equation A * X = X using R. The same goes for the teleportation adjustment equation: (1-alpha) * A * X + alpha * b = X. How do you calculate X given A, b, and alpha?
Chitranjan, A * X = X can be solved as a set of simultaneous equations. Let’s say there are only 2 states, with A = [[a1, a2]; [a3, a4]] and X = [x1, x2]. Then A * X = [a1*x1 + a2*x2, a3*x1 + a4*x2], so you need to solve:
a1*x1 + a2*x2 = x1
a3*x1 + a4*x2 = x2
Here the ai are all constants, whereas the xs are the unknowns. The same kind of logic applies to the alpha equation. Hope this helps. Tavish
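To complement the answer above with the R mechanics Chitranjan asked about, here is a minimal sketch (the function names are mine, purely illustrative). A * X = X is an eigenvalue-1 eigenvector problem, and the teleportation equation rearranges to (I - (1 - alpha) * A) * X = alpha * b, a standard linear system for R’s solve():

```r
# A %*% X = X: take the eigenvector of A for eigenvalue 1 and rescale
# so the entries sum to 1.
solve_plain <- function(A) {
  ev <- eigen(A)
  X <- Re(ev$vectors[, which.min(abs(ev$values - 1))])
  X / sum(X)
}

# (1 - alpha) * A %*% X + alpha * b = X rearranges to
# (I - (1 - alpha) * A) %*% X = alpha * b, solvable directly.
solve_teleport <- function(A, b, alpha = 0.15) {
  n <- ncol(A)
  solve(diag(n) - (1 - alpha) * A, alpha * b)
}
```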
Tavish sir, thanks for the post. In the example above, I understand how you calculate the A matrix: the total probability is 1, and if a page has three outbound links, each link gets a probability of about 0.33. But I don’t understand how you arrive at the figures [0.40, 0.12, 0.24, 0.24]. Could you please explain this? I work at a digital marketing company, so I want to know more about the algorithm. Please also share a link where I can learn about the algorithm easily. I am waiting for your reply to both questions.
Santu, This has been answered in my last comment. Let me know in case you still face an issue. Tavish