Pagerank and its mathematics: Explanation needed

I am a student interested in developing a search engine that indexes pages from my country. I have been researching algorithms to use for sometime now and I have identified HITS and PageRank as the best out there. I have decided to go with PageRank since it is more stable than the HITS algorithm (or so I have read).

I have found countless articles and academic papers related to PageRank, but my problem is that I don't understand most of the mathematical symbols that form the algorithm in these papers. Specifically, I don't understand how the Google Matrix (the irreducible,stochastic matrix) is calculated.

My understanding is based on these two articles:

Could someone provide a basic explanation (examples would be nice) with less mathematical symbols?

Thanks in advance.

标签： algorithm math search pagerank

7条回答

男人必须洒脱

2楼-- · 2019-01-30 12:39

If you want to learn more about page rank with less math, then this is very good tutorial on basic matrix operations. I recommend it for everyone who has little math background but wants to dive into ranking algorithms.

0人赞添加讨论(0) 举报

老娘就宠你

3楼-- · 2019-01-30 12:40

This is the paper that you need: http://infolab.stanford.edu/~backrub/google.html (If you do not recognise the names of the authors, you will find more information about them here: http://www.google.com/corporate/execs.html).

The symbols used in the document, are described in the document in lay English.

Thanks for making me google this.

0人赞添加讨论(0) 举报

不美不萌又怎样

4楼-- · 2019-01-30 12:43

"The $25,000,000,000 Eigenvector: The Linear Algebra Behind Google". from Rose-Hulman is a bit out of date, because now Page Rank is the $491B linear algebra problem. I think the paper is very well written.

"Programming Collective Intelligence" has a nice discussion of Page Rank as well.

0人赞添加讨论(0) 举报

干净又极端

5楼-- · 2019-01-30 12:54

Duffymo posted the best refernce in my opinion. I studied the page rank algorithm in my senior undergrad year. Page rank is doing the following:

Define the set of current webpages as the states of a finite markov chain.
Define the probability of transitioning from site u to v where the there is an outgoing link to v from u to be

1/u_{n} where u_{n} is the number of out going links from u.
Assume the markov chain defined above is irreducible (this can be enforced with only a slight degradation of the results)
It can be shown every finite irreducible markov chain has a stationary distribution. Define the page rank to be the stationary distribution, that is to say the vector that holds the probability of a random particle to end up at each given site as the number of state transitions goes to infinity.

Google uses a slight variation on the power method to find the stationary distribution (the power method finds dominant eigenvalues). Other than that there is nothing to it. Its rather simple and elegant and probably one of the simplest applications of markov chains I can think of, but it is wortha lot of money!

So all the pagerank algorithm does is take into account the topology of the web as an indication of whether a website should be important. The more incoming links a site has the greater the probability of a random particle spending its time at the site over an infinite amount of time.

0人赞添加讨论(0) 举报

唯我独甜

6楼-- · 2019-01-30 12:55

You might also want to read the introductory tutorial on the mathematics behind the construction of the Pagerank matrix written by David Austin's entitled How Google Finds Your Needle in the Web's Haystack; it starts with a simple example and builds to the full definition.

0人赞添加讨论(0) 举报

The star\"

7楼-- · 2019-01-30 12:57

If you are serious about developing an algorithm for a search engine, I'd seriously recommend you take a Linear Algebra course. In the absence of an in-person course, the MIT OCW course by Gilbert Strang is quite good (video lectures at http://ocw.mit.edu/OcwWeb/Mathematics/18-06Spring-2005/VideoLectures/).

A class like this would certainly allow you to understand the mathematical symbols in the document you provide-- there's nothing in that paper that wouldn't be covered in a first-year Linear Algebra course.

I know this isn't the answer you are looking for, but it's really the best option for you. Having someone try to explain the individual symbols or algorithms to you when you don't have a good grasp of the basic concepts isn't a very good use of anybody's time.

0人赞添加讨论(0) 举报

1 2 下一页

Pagerank and its mathematics: Explanation needed

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间