Using python's networkX to compute personalize

2019-04-29 17:57发布

问题:

I am trying to build a directed graph and compute personalized page rank over this graph. So suppose I have a graph with vertices {1,2,3,4} and edges going from 2, 3, and 4 to vertex 1, I would like to:

(1) compute the personalized page rank of every vertex with respect to 1

(2) compute the personalized page rank of every vertex with respect to 2.

The question is how I should pass this option in the personalized page rank function. The following code does not seem to do what I want:

import networkx as nx

G = nx.DiGraph()

[G.add_node(k) for k in [1,2,3,4]]
G.add_edge(2,1)
G.add_edge(3,1)
G.add_edge(4,1)


ppr1 = nx.pagerank(G,personalization={1:1, 2:1, 3:1, 4:1})
ppr2 = nx.pagerank(G,personalization={1:2, 2:2, 3:2, 4:2})

Right now ppr1 == ppr2, even though it should not be the case.

================================================================== UPDATE.

In response to comment below, my understanding of personalized page rank comes from the following:

An equivalent definition is in terms of the terminal node of a random walk starting from s. Let (X0, X1, . . . , XL) be a random walk starting from X0 = s of length L ∼ Geometric(α). Here by L ∼ Geometric(α) we mean Pr[L = ] = (1−α) α. This walk starts at s and does the following at each step: with probability α, terminate; and with the remaining probability 1 − α, continue to a random out-neighbor of the current node. Here if the current node is u, the random neighbor v ∈ N out(u) is chosen with probability wu,v if the graph is weighted or with uniform probability 1/dout(u) if the graph is unweighted. Then the PPR of any node t is the probability that this walk stops at t:

Found on page 6 of this thesis: https://cs.stanford.edu/people/plofgren/bidirectional_ppr_thesis.pdf

So I suppose what I am looking for when computing "the personalized page rank of t with respect to s" is if we start a random walk from s according to the process described above, what is the probability that this walk terminates at t.

回答1:

In the conceptualization of PageRank, a random surfer is moving around following links. At each step there is a nonzero probability the surfer goes to a random page (as opposed to following a link). If the choice of that random page is weighted, then it is referred to as personalized PageRank.

In your case you want that jump to be to a single specific page. So you need to tell it that all the other pages have zero probability of being selected in those steps when the surfer jumps rather than following an edge.

ppr1 = nx.pagerank(G,personalization={1:1, 2:0, 3:0, 4:0})
ppr2 = nx.pagerank(G,personalization={1:0, 2:1, 3:0, 4:0})