Does anyone know about the differences in accuracy between the three different PageRank functions in NetworkX?
I have a graph of 1000 nodes and 139,732 edges, and the "plain" `pagerank` function didn't seem to work at all: all but two of the nodes had the same PageRank value. So I'm assuming this function doesn't work quite as well for large graphs?
`pagerank_numpy`'s values also seemed to be a little more spread out than `pagerank_scipy`'s values. The documentation for `pagerank_numpy` says, "This will be the fastest and most accurate for small graphs." What is meant by "small" graphs?

Also, why doesn't `pagerank_numpy` allow for `max_iter` and `tol` arguments?
Each of the three functions uses a different approach to solving the same problem:
- `networkx.pagerank()` is a pure-Python implementation of the power method to compute the largest eigenvalue/eigenvector of the Google matrix. It has two parameters that control the accuracy: `tol` and `max_iter`.
- `networkx.pagerank_scipy()` is a SciPy sparse-matrix implementation of the power method. It has the same two accuracy parameters.
- `networkx.pagerank_numpy()` is a NumPy (full) matrix implementation that calls `numpy.linalg.eig()` to compute the largest eigenvalue and eigenvector. That function is an interface to the LAPACK `dgeev` routine, which uses a direct matrix-decomposition method with no tunable parameters. That is why `pagerank_numpy` takes no `max_iter` or `tol` arguments.

All three should produce the same answer (within numerical roundoff) for well-behaved graphs if the `tol` parameter is small enough and the `max_iter` parameter is large enough. Which one is faster depends on the size of your graph and how well the power method works on your graph.
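As a rough sketch of how `tol` and `max_iter` affect the power-method result, the snippet below runs `networkx.pagerank()` on a random graph at a loose and a tight tolerance and compares the values. (The graph here is a hypothetical example, not the asker's graph; note also that in NetworkX 3.0 and later the `pagerank_numpy`/`pagerank_scipy` variants were removed, while `nx.pagerank` remains available.)

```python
import networkx as nx

# A small random directed graph stands in for the real one.
G = nx.gnp_random_graph(100, 0.1, seed=42, directed=True)

# Loose tolerance: the power iteration stops early, so values are less accurate.
pr_loose = nx.pagerank(G, alpha=0.85, tol=1e-3, max_iter=1000)

# Tight tolerance: more iterations, values closer to the exact eigenvector.
pr_tight = nx.pagerank(G, alpha=0.85, tol=1e-10, max_iter=1000)

# PageRank is a probability distribution: each result sums to 1.
print(sum(pr_loose.values()))
print(sum(pr_tight.values()))

# The largest per-node discrepancy between the two runs.
max_diff = max(abs(pr_loose[n] - pr_tight[n]) for n in G)
print(max_diff)
```

If `max_iter` is too small for the requested `tol`, `nx.pagerank` raises a `PowerIterationFailedConvergence` exception rather than silently returning an unconverged result.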