After looking at the impressive performance comparison, I decided that I would give a try to graph-tool. So for comparison, I wrote codes to generate a random tree using both packages.
The graph-tool code:
import numpy as np
import graph_tool.all as gt
# construct an initial graph with two nodes and one link
n = 5000
G = gt.Graph(directed = False)
G.add_edge(0, 1)
for t in range(2, n):
# connect the new vertex to one of the old vertices randomly
G.add_edge(np.random.choice(range(t)), t)
The Networkx code:
import networkx as nx
import numpy as np
n = 5000
# initial graph
G = nx.Graph()
G.add_edge(0, 1)
for t in range(2, n):
G.add_edge(t, np.random.choice(range(t)))
The graph-tool takes around 14 seconds on my 4-core machine whereas networkx takes less than 2 seconds on the same machine! Am I missing something obvious?
Thanks in advance.
There is nothing surprising here. graph-tool achieves greater performance by off-loading main loops to C++. If all your main loops are in Python, it offers no advantage. The same is true for other libraries like numpy.
The proper way to achieve fast addition of edges is to let graph-tool perform the main loop. The network you are generating is a simple growth model, and can be achieved in graph-tool by calling:
G = price_network(n, gamma=0, directed=False)
which takes about 15 ms in my computer for n=5000.
Note also that your python code is unnecessarily slow, since you create new lists with all vertices at each iteration. A much faster version would be:
from numpy.random import randint
n = 5000
G = Graph(directed=False)
G.add_vertex(n)
G.add_edge(0, 1)
for i in range(2, n):
G.add_edge(i, randint(i))
For even larger values of n, it will be even faster to add all edges at once instead of one by one, i.e.
from graph_tool.all import *
from numpy.random import randint
n = 5000
G = Graph(directed=False)
edges = [(0, 1)]
for i in range(2, n):
edges.append((i, randint(i)))
G.add_edge_list(edges)