I'm just starting to work with graph-tool, importing a list of edges from a pandas dataframe df that looks like this:
   node1  node2
0      1      2
1      2      3
2      1      4
3      3      1
4      4      3
5      1      5
So it's basically a list of directed edges. Following the tutorial, I import them into graph-tool with:
from graph_tool.all import *
import pandas as pd
# Read pandas dataframe
df = pd.read_csv('file.csv')
# Define Graph
g = Graph(directed=True)
# Add Edges
g.add_edge_list(df.values)
According to the documentation of add_edge_list(edge_list), edge_list may be an ndarray of shape (E,2), where E is the number of edges and each row specifies a (source, target) pair.
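To make the (source, target) convention concrete, here is a minimal pure-Python sketch of the out-neighbours the sample dataframe should produce. It does not use graph-tool at all; the `rows` list is simply the dataframe values typed in by hand, and `out_neighbours` is a name made up for this illustration.

```python
from collections import defaultdict

# Sample rows from the dataframe, typed by hand as (source, target) pairs.
rows = [(1, 2), (2, 3), (1, 4), (3, 1), (4, 3), (1, 5)]

# Group the targets by their source vertex.
out_neighbours = defaultdict(list)
for source, target in rows:
    out_neighbours[source].append(target)

# Print each source with the vertices it points to.
for source in sorted(out_neighbours):
    print(source, "->", out_neighbours[source])
```

A correct import should show exactly these directed edges when the graph is drawn with the vertex indices as labels.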
Running the import code with edge_list = df.values and drawing the graph, I obtained a graph that does not represent the original edge list of the dataframe. I then tried passing edge_list = df.values.tolist() instead:
g.add_edge_list(df.values.tolist())
This time the drawn graph is the right one. Can anyone reproduce this? The problem is that I'm working with huge networks (~4*10^6 nodes), and I think the .tolist() method is going to waste a lot of memory in the process.
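To put a rough number on that memory concern, here is a small standard-library sketch comparing a nested Python list of edges with the same values stored as packed 64-bit integers. The helper names `nested_list_bytes` and `packed_bytes` and the 1000 synthetic edges are assumptions made for illustration; `sys.getsizeof` only gives an approximation for CPython, and nothing here measures graph-tool itself.

```python
import sys
from array import array


def nested_list_bytes(edges):
    """Approximate bytes used by a list of [source, target] lists."""
    total = sys.getsizeof(edges)
    for pair in edges:
        total += sys.getsizeof(pair)                  # the inner list object
        total += sum(sys.getsizeof(v) for v in pair)  # the int objects
    return total


def packed_bytes(edges):
    """Bytes needed for the same values as packed 64-bit integers."""
    flat = array("q", (v for pair in edges for v in pair))
    return flat.itemsize * len(flat)


# 1000 hypothetical edges, only used to compare the two representations.
edges = [[i, i + 1] for i in range(1000, 2000)]

print("nested list:", nested_list_bytes(edges), "bytes (approx.)")
print("packed ints:", packed_bytes(edges), "bytes")
```

The nested list comes out several times larger than the packed representation, which is why passing the array directly would be preferable for a network of this size.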
EDIT: added the code for drawing the graph:
graph_draw(g, vertex_text=g.vertex_index, vertex_font_size=18, output_size=(200, 200), output="graph.png")