Unexpected error reading GML graph

Posted 2020-07-13 08:18

Question:

I have downloaded the gml file which contains the dolphins social network.

Some time ago I did some analysis on that network using Python 3.4 and networkx 1.9 on a Windows 7 machine. Now I am on an Arch Linux machine (same version of Python, but networkx 1.10), and I hit an issue when I tried to read the file.

This is the code used to read the file:

import networkx as nx
nx.read_gml("dolphins.gml")

And this is the stack trace of the error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<string>", line 2, in read_gml
  File "/usr/lib/python3.4/site-packages/networkx/utils/decorators.py", line 220, in _open_file
    result = func(*new_args, **kwargs)
  File "/usr/lib/python3.4/site-packages/networkx/readwrite/gml.py", line 210, in read_gml
    G = parse_gml_lines(filter_lines(path), label, destringizer)
  File "/usr/lib/python3.4/site-packages/networkx/readwrite/gml.py", line 383, in parse_gml_lines
    graph = parse_graph()
  File "/usr/lib/python3.4/site-packages/networkx/readwrite/gml.py", line 372, in parse_graph
    curr_token, dct = parse_kv(next(tokens))
  File "/usr/lib/python3.4/site-packages/networkx/readwrite/gml.py", line 323, in tokenize
    (line[pos:], lineno + 1, pos + 1))
networkx.exception.NetworkXError: cannot tokenize 'graph' at (1, 1)

Are you able to read the file? Has anyone experienced a similar issue, or does anyone know what is causing the error?

Thank you in advance!

Answer 1:

In newer versions of networkx, the GML file has to follow a stricter format. The problem with dolphins.gml is that there must not be a line break before an opening square bracket. For example:

Wrong format:

graph 
[
  directed 0
  node 
  [
    id 0
    label "Beak"
  ]
  .
  .
  .

Correct format:

graph [
  directed 0
  node [
    id 0
    label "Beak"
  ]
  .
  .
  .

The number of spaces before the square bracket does not matter, as long as there is at least one and there is no line break.

What I ended up doing was using a regular expression to remove the whitespace (including line breaks) before the opening square brackets. The following regex worked for me:

\s+\[

and I replaced every match with " [" (a single space followed by the bracket). There has to be at least one space before the bracket.
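The replacement described above could be scripted roughly as follows. This is only a sketch: the function names and the output file name dolphins_fixed.gml are made up for illustration, and the regex is applied blindly, so it would also rewrite a quoted string value that happens to contain whitespace followed by "[".

```python
import re


def normalize_gml_brackets(text):
    """Move each opening bracket onto the same line as its key.

    \\s+ matches spaces, tabs and line breaks, so "graph\\n[" becomes
    "graph [". Assumption: no quoted value contains whitespace + "[".
    """
    return re.sub(r"\s+\[", " [", text)


def fix_gml_file(src_path, dst_path):
    """Write a normalized copy of src_path to dst_path (hypothetical helper)."""
    with open(src_path, "r") as src:
        fixed = normalize_gml_brackets(src.read())
    with open(dst_path, "w") as dst:
        dst.write(fixed)


# Small in-memory check using the "wrong format" layout from this answer
sample = 'graph \n[\n  directed 0\n  node \n  [\n    id 0\n    label "Beak"\n  ]\n]\n'
print(normalize_gml_brackets(sample))
```

After calling fix_gml_file("dolphins.gml", "dolphins_fixed.gml"), the cleaned copy would be the file to pass to nx.read_gml.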

Also keep in mind that every node has to have a unique label.
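To check the label requirement before loading the file, something like the following could help. It is a rough sketch, not part of networkx: duplicate_labels is a hypothetical helper, and it assumes each label sits on its own line in double quotes. It counts every label key it finds, so edge labels, if the file has any, would be counted too.

```python
import re
from collections import Counter


def duplicate_labels(gml_text):
    """Return the label values that occur more than once, sorted."""
    # Assumption: labels look like   label "Some Name"   on their own line
    labels = re.findall(r'^\s*label\s+"([^"]*)"', gml_text, flags=re.MULTILINE)
    counts = Counter(labels)
    return sorted(name for name, n in counts.items() if n > 1)


sample = (
    'graph [\n'
    '  node [\n    id 0\n    label "Beak"\n  ]\n'
    '  node [\n    id 1\n    label "Beak"\n  ]\n'
    '  node [\n    id 2\n    label "Fish"\n  ]\n'
    ']\n'
)
print(duplicate_labels(sample))
```

An empty list means no repeated labels were found under these assumptions.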

Hope it helped.



Answer 2:

Downgrading networkx from 1.10 to 1.9.1 made it work.

Hope this answer can help someone else.