Improving Python NetworkX graph layout

2019-01-21 14:39发布

问题:

I am having some problems in visualizing the graphs created with python-networkx, I want to able to reduce clutter and regulate the distance between the nodes (I have also tried spring_layout, it just lays out the nodes in an elliptical fashion). Please advise.

Parts of code:

nx.draw_networkx_edges(G, pos, edgelist=predges, edge_color='red', arrows=True)
nx.draw_networkx_edges(G, pos, edgelist=black_edges, arrows=False, style='dashed')
# label fonts
nx.draw_networkx_labels(G,pos,font_size=7,font_family='sans-serif')
nx.draw_networkx_edge_labels(G,pos,q_list,label_pos=0.3)

回答1:

In networkx, it's worth checking out the graph drawing algorithms provided by graphviz via nx.graphviz_layout.

I've had good success with neato but the other possible inputs are

  • dot - "hierarchical" or layered drawings of directed graphs. This is the default tool to use if edges have directionality.

  • neato - "spring model'' layouts. This is the default tool to use if the graph is not too large (about 100 nodes) and you don't know anything else about it. Neato attempts to minimize a global energy function, which is equivalent to statistical multi-dimensional scaling.

  • fdp - "spring model'' layouts similar to those of neato, but does this by reducing forces rather than working with energy.

  • sfdp - multiscale version of fdp for the layout of large graphs.

  • twopi - radial layouts, after Graham Wills 97. Nodes are placed on concentric circles depending their distance from a given root node.

  • circo - circular layout, after Six and Tollis 99, Kauffman and Wiese 02. This is suitable for certain diagrams of multiple cyclic structures, such as certain telecommunications networks.

In general, graph drawing is a hard problem. If these algorithms are not sufficient, you'll have to write your own or have networkx draw parts individually.



回答2:

You have a lot of data in your graph, so it is going to be hard to remove clutter.

I suggest you to use any standard layout. You said that you used spring_layout. I suggest you to try it again but this time using the weight attribute when adding the edges.

For example:

import networkx as nx

G = nx.Graph();
G.add_node('A')
G.add_node('B')
G.add_node('C')
G.add_node('D')
G.add_edge('A','B',weight=1)
G.add_edge('C','B',weight=1)
G.add_edge('B','D',weight=30)

pos = nx.spring_layout(G,scale=2)

nx.draw(G,pos,font_size=8)
plt.show()

Additionally you can use the parameter scale to increase the global distance between the nodes.



回答3:

I found this to be useful for quickly visualizing interaction data sourced as a CSV file from PostgreSQL. [Output below reformatted for readability.]

## PSQL ['DUMMY' DATA]:

[interactions_practice]# \copy (SELECT gene_1, gene_2 FROM interactions
  WHERE gene_1 in (SELECT gene_2 FROM interactions))
  TO '/tmp/a.csv' WITH CSV      -- << note: no terminating ";" for this query

## BASH:

[victoria@victoria ~]$ cat /tmp/a.csv                                                                                                      

  APC,TP73
  BARD1,BRCA1
  BARD1,ESR1
  BARD1,KRAS2
  BARD1,SLC22A18
  BARD1,TP53
  BRCA1,BRCA2
  BRCA1,CHEK2
  BRCA1,MLH1
  BRCA1,PHB
  BRCA2,CHEK2
  BRCA2,TP53
  CASP8,ESR1
  CASP8,KRAS2
  CASP8,PIK3CA
  CASP8,SLC22A18
  CDK2,CDKN1A
  CHEK2,CDK2
  ESR1,BRCA1
  ESR1,KRAS2
  ESR1,PPM1D
  ESR1,SLC22A18
  KRAS2,BRCA1
  MLH1,CHEK2
  MLH1,PMS2
  PIK3CA,BRCA1
  PIK3CA,ESR1
  PIK3CA,RB1CC1
  PIK3CA,SLC22A18
  PMS2,TP53
  PTEN,BRCA1
  PTEN,MLH3
  RAD51,BRCA1
  RB1CC1,SLC22A18
  SLC22A18,BRCA1
  TP53,PTEN


## PYTHON 3.5 VENV (ANACONDA):

>>> import networkx as nx
>>> import pylab as plt
>>> G = nx.read_edgelist("/tmp/a.csv", delimiter=",")

>>> G.edges()

  [('CDKN1A', 'CDK2'), ('MLH3', 'PTEN'), ('TP73', 'APC'), ('CHEK2', 'MLH1'),
   ('CHEK2', 'BRCA2'), ('CHEK2', 'CDK2'), ('CHEK2', 'BRCA1'), ('BRCA2', 'TP53'),
   ('BRCA2', 'BRCA1'), ('KRAS2', 'CASP8'), ('KRAS2', 'ESR1'), ('KRAS2', 'BRCA1'),
   ('KRAS2', 'BARD1'), ('PPM1D', 'ESR1'), ('BRCA1', 'PHB'), ('BRCA1', 'ESR1'),
   ('BRCA1', 'PIK3CA'), ('BRCA1', 'PTEN'), ('BRCA1', 'MLH1'), ('BRCA1', 'SLC22A18'),
   ('BRCA1', 'BARD1'), ('BRCA1', 'RAD51'), ('CASP8', 'ESR1'), ('CASP8', 'SLC22A18'),
   ('CASP8', 'PIK3CA'), ('TP53', 'PMS2'), ('TP53', 'PTEN'), ('TP53', 'BARD1'),
   ('PMS2', 'MLH1'), ('PIK3CA', 'SLC22A18'), ('PIK3CA', 'ESR1'), ('PIK3CA', 'RB1CC1'),
   ('SLC22A18', 'ESR1'), ('SLC22A18', 'RB1CC1'), ('SLC22A18', 'BARD1'), ('BARD1', 'ESR1')]

>>> G.number_of_edges()
  36

>>> G.nodes()

  ['CDKN1A', 'MLH3', 'TP73', 'CHEK2', 'BRCA2', 'KRAS2', 'CDK2', 'PPM1D', 'BRCA1',
   'CASP8', 'TP53', 'PMS2', 'RAD51', 'PIK3CA', 'MLH1', 'SLC22A18', 'BARD1', 'PHB', 'APC', 'ESR1', 'RB1CC1', 'PTEN']

>>> G.number_of_nodes()
  22

>>> from networkx.drawing.nx_agraph import graphviz_layout

>>> ## nx.draw(G, pos=graphviz_layout(G))

## DUE TO AN UNIDENTIFIED BUG, I GET THIS ERROR THE FIRST TIME RUNNING THIS
## COMMAND; JUST RE-RUN IT:

>>> nx.draw(G, pos=graphviz_layout(G), node_size=1200, node_color='lightblue',
    linewidths=0.25, font_size=10, font_weight='bold', with_labels=True)

  QGtkStyle could not resolve GTK. Make sure you have installed the proper libraries.

>>> nx.draw(G, pos=graphviz_layout(G), node_size=1200, node_color='lightblue',
    linewidths=0.25, font_size=10, font_weight='bold', with_labels=True)

>>> plt.show()    ## plot1.png [opens in matplotlib popup window] attached

It is difficult to decrease congestion in these static networkx / matplotlib plots; one workaround is to increase the figure size, per this StackOverflow Q/A: High Resolution Image of a Graph using NetworkX and Matplotlib :

>>> plt.figure(figsize=(20,14))
  <matplotlib.figure.Figure object at 0x7f1b65ea5e80>

>>> nx.draw(G, pos=graphviz_layout(G), node_size=1200, node_color='lightblue',
    linewidths=0.25, font_size=10, font_weight='bold', with_labels=True, dpi=1000)

>>> plt.show()    ## plot2.png attached

## RESET OUTPUT FIGURE SIZE TO SYSTEM DEFAULT:

>>> plt.figure()
  <matplotlib.figure.Figure object at 0x7f1b454f1588>

plot1.png

plot2.png

Bonus -- shortest path:

>>> nx.dijkstra_path(G, 'CDKN1A', 'MLH3')
['CDKN1A', 'CDK2', 'CHEK2', 'BRCA1', 'PTEN', 'MLH3']


回答4:

To answer your question how to regulate the distance between nodes, I expand on Hooked's answer:

If you draw the graph via the Graphviz backend and when you then use the fdp algorithm, you can adjust the distance between nodes by the edge attribute len.

Here a code example, how to draw a graph G and save in the Graphviz file gvfile with wider distance between nodes (default distance for fdp is 0.3):

A = nx.to_agraph(G)
A.edge_attr.update(len=3)
A.write(gv_file_name)

Two comments:

  1. It is normally advisable to adjust len with the number of nodes in the graph.
  2. The len attribute is only recognised by the fdp and neato algorithm, but not e.g. by the sfdp algorithm.