How to represent a jar file as a network graph?

Posted 2019-05-14 05:58

Question:

While trying to answer the question Graph isomorphism for jar files, the issue naturally arose of how to represent a jar file as a graph using Python.

The problem: given a jar file, read the files it contains and represent the contents as (a) a data structure and (b) a graphic, both suitable for further study and manipulation, for example assessing isomorphism with another jar file. In the graph, the directories should form the root and branch nodes, and the files should be the leaf nodes.

To standardise the answer I use the verletphysics.jar file downloaded from this OpenProcessing sketch.

Answer 1:

The Solution

Since jar files are essentially zip archives, the zipfile module from Python's standard library can read their contents. From the list of entries, the code prepares a textual and a graphical representation of how the contents of the jar relate to each other.
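As a minimal sketch of the reading step, the snippet below builds a tiny archive in memory and lists its entries with `namelist()`. The entry names are made up for illustration; a real jar would be opened by passing its path to `zipfile.ZipFile` instead of the `io.BytesIO` buffer.

```python
import io
import zipfile

# Build a small stand-in archive in memory (hypothetical entries, not the real jar)
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("META-INF/MANIFEST.MF", "Manifest-Version: 1.0\n")
    archive.writestr("toxi/physics/VerletParticle.class", b"")

# Reading works the same way for a jar on disk: zipfile.ZipFile("some.jar")
with zipfile.ZipFile(buffer) as jar:
    names = jar.namelist()

for name in names:
    print(name)
```

Entries are listed in the order they are stored in the archive, which is why directories and files can appear interleaved in the listing of a real jar.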

Textual Representation

For the file verletphysics.jar as mentioned in the question, the code below produces this list of contents:

META-INF/
META-INF/MANIFEST.MF
toxi/
toxi/physics/
toxi/physics/behaviors/
toxi/physics/constraints/
toxi/physics2d/
toxi/physics2d/behaviors/
toxi/physics2d/constraints/
toxi/physics/ParticlePath.class
toxi/physics/ParticleString.class
toxi/physics/PullBackString.class
toxi/physics/VerletConstrainedSpring.class
toxi/physics/VerletMinDistanceSpring.class
toxi/physics/VerletParticle.class
toxi/physics/VerletPhysics.class
toxi/physics/VerletSpring.class
toxi/physics/behaviors/AttractionBehavior.class
toxi/physics/behaviors/ConstantForceBehavior.class
toxi/physics/behaviors/GravityBehavior.class
toxi/physics/behaviors/ParticleBehavior.class
toxi/physics/constraints/AxisConstraint.class
toxi/physics/constraints/BoxConstraint.class
toxi/physics/constraints/CylinderConstraint.class
toxi/physics/constraints/MaxConstraint.class
toxi/physics/constraints/MinConstraint.class
toxi/physics/constraints/ParticleConstraint.class
toxi/physics/constraints/PlaneConstraint.class
toxi/physics/constraints/SoftBoxConstraint.class
toxi/physics/constraints/SphereConstraint.class
toxi/physics2d/ParticlePath2D.class
toxi/physics2d/ParticleString2D.class
toxi/physics2d/PullBackString2D.class
toxi/physics2d/VerletConstrainedSpring2D.class
toxi/physics2d/VerletMinDistanceSpring2D.class
toxi/physics2d/VerletParticle2D.class
toxi/physics2d/VerletPhysics2D.class
toxi/physics2d/VerletSpring2D.class
toxi/physics2d/behaviors/AttractionBehavior.class
toxi/physics2d/behaviors/ConstantForceBehavior.class
toxi/physics2d/behaviors/GravityBehavior.class
toxi/physics2d/behaviors/ParticleBehavior2D.class
toxi/physics2d/constraints/AngularConstraint.class
toxi/physics2d/constraints/AxisConstraint.class
toxi/physics2d/constraints/CircularConstraint.class
toxi/physics2d/constraints/MaxConstraint.class
toxi/physics2d/constraints/MinConstraint.class
toxi/physics2d/constraints/ParticleConstraint2D.class
toxi/physics2d/constraints/RectConstraint.class
verletphysics.mf

The Key

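Each path component (directory or file name) in the above pathnames is extracted and given a unique integer id by the code, as below. Components with the same name share one id, which is why behaviors and constraints each appear only once. The ids follow set iteration order, so the numbering is arbitrary and can differ between runs.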

 Index  File
     0  behaviors
     1  BoxConstraint.class
     2  MaxConstraint.class
     3  VerletParticle.class
     4  ParticleConstraint2D.class
     5  ConstantForceBehavior.class
     6  META-INF
     7  VerletMinDistanceSpring2D.class
     8  AxisConstraint.class
     9  AttractionBehavior.class
    10  physics2d
    11  VerletPhysics.class
    12  PullBackString.class
    13  VerletSpring.class
    14  VerletConstrainedSpring.class
    15  ParticleString2D.class
    16  verletphysics.mf
    17  ParticleBehavior2D.class
    18  ParticleString.class
    19  RectConstraint.class
    20  CylinderConstraint.class
    21  toxi
    22  VerletMinDistanceSpring.class
    23  VerletSpring2D.class
    24  VerletParticle2D.class
    25  ParticlePath2D.class
    26  CircularConstraint.class
    27  ParticlePath.class
    28  MinConstraint.class
    29  MANIFEST.MF
    30  ParticleConstraint.class
    31  GravityBehavior.class
    32  VerletPhysics2D.class
    33  SoftBoxConstraint.class
    34  ParticleBehavior.class
    35  VerletConstrainedSpring2D.class
    36  PlaneConstraint.class
    37  PullBackString2D.class
    38  SphereConstraint.class
    39  physics
    40  AngularConstraint.class
    41  constraints
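
If each directory or file should keep its own node regardless of its name, the ids can be keyed by full path rather than by bare name. The following is a minimal sketch under that assumption, not the method used in the answer's code; `number_nodes` is a hypothetical helper, and ids are handed out in sorted order so that the key is reproducible.

```python
def number_nodes(names):
    """Map every directory and file path in `names` to an integer id.

    `names` is a list of archive entry names such as those returned by
    zipfile.ZipFile.namelist(). Hypothetical helper for illustration.
    """
    paths = set()
    for name in names:
        parts = name.rstrip("/").split("/")
        # Register every prefix so parent directories get ids even when
        # the archive has no explicit entry for them
        for depth in range(1, len(parts) + 1):
            paths.add("/".join(parts[:depth]))
    # Sorting makes the numbering deterministic across runs
    return {path: index for index, path in enumerate(sorted(paths))}


sample = [
    "toxi/physics/behaviors/GravityBehavior.class",
    "toxi/physics2d/behaviors/GravityBehavior.class",
]
ids = number_nodes(sample)
for path, index in sorted(ids.items(), key=lambda item: item[1]):
    print("%5d  %s" % (index, path))
```

With this keying, the two behaviors directories and the two GravityBehavior.class files in the sample receive four different ids instead of two.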

The Graph

The pathnames are translated into edges that are built up into this network using NetworkX and plotted with matplotlib.
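
The translation from pathnames to edges can be sketched as follows: every path is split on `/`, and each component is linked to the one that follows it. This is an illustrative sketch rather than the answer's exact code; `path_edges` is a hypothetical helper, and it labels nodes by full path (an assumption) so that same-named entries in different directories stay separate. The resulting pairs can be passed to `G.add_edges_from` in NetworkX.

```python
def path_edges(names):
    """Return the set of (parent, child) pairs implied by archive entry names."""
    edges = set()
    for name in names:
        parts = name.rstrip("/").split("/")
        # Pair each prefix of the path with the prefix one level deeper
        for depth in range(1, len(parts)):
            parent = "/".join(parts[:depth])
            child = "/".join(parts[:depth + 1])
            edges.add((parent, child))
    return edges


sample = [
    "META-INF/",
    "META-INF/MANIFEST.MF",
    "toxi/physics/VerletSpring.class",
]
edges = path_edges(sample)
for parent, child in sorted(edges):
    print(parent, "->", child)
```

Collecting the pairs in a set removes the duplicates that arise when many files share the same parent directories.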

The Code

import zipfile
import networkx as nx
import matplotlib.pyplot as plt

# Download the code from
# http://www.openprocessing.org/sketch/46757
# Unzip and find the jar file: verletphysics.jar
# This example uses that file for demo

def get_edges(fName):
    """Return (edges, nodes): a set of id pairs and a sorted list of (id, label)."""
    edges = []
    nodes = []

    # A jar is a zip archive, so zipfile can list its entries
    with zipfile.ZipFile(fName, "r") as jar:
        for name in jar.namelist():
            print(name)  # prints the list of files in the jar
            if name.endswith('/'):
                name = name[:-1]  # directory entries end with a slash
            parts = name.split('/')
            nodes.extend(parts)
            if len(parts) > 1:
                # link each component of this path to the next one
                edges += zip(parts[:-1], parts[1:])

    # give every distinct component name an integer id
    nodes = set(nodes)
    nodes = dict(zip(nodes, range(len(nodes))))
    edges = [(nodes[edge[0]], nodes[edge[1]])
             for edge in edges]
    nodes = [(index, label) for label, index in nodes.items()]
    nodes = sorted(nodes, key=lambda node: node[0])
    return set(edges), nodes

if __name__ == '__main__':
    fName = 'verletphysics.jar'
    edges, nodes = get_edges(fName)

    # print list of nodes
    # serving as a key to the graph
    print('%10s  %s' % ('Index', 'File'))
    for node in nodes:
        print('%10s  %s' % (node[0], node[1]))

    # Plot the network graph
    G = nx.Graph()
    G.add_edges_from(edges)
    nx.draw_networkx(G, pos=nx.spring_layout(G))
    plt.axis('off')
    plt.show()