As a result of trying to answer the question Graph isomorphism for jar files, the debate naturally arose as to how to represent a jar file as a graph using Python.
The problem: given a jar file, read the files contained within it and create a representation of the contents as (a) a data structure and (b) as a graphic, both of them suitable for further study and manipulation, such as, for example, assessing isomorphism with another jar file. In the graph, the tree of directories should be root and branch nodes, ending in files as leaf nodes.
To standardise the answer I use the verletphysics.jar
file downloaded from this OpenProcessing sketch.
The Solution
Given that jar files are basically zipped archives, use the zipfile
module from the standard library in Python to read the contents and prepare textual and graphic representation of the relations of the contents of the jar.
Textual Representation
For the file verletphysics.jar
as mentioned in the question, the code below produces this list of contents:
META-INF/
META-INF/MANIFEST.MF
toxi/
toxi/physics/
toxi/physics/behaviors/
toxi/physics/constraints/
toxi/physics2d/
toxi/physics2d/behaviors/
toxi/physics2d/constraints/
toxi/physics/ParticlePath.class
toxi/physics/ParticleString.class
toxi/physics/PullBackString.class
toxi/physics/VerletConstrainedSpring.class
toxi/physics/VerletMinDistanceSpring.class
toxi/physics/VerletParticle.class
toxi/physics/VerletPhysics.class
toxi/physics/VerletSpring.class
toxi/physics/behaviors/AttractionBehavior.class
toxi/physics/behaviors/ConstantForceBehavior.class
toxi/physics/behaviors/GravityBehavior.class
toxi/physics/behaviors/ParticleBehavior.class
toxi/physics/constraints/AxisConstraint.class
toxi/physics/constraints/BoxConstraint.class
toxi/physics/constraints/CylinderConstraint.class
toxi/physics/constraints/MaxConstraint.class
toxi/physics/constraints/MinConstraint.class
toxi/physics/constraints/ParticleConstraint.class
toxi/physics/constraints/PlaneConstraint.class
toxi/physics/constraints/SoftBoxConstraint.class
toxi/physics/constraints/SphereConstraint.class
toxi/physics2d/ParticlePath2D.class
toxi/physics2d/ParticleString2D.class
toxi/physics2d/PullBackString2D.class
toxi/physics2d/VerletConstrainedSpring2D.class
toxi/physics2d/VerletMinDistanceSpring2D.class
toxi/physics2d/VerletParticle2D.class
toxi/physics2d/VerletPhysics2D.class
toxi/physics2d/VerletSpring2D.class
toxi/physics2d/behaviors/AttractionBehavior.class
toxi/physics2d/behaviors/ConstantForceBehavior.class
toxi/physics2d/behaviors/GravityBehavior.class
toxi/physics2d/behaviors/ParticleBehavior2D.class
toxi/physics2d/constraints/AngularConstraint.class
toxi/physics2d/constraints/AxisConstraint.class
toxi/physics2d/constraints/CircularConstraint.class
toxi/physics2d/constraints/MaxConstraint.class
toxi/physics2d/constraints/MinConstraint.class
toxi/physics2d/constraints/ParticleConstraint2D.class
toxi/physics2d/constraints/RectConstraint.class
verletphysics.mf
The Key
Each node in the above pathnames is extracted and given a unique id by the code, as below:
Index File
0 behaviors
1 BoxConstraint.class
2 MaxConstraint.class
3 VerletParticle.class
4 ParticleConstraint2D.class
5 ConstantForceBehavior.class
6 META-INF
7 VerletMinDistanceSpring2D.class
8 AxisConstraint.class
9 AttractionBehavior.class
10 physics2d
11 VerletPhysics.class
12 PullBackString.class
13 VerletSpring.class
14 VerletConstrainedSpring.class
15 ParticleString2D.class
16 verletphysics.mf
17 ParticleBehavior2D.class
18 ParticleString.class
19 RectConstraint.class
20 CylinderConstraint.class
21 toxi
22 VerletMinDistanceSpring.class
23 VerletSpring2D.class
24 VerletParticle2D.class
25 ParticlePath2D.class
26 CircularConstraint.class
27 ParticlePath.class
28 MinConstraint.class
29 MANIFEST.MF
30 ParticleConstraint.class
31 GravityBehavior.class
32 VerletPhysics2D.class
33 SoftBoxConstraint.class
34 ParticleBehavior.class
35 VerletConstrainedSpring2D.class
36 PlaneConstraint.class
37 PullBackString2D.class
38 SphereConstraint.class
39 physics
40 AngularConstraint.class
41 constraints
The Graph
The pathnames are translated into edges that are built up into this network using NetworkX and plotted with matplotlib.
The Code
import zipfile
import networkx as nx
import matplotlib.pyplot as plt
# Download the code from
# http://www.openprocessing.org/sketch/46757
# Unzip and find the jar file: verletphysics.jar
# This example uses that file for demo
def get_edges(fName):
edges = []
nodes = []
jar = zipfile.ZipFile(fName, "r")
for name in jar.namelist():
print name # prints the list of files in the jar
if name.endswith('/'): name = name[:-1]
parts = name.split('/')
nodes.extend( parts )
if len(parts) > 1:
edges += zip(nodes[:-1], nodes[1:])
nodes = set(nodes)
nodes = dict( zip(nodes, range(len(nodes)) ) )
edges = [ (nodes[ edge[0] ], nodes[ edge[1] ])
for edge in edges ]
nodes = [ (index, label) for label, index in nodes.iteritems() ]
nodes = sorted( nodes, key = lambda node: node[0] )
return set( edges ), nodes
if __name__ == '__main__':
fName = 'verletphysics.jar'
edges, nodes = get_edges(fName)
# print list of nodes
# serving as a key to the graph
print '%10s %s' % ('Index', 'File')
for node in nodes:
print '%10s %s' % (node[0], node[1])
# Plot the network graph
G = nx.Graph()
G.add_edges_from( edges )
nx.draw_networkx(G, pos=nx.spring_layout(G))
plt.axis('off')
plt.show()