I'm looking for a fail-safe way to round-trip between a JVM class file and a text representation and back again.
One strict requirement is that the resulting round-tripped JVM class file is exactly functionally equivalent to the original JVM class file as long as the text representation is left unchanged.
Furthermore, the text representation must be human-readable and editable. It should be possible to make small changes to the text representation (such as changing a text string or a class name) and have them reflected in the resulting class file.
The simplest solution would be to use a Java decompiler such as JAD to generate the text representation, which in this case would simply be the re-created Java source code, and then use javac to regenerate the byte-code. However, given the state of the free Java decompilers, this approach does not work under all circumstances. It is rather easy to create obfuscated byte-code that does not survive a full class-file/java-source/class-file round trip (in part because there simply isn't a 1:1 mapping between JVM byte-code and Java source code).
Is there a fail-safe way to achieve JVM class-file/text-representation/class-file round-tripping given the requirements above?
Update: Before answering - save time and effort by reading all the requirements above, and note specifically:
- "Text-representation of JVM bytecode" does not necessarily mean "Java source-code".
The BCEL project provides a JasminVisitor which will convert class files into jasmin assembly.
The assembly can be modified and then reassembled into class files. If no edits are made and the versions are kept compatible, the round trip should result in identical class files, except that the line number mapping may be lost. If you require a bit-for-bit identical copy in the round-trip case, you will likely need to alter the tool so that it also carries over the parts of the class file that are pure metadata.
Jasmin is rather old and was not designed to make writing full-blown programs in assembly easy, but for modifying string constant tables and constants it should be more than adequate.
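To see whether a given round trip really is bit-for-bit identical, the original and the reassembled class file can be compared byte by byte. The following is only a minimal sketch using the Java standard library; the class name RoundTripCheck and its method are made up for illustration and are not part of BCEL or Jasmin.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical helper for illustration; not part of BCEL, Jasmin, or any other tool.
final class RoundTripCheck {

    /**
     * Returns -1 if both arrays are byte-for-byte identical,
     * otherwise the offset of the first byte that differs
     * (or the length of the shorter array if one is a prefix of the other).
     */
    static int firstDifference(byte[] original, byte[] roundTripped) {
        return Arrays.mismatch(original, roundTripped);
    }

    // Usage: java RoundTripCheck Original.class RoundTripped.class
    public static void main(String[] args) throws IOException {
        byte[] original = Files.readAllBytes(Path.of(args[0]));
        byte[] roundTripped = Files.readAllBytes(Path.of(args[1]));
        int offset = firstDifference(original, roundTripped);
        System.out.println(offset < 0
                ? "identical"
                : "first difference at byte offset " + offset);
    }
}
```

A difference reported here does not by itself mean the class behaves differently; lost line number tables or reordered constant pool entries also change the bytes.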
Looks like ASM does this. (This is the same sort of answer as ShuggyCoUk's, but with a different tool.) Jarjar says it uses ASM for exactly the sort of thing you're talking about.
No. There exists valid byte-code without a corresponding Java program.
The Soot project has a quite sophisticated decompiler, Dava (http://www.sable.mcgill.ca/dava/), which may be useful for byte-code that came from a Java compiler. It is, however, not perfect.
Your best bet is still getting the source code for the class files.
I've written a tool that's designed for exactly this.
The Krakatau disassembler and assembler is designed to handle any valid classfile, no matter how bizarre. It uses an assembly format based on the Jasmin format, but extended to support all the classfile features that Jasmin can't handle. It even supports some of the obscure or undocumented 'features' of Hotspot, such as pre-45.3 classfiles using smaller widths for the Code attribute fields.
It can roundtrip any classfile I know of. The result won't be identical binary-wise, but it will have the same functionality (constant pool entries may be rearranged, for instance).
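For readers who want to check whether a classfile falls into the pre-45.3 category, the version is stored right at the start of the file: a 4-byte magic number 0xCAFEBABE, then a 2-byte minor version, a 2-byte major version, and the 2-byte constant pool count. The sketch below reads just those fields. It is an illustration only and not how Krakatau itself is implemented; the class name ClassFileHeader and its members are hypothetical.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration; not taken from Krakatau or any other tool.
final class ClassFileHeader {
    final int minor;
    final int major;
    final int constantPoolCount;

    private ClassFileHeader(int minor, int major, int constantPoolCount) {
        this.minor = minor;
        this.major = major;
        this.constantPoolCount = constantPoolCount;
    }

    /** Parses the first 10 bytes of a class file (all fields are big-endian). */
    static ClassFileHeader parse(byte[] classBytes) {
        ByteBuffer buf = ByteBuffer.wrap(classBytes); // big-endian by default
        if (buf.getInt() != 0xCAFEBABE) {
            throw new IllegalArgumentException("not a class file: bad magic number");
        }
        int minor = buf.getShort() & 0xFFFF;   // u2 minor_version
        int major = buf.getShort() & 0xFFFF;   // u2 major_version
        int cpCount = buf.getShort() & 0xFFFF; // u2 constant_pool_count
        return new ClassFileHeader(minor, major, cpCount);
    }

    /** True if the version is below 45.3, the case mentioned above. */
    boolean isPre45_3() {
        return major < 45 || (major == 45 && minor < 3);
    }
}
```

Calling ClassFileHeader.parse on the bytes of both the original and the reassembled file gives a quick sanity check that the version and constant pool size survived the round trip, although it says nothing about the order of the entries.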
Update: Krakatau now supports exact binary roundtripping of classfiles. Passing the -roundtrip flag will preserve the order of constant pool entries, etc.
Jasmin and Kimera?