-->

How to check if binaries are built from particular

Posted 2020-04-03 16:12

Question:

The legacy project I am working on includes an external library in the form of a set of binary JAR files. We decided that, for analysis and potential patching, we want to obtain the sources of this library, build new binaries from them and, after sufficiently long and detailed regression testing, switch to those binaries.

Assume that we have already retrieved and built the sources (I am actually still in the planning phase). Before the real testing, I would like to perform some "compatibility checks" to rule out the possibility that the sources represent something dramatically different from what is in the "old" binaries.

Using the javap tool, I was able to extract the version of the JDK used for compilation (at least I believe it is the JDK version). It reports that the binaries were built with major version 46 and minor version 0. According to this article, that maps to JDK 1.2.
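The version that javap reports comes from the first eight bytes of every class file: a magic number, then the minor version, then the major version. Note that these fields record the class-file format the compiler targeted, so a newer compiler run with an older target setting could in principle have produced the same numbers. As a minimal sketch, assuming you want to check many class files without running javap on each one, the header can be read directly. The class name `ClassVersion` and its methods are illustrative and not part of any library.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper: reads the class-file version from the file header.
final class ClassVersion {

    /** Returns {major, minor} parsed from the first 8 bytes of a class file. */
    static int[] parse(byte[] header) {
        // ByteBuffer is big-endian by default, which matches the class-file format.
        ByteBuffer buf = ByteBuffer.wrap(header);
        if (buf.getInt() != 0xCAFEBABE) {
            throw new IllegalArgumentException("not a class file: bad magic number");
        }
        int minor = buf.getShort() & 0xFFFF; // minor_version is stored before major_version
        int major = buf.getShort() & 0xFFFF;
        return new int[] { major, minor };
    }

    public static void main(String[] args) throws IOException {
        for (String path : args) {
            byte[] header = new byte[8];
            try (DataInputStream in = new DataInputStream(Files.newInputStream(Paths.get(path)))) {
                in.readFully(header); // fails with EOFException if the file is shorter than 8 bytes
            }
            int[] version = parse(header);
            System.out.println(path + ": major " + version[0] + ", minor " + version[1]);
        }
    }
}
```

Run it against class files extracted from the old JAR and from the new build. If the two builds report different versions, the compiler or its target setting differs, and bytecode-level comparisons are likely to be noisy.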

Assume that the same JDK would be used to compile the sources.

The question is: is there a reliable and preferably efficient way to verify whether both sets of binaries were built from the same sources? I would like to know whether all method signatures and class definitions are identical, and whether most or maybe all of the method implementations are identical or similar.

The library is pretty big, so I think that a detailed analysis of the decompiled binaries may not be an option.

Answer 1:

I suggest a multi-stage process:

1. Apply the previously suggested Jardiff, or a similar tool, to see whether there are any API differences. If possible, pick a tool that has an option for reporting private methods and the like. In practice, any substantial implementation change in Java is likely to change some methods and classes, even if the public API is unchanged.

2. If you have an API match, compile a few randomly selected files with the indicated compiler, decompile both the result and the original class files, and compare the output. If they match, apply the same process to larger and larger bodies of code until you either find a mismatch or have checked everything.

   Diffs of decompiled code are more likely to give you clues about the nature of the differences, and are easier to filter for non-significant differences, than diffs of the actual class files.

3. If you get a mismatch, analyze it. It may be due to something you do not care about. If so, try to construct a script that removes that form of difference, and resume the compile-and-compare process. If you get widespread mismatches, experiment with compiler parameters such as optimization. If adjusting the compiler parameters eliminates the differences, continue with the bulk comparison. The objective in this phase is to find a combination of compiler parameters and decompiled-code filters that produces a match on the sample files, and then to apply them to a bulk comparison of the library.

4. If you cannot get a reasonably close match in the decompiled code, you probably do not have the right source code. Even so, if you have an API match, it may be worth building your system and running your tests using the result of the compilation. If your tests run at least as well with the version you built from source, continue work using it.
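One way to narrow down how much code has to go through the decompile-and-compare steps is to hash every class in both JARs first. This is an addition to the process above, not something the answer prescribes: classes whose bytes are identical need no further review, and the rest become the candidates for decompilation. The sketch below assumes the two JARs are available as files; the class name `JarTriage` and the status labels are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Enumeration;
import java.util.Map;
import java.util.TreeMap;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

// Hypothetical helper: byte-level triage of two JAR files.
final class JarTriage {

    /** Maps each .class entry name to the hex SHA-256 digest of its bytes. */
    static Map<String, String> digests(String jarPath)
            throws IOException, NoSuchAlgorithmException {
        Map<String, String> result = new TreeMap<>();
        try (JarFile jar = new JarFile(jarPath)) {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                JarEntry entry = entries.nextElement();
                if (entry.isDirectory() || !entry.getName().endsWith(".class")) {
                    continue; // skip directories, manifests and resources
                }
                MessageDigest sha = MessageDigest.getInstance("SHA-256");
                try (InputStream in = jar.getInputStream(entry)) {
                    byte[] buf = new byte[8192];
                    for (int n; (n = in.read(buf)) != -1; ) {
                        sha.update(buf, 0, n);
                    }
                }
                StringBuilder hex = new StringBuilder();
                for (byte b : sha.digest()) {
                    hex.append(String.format("%02x", b));
                }
                result.put(entry.getName(), hex.toString());
            }
        }
        return result;
    }

    /** Classifies every class name as SAME, DIFFERENT, ONLY_OLD or ONLY_NEW. */
    static Map<String, String> compare(Map<String, String> oldJar, Map<String, String> newJar) {
        Map<String, String> report = new TreeMap<>();
        for (Map.Entry<String, String> e : oldJar.entrySet()) {
            String other = newJar.get(e.getKey());
            String status = (other == null) ? "ONLY_OLD"
                    : (other.equals(e.getValue()) ? "SAME" : "DIFFERENT");
            report.put(e.getKey(), status);
        }
        for (String name : newJar.keySet()) {
            report.putIfAbsent(name, "ONLY_NEW"); // classes present only in the new build
        }
        return report;
    }

    public static void main(String[] args) throws Exception {
        // args[0] = old JAR, args[1] = JAR built from the retrieved sources
        compare(digests(args[0]), digests(args[1]))
                .forEach((name, status) -> System.out.println(status + "  " + name));
    }
}
```

A DIFFERENT result does not mean the sources differ. Compiler version, debug flags and other settings all change the bytes, so expect many mismatches until the compiler parameters from step 3 are settled.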



Answer 2:

There are a variety of JAR comparison tools out there. One that used to be pretty good is Jardiff. I haven't used it in a while, but I'm sure it's still available. There are also some commercial offerings in the same space that could fit your needs.



Answer 3:

Jardiff, which Perception mentioned, is a good start; however, in theory there is no way to be 100% sure. The same source can be compiled with different compilers, compiler configurations and optimization levels, so the resulting bytecode can differ even when the source is identical. Beyond class and method signatures, a comparison of the binary code (bytecode) therefore cannot give a definitive answer.

What do you mean by a "similar implementation" of a method? Suppose a clever compiler drops an else branch because it figures out that the condition can never be true. Are the two similar? Yes and no. :-)

The best way to go, IMHO, is to set up very good regression test cases that check every key feature of your libraries. This might be a horror, but in the long term it might be cheaper than hunting for bugs. It all depends on your future plans for this project. It is not a trivial decision.



Answer 4:

For method signatures, use a tool like Jardiff.

For similarity of implementation, you have to fall back on heuristics. Comparing the bytecode at the opcode level may be compiler-dependent and lead to a large number of false negatives. If this is the case, you could fall back to comparing the methods of a class using the LineNumberTable.

It gives you a list of line numbers for each method (as long as the class file has been compiled with the debug flag, which is often missing in very old or commercial libraries).

If two class files are compiled from the same source code, then at least the line numbers of each method should match exactly.

You can use a library such as Apache BCEL to retrieve the LineNumberTable:

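  import org.apache.bcel.classfile.ClassParser;
  import org.apache.bcel.classfile.JavaClass;
  import org.apache.bcel.classfile.LineNumber;
  import org.apache.bcel.classfile.LineNumberTable;
  import org.apache.bcel.classfile.Method;

  // parse() throws IOException, so call this from a method that declares it
  JavaClass fooClazz = new ClassParser( "Foo.class" ).parse();
  for( Method m : fooClazz.getMethods() )
  {
     LineNumberTable lnt = m.getLineNumberTable();
     if( lnt == null ) continue; // no code (abstract/native) or compiled without debug info
     LineNumber[] tab = lnt.getLineNumberTable();
     for( LineNumber ln : tab )
     {
        System.out.println( ln.getLineNumber() );
     }
  }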
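
Once the line numbers have been collected for the old and the new version of a class, the comparison itself needs nothing beyond the JDK. The following is a minimal sketch under the assumption that each method's line numbers were gathered into a list and stored under a key that distinguishes overloads, for example the method name followed by its descriptor (in BCEL, `m.getName() + m.getSignature()`). The class name `LineTableDiff` is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical helper: compares per-method line-number lists of two class versions.
final class LineTableDiff {

    /** Returns the keys of methods whose line numbers differ or that exist on one side only. */
    static List<String> mismatches(Map<String, List<Integer>> oldLines,
                                   Map<String, List<Integer>> newLines) {
        // Union of both key sets, sorted so the report is stable between runs.
        TreeSet<String> keys = new TreeSet<>(oldLines.keySet());
        keys.addAll(newLines.keySet());

        List<String> result = new ArrayList<>();
        for (String key : keys) {
            List<Integer> before = oldLines.get(key);
            List<Integer> after = newLines.get(key);
            // A missing method on either side counts as a mismatch.
            if (before == null || !before.equals(after)) {
                result.add(key);
            }
        }
        return result;
    }
}
```

An empty result for a class is a good sign that both versions come from the same source file, though it is still not proof. A non-empty result points at the methods worth decompiling first.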