Does anyone know of a way to compare two .NET assemblies to determine whether they were built from the "same" source files?
I am aware that there are some differencing utilities available, such as the plugin for Reflector, but I am not interested in viewing differences in a GUI; I just want an automated way to compare a collection of binaries to see whether they were built from the same (or equivalent) source files. I understand that multiple different source files could produce the same IL, and realise that the process would only be sensitive to differences in the IL, not the original source.
The main obstacle to just comparing the byte streams for the two assemblies is that .NET includes a field called "MVID" (Module Version Identifier) in the assembly. This appears to have a different value for every compilation, so if you build the same code twice the resulting assemblies will differ.
A related question is, does anyone know how to force the MVID to be the same for each compilation? This would avoid us needing to have a comparison process that is insensitive to differences in the value of the MVID. A consistent MVID would be preferable, as this means that standard checksums could be used.
The background to this is that a third-party company is responsible for independently reviewing and signing off our releases before we are permitted to release to Production. This includes reviewing the source code. They want to independently confirm that the source code we give them matches the binaries that we earlier built, tested and currently plan to deploy. We are looking for a process that allows them to independently build the system from the source we supply, and then compare the checksums against the checksums for the binaries we have tested.
BTW. Please note that we are using continuous integration, automated builds, source control etc. The issue is not related to an internal lack of control over what source files went into a given build. The issue is that a third party is responsible for verifying that the source we give them produces the same binaries that we have tested and plan to put into Production. They should not be trusting any of our internal systems or controls, including the build server or the source code control system. All they care about is getting the source associated with the build, performing the build themselves, and verifying that the outputs match what we say we are deploying.
The runtime speed of the comparison solution is not particularly important.
thanks
There are a few ways to do this, depending on how much work you're willing to do and how much performance and accuracy matter. One way, as Eric J. pointed out, is to compare the assemblies in binary, excluding the parts that change on every compilation. This solution is easy and fast but could give you a lot of false negatives.
A better way is to drill down using reflection. If performance is critical, you can start by comparing the types and, if they match, move on to the member definitions. If everything is equal up to that point, you can go further and examine the actual IL of each method, obtained through the GetILAsByteArray method. Again, you're going to find differences even when the source is the same if it was compiled with slightly different flags or a different version of the compiler.
I'd say the best solution is to use a continuous integration tool that tags the build with the changeset number from your source control (you are using one, right?). A related article
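As a rough illustration of the "compare in binary, excluding the parts that change" idea, here is a minimal Python sketch. It zeroes only two fields whose positions are fixed by the PE/COFF header layout (the COFF TimeDateStamp and the optional-header CheckSum) and then hashes the result. The function names are made up for this example. It does not locate or mask the MVID, which lives in the CLR metadata and needs a metadata parser, nor any other timestamps the compiler may embed, so treat it as a starting point rather than a complete comparison.

```python
import hashlib
import struct

# Field offsets relative to the "PE\0\0" signature, per the PE/COFF layout.
TIMESTAMP_OFFSET = 8   # 4-byte signature + 4 bytes into the COFF header
CHECKSUM_OFFSET = 88   # 4-byte signature + 20-byte COFF header + 64 bytes into the optional header


def mask_pe_header_fields(image: bytes) -> bytes:
    """Return a copy of a PE image with TimeDateStamp and CheckSum zeroed.

    Assumption: these two header fields are the only ones masked here;
    the MVID and any other per-build values are left untouched.
    """
    if image[:2] != b"MZ":
        raise ValueError("not a PE image: missing MZ header")
    # e_lfanew (at 0x3C) holds the file offset of the PE signature.
    (pe_offset,) = struct.unpack_from("<I", image, 0x3C)
    if image[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("not a PE image: missing PE signature")
    masked = bytearray(image)
    for field_offset in (TIMESTAMP_OFFSET, CHECKSUM_OFFSET):
        start = pe_offset + field_offset
        masked[start:start + 4] = b"\0\0\0\0"
    return bytes(masked)


def masked_digest(image: bytes) -> str:
    """SHA-256 of the image after masking the two volatile header fields."""
    return hashlib.sha256(mask_pe_header_fields(image)).hexdigest()
```

Two files read with open(path, "rb").read() can then be compared by their masked_digest values; a mismatch may still be caused by the MVID alone.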
I have used Jerry Currry's solution on .NET 4 assemblies and found that there is now a third item that varies on each build: Checksum. Isn't it surprising to find a checksum inside an assembly? I would think that adding the checksum of a file inside that file changes the checksum...
Anyway, the modified command is:
Note that I have also changed the search strings a bit by adding the slashes, in order to avoid unintentional matches. The command is split across lines for readability; run it as a single line. File names need double quotes around them if they contain spaces.
It's not too painful to use command-line tools to filter out MVID and date-time stamps from a text representation of the IL. Suppose file1.exe and file2.exe are built from the same sources:
c:\temp> ildasm /all /text file1.exe | find /v "Time-date stamp:" | find /v "MVID" > file1.txt
c:\temp> ildasm /all /text file2.exe | find /v "Time-date stamp:" | find /v "MVID" > file2.txt
c:\temp> fc file1.txt file2.txt
Comparing files file1.txt and FILE2.TXT
FC: no differences encountered
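The same filtering can be done in a script, which is handy when a whole collection of binaries has to be checked. The following Python sketch assumes the ildasm text dumps already exist on disk; it drops any line containing one of the marker substrings used in the find /v filters above and reports the first remaining difference. The function names are invented for this example, and the marker list may need extending for other per-build values.

```python
from itertools import zip_longest
from typing import Iterable, Iterator, Optional, Tuple

# Substrings taken from the find /v filters above; matching is case-sensitive.
VOLATILE_MARKERS = ("Time-date stamp:", "MVID")


def stable_lines(lines: Iterable[str], markers=VOLATILE_MARKERS) -> Iterator[str]:
    """Yield lines that contain none of the markers, without line endings."""
    for line in lines:
        if not any(marker in line for marker in markers):
            yield line.rstrip("\r\n")


def first_difference(
    left: Iterable[str], right: Iterable[str], markers=VOLATILE_MARKERS
) -> Optional[Tuple[int, Optional[str], Optional[str]]]:
    """Return (position, left_line, right_line) for the first mismatch, or None.

    The position is 1-based and counts only the lines kept after filtering.
    A missing line on one side is reported as None.
    """
    pairs = zip_longest(stable_lines(left, markers), stable_lines(right, markers))
    for position, (a, b) in enumerate(pairs, start=1):
        if a != b:
            return position, a, b
    return None
```

Both arguments can be open file objects, so large dumps are read line by line rather than loaded into memory; a result of None corresponds to "no differences encountered".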
You can use Mono.Cecil with a small modification to solve the problem. I did this; you can read how here: http://groups.google.com/group/mono-cecil/browse_thread/thread/6ab42df05daa3a/49e8b3b279850f13#49e8b3b279850f13
Regards Florian
Another solution to consider:
Source code information is stored when binaries are compiled in debug mode. You can then check whether the PDB matches the EXE and whether the line information in the PDB matches the source code.
When comparing class libraries with ILDasm v4.0.319.1, it seems that image base is not initialized. To avoid mismatches, use a revised solution:
Entry point (image base) is actually interesting information for executable assemblies, and will have to be verified carefully. Injecting a new image base is a common way to make a program do something else entirely. In my case, I am trying to verify the consistency of multi-threaded builds, so it's safe to skip over the entry point.
A note on performance: I took an 8MB DLL that was built for AnyCPU and ran ILDasm on it. The resulting file was 251MB, roughly 31 times the size of the input, and took several minutes to generate.