How to always produce byte-for-byte identical .exe

2020-05-26 04:05发布

问题:

I'll give you a little bit of background first as to why I'm asking this question:

I am currently working in a stricly-regulated industry and as such our code is quite carefully looked-over by official test houses. These test houses expect to be able to build the code and generate an .exe or .dll which is EXACTLY the same each and every time (without changing any code obviously!). They check the MD5 and the SHA1 of the executables that they create to ensure this.

Up until this point I have predominantly been coding in C++, where (after a few project setting tweaks) I managed to get the projects to rebuild consistantly to the same MD5/SHA1. I am now using C# in a project and am having great difficulty getting the MD5's to match after a rebuild. I am aware that there are "Time-Stamps" in the PE header of the file, and they have been cleared to 0. I am also aware that there is a GUID for the .exe, which again has been cleared to 00 00 00... etc. However the files still don't match.

I'm using CFF Explorer to view and edit the PE Header to remove the time and date stamps. After using a binary comparison tool there are only 2 blocks of bytes in the .exe's that are different (both very small).

One of the inconsistant blocks appears just before some binary code, which in ASCII details the path of the *Project*\obj\Release\xxx.pdb file.

EDIT: This is now known to be the GUID of the *.pdb file, however I still don't know if I can modify it without causing any errors!?

The other block appears in the middle of what looks to be function names, ie. (a typical section) AssemblyName.GetName.Version.get_Version.System.IO.Ports.SerialPort.Parity.Byte.<PrivateImplementationDetails>{

then the different code block:

4A134ACE-D6A0-461B-A47C-3A4232D90816

followed by:

"}.ValueType.__StaticArrayInitTypeSize=7.$$method0x60000ab-1.RuntimeFieldHandle.InitializeArray`... etc..

Any ideas or suggestions would be most welcome!

回答1:

Update: Roslyn seems to have a /feature:deterministic compiler flag for reproducible builds, although it's not 100% working yet.


You should be able to get rid of the debug GUID by disabling PDB generation. If not, setting the GUID to zeroes is fine - only debuggers look at that section (you won't be able to debug the assembly anymore, but it should still run fine).

The PrivateImplementationDetails are a bit more difficult - these are internal helper classes generated by the compiler for certain language constructs (array initializers, switch statements using strings, etc.). Because they are only used internally, the class name doesn't really matter, so you could just assign a running number to them.

I would do this by going through the #Strings metadata stream and replacing all strings of the form "<PrivateImplementationDetails>{GUID}" with "<PrivateImplementationDetails>{running number, padded to same length as a GUID}".

The #Strings metadata stream is simply the list of strings used by the metadata, encoded in UTF-8 and separated by \0; so finding and replacing the names should be easy once you know where the #Strings stream is inside the executable file.

Unfortunately the "metadata stream headers" containing this information are quite buried inside the file format. You'll have to start at the NT Optional Header, find the pointer to the CLI Runtime Header, resolve it to a file position using the PE section table (it's an RVA, but you need a position inside the file), then go to the metadata root and read the stream headers.



回答2:

I'm not sure about this, but just a thought: are you using any anonymous types for which the compiler might generate names behind the scenes, which might be different each time the compiler runs? Just a possibility which occurred to me. Probably one for Jon Skeet ;-)

Update: You could perhaps also use Reflector addins for comparison and disassembly.



回答3:

Regarding the PDB GUID problem, if you specify that a PDB shouldn't be generated at compilation for Release builds, does the binary still contain the PDB's file system GUID?

To disable PDB generation:

  1. Right-click your project in Solution Explorer and select Properties.
  2. From the menu along the left, select Build.
  3. Ensure that the Configuration selection is Release (you'll still want a PDB for debugging).
  4. Click the Advanced button in the bottom right.
  5. Under Output / Debug Info, select None.

If you're building from the console, use /debug- to get the same result.



回答4:

Take a look at the answers from this question. Especially on the external link provided in the 3rd one.

EDIT:

I actually wantetd to link to this article.



回答5:

You said that after a few project tweaks you were able to get C++ apps to compile repeatably to the same SHA1/MD5 values. I'm in the same boat as you in being in an industry with a third party test lab that needs to rebuild exactly the same executables repeatably.

In researching how to make this happen in VS2005, I came across your post here. Could you share the project tweaks you did to make the C++ apps build to the same SHA1/MD5 values consistently? It would be of great help to myself and perhaps any others that share this requirement.



回答6:

Use ildasm.exe to fully disassemble both programs and compare the IL. Then you can "clean" the code using text-based methods and (predictably) recompile it again.