Modules, Assemblies, Headers in CLR

Posted 2019-06-16 04:07

Question:

I've been reading CLR via C# (3rd edition) and have been reflecting on assemblies, modules and headers, but things got complicated. This is what I understood, but it would be great if someone could clarify things a little more:

  1. Modules are the result of compiling with csc.exe; they contain IL code and metadata tables. The metadata tables fall into three groups:

    • Definition Tables such as "ModuleDef, TypeDef, PropertyDef, MethodDef, EventDef, FieldDef"
    • Reference Tables such as "TypeRef, ModuleRef, MemberRef, etc."
    • Manifest Tables**
  2. Assemblies are containers that hold one or more modules, as well as resources such as images, documents, PDFs, etc.

  3. PE (Portable Executable) files can be .EXE or .DLL files. These files contain a PE32 or PE32+ header, a CLR header, metadata and IL code.
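To make the three groups of tables concrete, here is a minimal Python sketch. It assumes you have already read the 64-bit `Valid` bitmask from a module's `#~` metadata stream, where (per ECMA-335) a set bit n means table number n is present. The table numbers come from ECMA-335; Richter's names map onto them (ModuleDef is the spec's Module table, FieldDef is Field, AssemblyDef is Assembly, and so on). The helper names `present_tables`, `classify` and `is_manifest_module` are hypothetical, and the three-way grouping simply mirrors Richter's definition/reference/manifest split rather than anything the spec defines.

```python
# Hypothetical helpers: interpret the 64-bit "Valid" bitmask of a module's #~ stream.
# Bit n set => metadata table number n is present in the module.

# ECMA-335 table numbers (subset relevant to the three groups above).
TABLE_NAMES = {
    0x00: "Module",
    0x01: "TypeRef",
    0x02: "TypeDef",
    0x04: "Field",
    0x06: "MethodDef",
    0x08: "Param",
    0x0A: "MemberRef",
    0x14: "Event",
    0x17: "Property",
    0x1A: "ModuleRef",
    0x20: "Assembly",
    0x23: "AssemblyRef",
    0x26: "File",
    0x27: "ExportedType",
    0x28: "ManifestResource",
}

# Grouping follows Richter's split; it is an illustration, not part of the spec.
CATEGORIES = {
    "definition": {"Module", "TypeDef", "Field", "MethodDef", "Param", "Event", "Property"},
    "reference": {"TypeRef", "MemberRef", "ModuleRef", "AssemblyRef"},
    "manifest": {"Assembly", "File", "ExportedType", "ManifestResource"},
}


def present_tables(valid_mask: int) -> list[str]:
    """Names of the known tables whose bit is set, in table-number order."""
    return [name for number, name in sorted(TABLE_NAMES.items())
            if valid_mask & (1 << number)]


def classify(valid_mask: int) -> dict[str, list[str]]:
    """Group the present tables into definition/reference/manifest buckets."""
    tables = present_tables(valid_mask)
    return {category: [t for t in tables if t in names]
            for category, names in CATEGORIES.items()}


def is_manifest_module(valid_mask: int) -> bool:
    """Treat a module as the manifest module if its Assembly table is flagged."""
    return bool(valid_mask & (1 << 0x20))


if __name__ == "__main__":
    # Hypothetical mask: Module, TypeRef, TypeDef, MethodDef, Assembly, AssemblyRef.
    mask = sum(1 << n for n in (0x00, 0x01, 0x02, 0x06, 0x20, 0x23))
    print(classify(mask))
    print(is_manifest_module(mask))  # True
```

Under this grouping the manifest is just one more set of metadata tables; an `Assembly` row marks the module that carries it, and a module compiled with `csc /target:module` has none.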

The book says an assembly is a container made up of modules, and it also defines a managed module:

Managed Module:

A managed module is a standard 32-bit Microsoft Windows portable executable (PE32) file or a standard 64-bit Windows portable executable (PE32+) file that requires the CLR to execute.

Richter, Jeffrey (2010-02-05). CLR via C# (Kindle Locations 696-697). OReilly Media - A. Kindle Edition.

Definition of Assembly:

An assembly is a logical grouping of one or more modules or resource files.

Richter, Jeffrey (2010-02-05). CLR via C# (Kindle Locations 766-767). OReilly Media - A. Kindle Edition.

So, judging by an image in the same book, managed modules are actually parts of an assembly.

He says PE32 headers belong to assemblies, but he also says they belong to managed modules.

What's the separation here? Why does he use "module" and "assembly" interchangeably even though they seem clearly distinct?

A managed PE file has four main parts: the PE32(+) header, the CLR header, the metadata, and the IL. The PE32(+) header is the standard information that Windows expects. The CLR header is a small block of information that is specific to modules that require the CLR (managed modules).

Richter, Jeffrey (2010-02-05). CLR via C# (Kindle Locations 1628-1629). OReilly Media - A. Kindle Edition.
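The first two of the four parts in the quote can be located with a handful of fixed offsets. The minimal Python sketch below follows `e_lfanew` to the PE signature, reads the optional-header magic (0x10B for PE32, 0x20B for PE32+) and then data directory 14, which points at the CLR header; an image is treated as managed here when that directory is non-empty. `describe_pe` and `build_minimal_headers` are hypothetical helpers, and the synthetic header built by the latter is only complete enough to exercise the parser, not to be loaded by Windows.

```python
import struct

COM_DESCRIPTOR_INDEX = 14  # data directory entry that points at the CLR header


def describe_pe(data: bytes) -> dict:
    """Report PE32 vs PE32+ and the CLR header directory of a PE image."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ (DOS) signature")
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)  # e_lfanew
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    opt = pe_offset + 4 + 20  # skip the signature and the 20-byte COFF file header
    (magic,) = struct.unpack_from("<H", data, opt)
    if magic == 0x10B:
        kind, count_off, dir_off = "PE32", 92, 96
    elif magic == 0x20B:
        kind, count_off, dir_off = "PE32+", 108, 112
    else:
        raise ValueError(f"unknown optional-header magic {magic:#x}")
    (dir_count,) = struct.unpack_from("<I", data, opt + count_off)  # NumberOfRvaAndSizes
    clr_rva = clr_size = 0
    if dir_count > COM_DESCRIPTOR_INDEX:
        clr_rva, clr_size = struct.unpack_from(
            "<II", data, opt + dir_off + 8 * COM_DESCRIPTOR_INDEX)
    return {"format": kind, "clr_header_rva": clr_rva,
            "clr_header_size": clr_size, "managed": clr_size != 0}


def build_minimal_headers(pe32_plus: bool, clr_rva: int, clr_size: int) -> bytes:
    """Build just enough of a synthetic (non-loadable) PE header to test describe_pe."""
    pe_offset = 0x40
    opt_size = 240 if pe32_plus else 224  # optional header with 16 data directories
    buf = bytearray(pe_offset + 4 + 20 + opt_size)
    buf[0:2] = b"MZ"
    struct.pack_into("<I", buf, 0x3C, pe_offset)
    buf[pe_offset:pe_offset + 4] = b"PE\0\0"
    opt = pe_offset + 24
    struct.pack_into("<H", buf, opt, 0x20B if pe32_plus else 0x10B)
    count_off, dir_off = (108, 112) if pe32_plus else (92, 96)
    struct.pack_into("<I", buf, opt + count_off, 16)
    struct.pack_into("<II", buf, opt + dir_off + 8 * COM_DESCRIPTOR_INDEX,
                     clr_rva, clr_size)
    return bytes(buf)


if __name__ == "__main__":
    print(describe_pe(build_minimal_headers(pe32_plus=True, clr_rva=0x2008, clr_size=0x48)))
```

To try it on a real file, something like `describe_pe(open("MyApp.exe", "rb").read())` should work, where `MyApp.exe` is a placeholder name.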

Also, the image clearly shows modules containing only metadata, not PE32(+) headers, CLR headers, etc. Do you think the terms manifest and metadata can be used interchangeably?

And could you also explain the **Manifest tables in the modules?

Answer 1:

What you posted falls a bit short of describing exactly how a managed assembly is embedded in a PE32 file. PE is a very flexible format: it was originally intended to store native executable code and resources, but it is flexible enough to store data as well. From Windows' point of view, that is really what an assembly is: data. Only the CLR can turn that data into something executable.

A PE32 file contains more than just the assembly; there is native code in it as well. For a pure managed assembly it is only a few bytes: a jump instruction into mscoree.dll, the bootstrapper for managed code. An EXE contains a jump to _CorExeMain, a DLL a jump to _CorDllMain. Mixed-mode assemblies extend this much further; System.Data.dll and PresentationCore.dll are examples. They contain large chunks of native code wrapped by managed classes, and the C++/CLI compiler and linker are the way to create assemblies like that. The .text section contains the code, and the .reloc section contains relocation information that lets a DLL be loaded at an arbitrary address in memory.
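On x86 images the native stub mentioned above is typically a single indirect jump: opcode `FF` with ModRM byte `25` (`jmp dword ptr [imm32]`), followed by the 32-bit absolute address of the import-table slot that the loader fills with mscoree.dll's `_CorExeMain` or `_CorDllMain`. The minimal Python sketch below assumes exactly that encoding and an image base you supply; `decode_entry_stub` is a hypothetical helper, and the encoding does not apply to 64-bit images.

```python
import struct
from typing import Optional


def decode_entry_stub(stub: bytes, image_base: int) -> Optional[int]:
    """Return the RVA of the import slot an x86 entry stub jumps through.

    Assumes the stub is encoded as FF 25 <imm32>, i.e. 'jmp dword ptr [imm32]',
    where imm32 is an absolute virtual address. Returns None for anything else.
    """
    if len(stub) < 6 or stub[0] != 0xFF or stub[1] != 0x25:
        return None
    (slot_va,) = struct.unpack_from("<I", stub, 2)  # little-endian absolute address
    return slot_va - image_base  # convert to an RVA relative to the image base


if __name__ == "__main__":
    # Hypothetical stub bytes for an EXE loaded at the default 0x400000 base.
    print(hex(decode_entry_stub(bytes.fromhex("FF2500204000"), 0x400000)))  # 0x2000
```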

Most PE32 files also contain unmanaged resources, in a format that Windows understands, stored in the .rsrc section. The C# compiler, for example, creates resources there automatically; you can override them with the /win32res option. You can see them in Visual Studio via File > Open > File and then selecting an assembly. There are three important ones:

  • RT_MANIFEST, contains a manifest with resource ID 1. That's what Windows needs to recognize that a C# program is compatible with UAC. You can supply your own manifest by adding an Application Manifest File to the project.
  • ICON, contains one icon that's picked as the default icon for a desktop shortcut
  • Version, contains an unmanaged version resource. It is visible on the Details tab of the file's Properties dialog in Explorer and is synthesized by the compiler from the assembly attributes in AssemblyInfo.cs.
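The resources listed above live in a tree inside the .rsrc section whose first level is keyed by resource type. This minimal Python sketch reads only that root level, assuming `rsrc` holds the raw section bytes starting at the root `IMAGE_RESOURCE_DIRECTORY` (a 16-byte header followed by 8-byte entries, named entries first). The type IDs are the standard Win32 ones (RT_ICON = 3, RT_GROUP_ICON = 14, RT_VERSION = 16, RT_MANIFEST = 24); an icon is normally stored as RT_ICON images plus an RT_GROUP_ICON entry. `root_resource_types` is a hypothetical helper.

```python
import struct

# Standard Win32 resource type IDs (subset).
RESOURCE_TYPES = {3: "RT_ICON", 14: "RT_GROUP_ICON", 16: "RT_VERSION", 24: "RT_MANIFEST"}


def root_resource_types(rsrc: bytes) -> list[str]:
    """List the resource types at the root of a .rsrc section.

    Assumes `rsrc` starts with the root IMAGE_RESOURCE_DIRECTORY.
    """
    # NumberOfNamedEntries and NumberOfIdEntries sit at offsets 12 and 14.
    named, by_id = struct.unpack_from("<HH", rsrc, 12)
    types = []
    for i in range(named + by_id):
        name_field, _offset = struct.unpack_from("<II", rsrc, 16 + 8 * i)
        if name_field & 0x80000000:
            types.append("(named type)")  # high bit set: offset to a Unicode name
        else:
            types.append(RESOURCE_TYPES.get(name_field, f"type {name_field}"))
    return types


if __name__ == "__main__":
    # Synthetic root directory: no named entries, three ID entries.
    header = struct.pack("<IIHHHH", 0, 0, 0, 0, 0, 3)
    entries = b"".join(struct.pack("<II", type_id, 0x80000000) for type_id in (3, 16, 24))
    print(root_resource_types(header + entries))  # ['RT_ICON', 'RT_VERSION', 'RT_MANIFEST']
```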

Dumpbin.exe is a tool to peek at the internals of a PE32 file. Unfortunately it doesn't know much about managed assemblies, so you can't see everything.

Answer 2:

Richter's book is great, but the "truth" is defined in the ECMA CLI standard (ECMA-335).
Please check chapter 5, "Terms and definitions", for the definitions according to the official standard.
I think you will best understand the commonalities and differences between these terms by looking at the definitions there.