The project I'm working on will generate code for a large number of classes - hundreds to thousands is expected.
It is not known at generation time how many of these classes will actually be accessed.
The generated classes could (1) all live in a single assembly (or possibly a handful of assemblies), which would be loaded when the consuming process starts.
...or (2) I could generate a separate assembly for each class, much like Java compiles every class to its own *.class
binary file, and then come up with a mechanism to load the assemblies on demand.
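To make case (2) concrete, the on-demand mechanism I have in mind is something along these lines. The directory and the one-assembly-per-type naming convention are just placeholders for this sketch:

    using System;
    using System.Collections.Generic;
    using System.IO;
    using System.Reflection;

    // Hypothetical loader for case (2): one generated assembly per class, loaded
    // only when the class is first requested. The directory and the
    // "<FullTypeName>.dll" naming convention are assumptions for this sketch.
    static class GeneratedTypeLoader
    {
        static readonly string GeneratedDir = @"C:\MyApp\Generated";
        static readonly Dictionary<string, Type> Cache = new Dictionary<string, Type>();

        public static Type GetGeneratedType(string fullTypeName)
        {
            Type type;
            if (Cache.TryGetValue(fullTypeName, out type))
                return type;                                    // already loaded

            string path = Path.Combine(GeneratedDir, fullTypeName + ".dll");
            Assembly asm = Assembly.LoadFrom(path);             // load on demand
            type = asm.GetType(fullTypeName, throwOnError: true);
            Cache[fullTypeName] = type;
            return type;
        }
    }

    // Usage:
    //   Type t = GeneratedTypeLoader.GetGeneratedType("MyApp.Generated.Customer");
    //   object instance = Activator.CreateInstance(t);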
The question: which case would yield better (memory and time) performance?
My gut feeling is that for case (1) the load time and memory use are directly proportional to the number of classes that make up the single monolithic assembly. OTOH, case (2) comes with its own complications.
If you know any resources pertaining to the internals of loading assemblies, especially what code gets called (if any!?) and what memory is allocated (book-keeping for the newly loaded assembly), please share them.
You are trying to solve a non-existent problem; assembly loading is very heavily optimized in .NET.
Breaking up a large assembly into many smaller ones is without a doubt the very worst thing you can do. By far the biggest expense in loading an assembly is finding the file. This is a cold-start problem: the CLR loader is bogged down by the slow disk, because directory entries need to be retrieved and searched to locate the disk sectors that contain the file content. The problem disappears on a warm start, when the assembly data can be retrieved from the file system cache. Note that Java doesn't do it this way either; it packages .class files into a .jar, and a .jar is the rough equivalent of an assembly.
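If you want to see the cold/warm difference for yourself, a crude measurement is enough. The path is a placeholder; run it once right after a reboot and then again right away:

    using System;
    using System.Diagnostics;
    using System.Reflection;

    // Crude cold/warm comparison. The path is a placeholder for one of the
    // generated assemblies.
    class LoadTiming
    {
        static void Main()
        {
            var sw = Stopwatch.StartNew();
            Assembly asm = Assembly.LoadFrom(@"C:\MyApp\Generated\Big.dll");
            sw.Stop();
            // First run after a reboot: dominated by disk seeks (cold start).
            // Immediately rerun it: the file comes out of the file system cache.
            Console.WriteLine("{0} loaded in {1} ms", asm.GetName().Name, sw.ElapsedMilliseconds);
        }
    }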
Once the file is located, .NET uses an operating system facility that makes actually loading the assembly data very cheap: a memory-mapped file. Mapping the file merely reserves virtual memory for it; nothing is read from the file yet.
Reading doesn't happen until later and is done through page faults, a feature of any demand-paged virtual memory operating system. Accessing the mapped virtual memory produces a page fault; the operating system reads the data from the file and maps the virtual memory page into RAM, after which the program continues, never aware that it was interrupted by the OS. It is the jitter that produces these page faults: it accesses the metadata tables in the assembly to locate the IL for a method, and from that IL it generates executable machine code.
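The same operating system facility is exposed to managed code through System.IO.MemoryMappedFiles, so you can watch the "reserve now, read on first touch" behaviour yourself. This is only an illustration of the mechanism, not what the CLR loader literally executes, and the path is a placeholder:

    using System;
    using System.IO;
    using System.IO.MemoryMappedFiles;

    // Illustration of the OS facility: mapping a file reserves address space,
    // but no data is read until a page is actually touched.
    class MappingDemo
    {
        static void Main()
        {
            using (var mmf = MemoryMappedFile.CreateFromFile(
                @"C:\MyApp\Generated\Big.dll", FileMode.Open, null, 0,
                MemoryMappedFileAccess.Read))
            using (var view = mmf.CreateViewAccessor(0, 0, MemoryMappedFileAccess.Read))
            {
                // Touching a byte produces a page fault; only then does the OS
                // pull that page in from disk and map it into RAM.
                byte firstByte = view.ReadByte(0);
                Console.WriteLine("First byte of the image: 0x{0:X2}", firstByte);
            }
        }
    }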
An automatic benefit of this scheme is that you never pay for code that is in an assembly but is not being used. The jitter simply has no reason to look at the sections of the file that contain that IL, so they never actually get read.
Also note the disadvantage of this scheme: using a class for the first time does involve a perf hit for the disk read. That has to be paid one way or another; in .NET the debt is due at the last possible moment, which is why attributes have a reputation for being slow.
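You can watch that deferred payment with attributes: the attribute object isn't constructed when the assembly loads, only when reflection asks for it. A minimal sketch:

    using System;

    // The attribute constructor does not run when the assembly loads; it runs
    // when reflection asks for the attribute. That is the deferred "debt".
    [AttributeUsage(AttributeTargets.Class)]
    class NoisyAttribute : Attribute
    {
        public NoisyAttribute() { Console.WriteLine("attribute constructor ran"); }
    }

    [Noisy]
    class Decorated { }

    class Program
    {
        static void Main()
        {
            Console.WriteLine("assembly loaded, nothing paid yet...");
            // This call reads the attribute blob from the metadata and runs the
            // constructor -- the first-use cost shows up here, not at load time.
            object[] attrs = typeof(Decorated).GetCustomAttributes(typeof(NoisyAttribute), false);
            Console.WriteLine("got {0} attribute(s)", attrs.Length);
        }
    }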
Bigger assemblies are always better than many smaller assemblies.
"which case would yield better (memory and time) performance?"
Bearing in mind that the compiler will do a lot of optimization for you, option (1) is definitely the way to go. It seems complete overkill to have a separate assembly per class. Not only that, you would probably find it quicker to load one large assembly than lots of small ones.
Also, this does feel like premature optimization. My advice would be to stick with the first (sane) option and split the classes up into separate assemblies later if you see a need to.