This started as a way to find C++/CLI and Managed C++ assemblies so that all classes internal to them could be tested to ensure all inherited methods were being reimplemented. I would like to add this as a build process step to ensure that it never happens again.
Thinking about this problem also made me curious whether it is possible to determine which .NET language produced a given assembly. Because of this, I went a bit further and compared assemblies from all of the .NET languages. Here is what I've found so far, using a small program I wrote that compares the type and attribute data of any set of .NET assemblies via reflection:
- C# - Has AssemblyConfigurationAttribute, Has GuidAttribute
- VB - Has many extra "My" types (ex. MyApplication, MySettings), Has GuidAttribute
- F# - Has an FSharpInterfaceDataVersionAttribute, which also specifies the version of the compiler used.
- C++ (all but /clr:safe) - Has a bunch of extra types (FrameInfo, type_info)
- C++ /clr:safe - Seems to have no unique reflection features.
It might be reasonable to parse in this order:
- It's F# if it has the FSharpInterfaceDataVersionAttribute
- It's C++ if it has any in the huge set of extra types I found.
- It's VB if it has the "My*" Types.
- It's C# if it has AssemblyConfigurationAttribute or GuidAttribute
- Otherwise, it's likely to be C++ /clr:safe
However, as this is a horrible hack, I wanted to check in here to make sure that there wasn't another option available.
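To make the proposed check order concrete, here is a minimal sketch in Python. It assumes the attribute and type names have already been pulled out of the assembly by some reflection step and are passed in as plain strings. The function name `guess_language` and the two marker sets are hypothetical, and the C++ set only holds the two type names mentioned above, not the full set.

```python
# Hypothetical marker sets. The names come from the findings listed above;
# the C++ set is only a small sample of the extra types that were observed.
CPP_MARKER_TYPES = {"FrameInfo", "type_info"}
CSHARP_MARKER_ATTRS = {"AssemblyConfigurationAttribute", "GuidAttribute"}


def guess_language(attribute_names, type_names):
    """Guess the source language from names already read via reflection.

    The checks run from most specific to least specific, so an assembly
    that matches several rules is labelled by the first one that fires.
    """
    attrs = set(attribute_names)
    types = set(type_names)

    if "FSharpInterfaceDataVersionAttribute" in attrs:
        return "F#"
    if types & CPP_MARKER_TYPES:
        return "C++"
    # Assumption: any type whose name starts with "My" counts as a VB "My*"
    # type. This is loose and would also match a user type such as MyWidget.
    if any(name.startswith("My") for name in types):
        return "VB"
    if attrs & CSHARP_MARKER_ATTRS:
        return "C#"
    # Nothing matched: the fallback from the list above.
    return "C++ /clr:safe (likely)"
```

The order matters because VB assemblies also carry GuidAttribute, so the "My*" check has to run before the C# check. For example, `guess_language(["GuidAttribute"], ["MyApplication", "MySettings"])` yields `"VB"` rather than `"C#"`.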
When a .NET language is compiled, all you get is IL. I am not aware of a standard way of determining which specific language created the assembly. You can take an existing assembly, ildasm (disassemble) it into IL, and then ilasm (assemble) it back into a virtually identical assembly.
The heuristics you use are a reasonable and clever way to identify the language used to create the assembly. However, bear in mind that these details might change between compiler versions of the languages.
Checking the assembly's references for things like the VB or F# class libraries seems to be the least shaky way to do this, but as others mention, it's still a heuristic. Likewise, there's no definitive way to tell which language a native binary was written in, though heuristics can get you almost 100% sure.
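A rough sketch of the reference-based check, in Python, might look like the following. It assumes the simple names of the referenced assemblies have already been collected (for instance via reflection) and are passed in as strings. The function name `guess_from_references` is hypothetical, and treating `FSharp.Core` and `Microsoft.VisualBasic` as the F# and VB class libraries is my assumption about what to look for. A C# project can reference either library too, so a match is a hint rather than proof.

```python
def guess_from_references(referenced_assemblies):
    """Guess the language from the simple names of referenced assemblies.

    Returns "F#" or "VB" when a language-specific class library is
    referenced, and None when the references give no hint.
    """
    # Compare case-insensitively, since assembly names are not case-sensitive.
    refs = {name.lower() for name in referenced_assemblies}

    if "fsharp.core" in refs:
        return "F#"
    if "microsoft.visualbasic" in refs:
        return "VB"
    # No language-specific runtime library found; fall back to other heuristics.
    return None
```

A `None` result would then fall through to the type and attribute checks described in the question.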