When I read about the performance of JITted languages like C# or Java, authors usually say that they should/could theoretically outperform many native-compiled applications. The theory is that native applications are usually compiled for a whole processor family (like x86), so the compiler cannot apply certain optimizations, since they may not actually be optimizations on every processor in that family. On the other hand, the CLR can make processor-specific optimizations during the JIT process.
Does anyone know if Microsoft's (or Mono's) CLR actually performs processor-specific optimizations during the JIT process? If so, what kind of optimizations?
Back in 2005, David Notario listed several specific targeted optimizations in his blog entry "Does the JIT take advantage of my CPU?". I can't find anything about the new CLR 4, but I imagine several new items are included.
One processor-specific optimization I'm aware of that's done in Mono is compiling Mono.Simd calls down to SSE instructions on processors that support SSE. If the processor running the code doesn't support SSE, the JIT compiler will output the equivalent non-SSE code.
The 32-bit and 64-bit JIT compilers are different; that's a start.
The .NET Framework Runtime Optimization Service optimizes not just for programming issues (the compiler's optimizations) but also for the processor it runs on.
I'll point out that the main reason I hear cited for the potential of JIT-compiled languages to outperform statically compiled languages has nothing to do with processor-specific instructions. Instead, it's that information about the dynamic state of the program can be used to optimize code paths. For instance, inline caching can be used to make virtual method calls roughly as fast as non-virtual method calls. Roughly, this works by assuming that at a particular call site the method is only ever called on a single type, and emitting code that jumps directly to that implementation (and then rewriting the code if this assumption is not borne out later).
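To make the inline caching idea concrete, here is a minimal sketch in Python. It only simulates what a JIT would do in machine code: `CallSite`, `Dog`, `Cat` and `speak` are hypothetical names made up for illustration, not part of any real runtime, and a real JIT patches the emitted code rather than updating fields on an object.

```python
class Dog:
    def speak(self):
        return "woof"


class Cat:
    def speak(self):
        return "meow"


class CallSite:
    """Hypothetical monomorphic inline cache for a single call site."""

    def __init__(self, method_name):
        self.method_name = method_name
        self.cached_type = None    # the one receiver type we assume
        self.cached_target = None  # the implementation resolved for it
        self.misses = 0            # how often the assumption failed

    def invoke(self, receiver, *args):
        receiver_type = type(receiver)
        if receiver_type is not self.cached_type:
            # Slow path: do the full method lookup, then "rewrite"
            # the call site so it assumes this type from now on.
            self.cached_target = getattr(receiver_type, self.method_name)
            self.cached_type = receiver_type
            self.misses += 1
        # Fast path: one cheap type check, then a direct call.
        return self.cached_target(receiver, *args)


site = CallSite("speak")
results = [site.invoke(Dog()) for _ in range(3)]  # one miss, then two hits
results.append(site.invoke(Cat()))                # assumption fails, re-cache
print(results, site.misses)
```

As long as the call site keeps seeing the same type, only the identity check and a direct call are executed, which is the "roughly as fast as non-virtual" case. The second miss in the example is the "rewrite the code" step.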
I think some Java compilers do, Microsoft .NET doesn't, and it only beats precompiled code when you compare apples to oranges. Precompiled code can ship with library variants tuned to different CPUs (or, more likely, different instruction sets), and the runtime check to pick which library to load is a lot cheaper than JIT compilation. For example, mplayer does this (google for mplayer enable-runtime-cpudetection).
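The runtime check described above can be sketched roughly like this. It is a simplified illustration in Python, not how mplayer actually does it: the `dot_*` functions are hypothetical stand-ins for separately tuned builds of the same routine, and reading `/proc/cpuinfo` is an assumption that only holds on Linux (elsewhere the sketch simply falls back to the generic variant).

```python
def detect_cpu_features(path="/proc/cpuinfo"):
    """Return the CPU feature flags as a set, or an empty set if unknown."""
    try:
        with open(path) as f:
            for line in f:
                key, _, value = line.partition(":")
                # x86 kernels report "flags", ARM kernels report "Features".
                if key.strip() in ("flags", "Features"):
                    return set(value.split())
    except OSError:
        pass  # not Linux, or file unreadable: assume no special features
    return set()


def dot_generic(a, b):
    return sum(x * y for x, y in zip(a, b))


def dot_sse2(a, b):
    # Stand-in for a variant compiled with SSE2; same result here.
    return dot_generic(a, b)


def dot_avx2(a, b):
    # Stand-in for a variant compiled with AVX2; same result here.
    return dot_generic(a, b)


# Ordered from most to least specialised; None means "no requirement".
VARIANTS = [("avx2", dot_avx2), ("sse2", dot_sse2), (None, dot_generic)]


def select_variant(cpu_features, variants):
    """Pick the first variant whose required feature the CPU reports."""
    for required, impl in variants:
        if required is None or required in cpu_features:
            return impl
    raise LookupError("no usable variant")


# The check runs once at startup; every later call goes straight to `dot`.
dot = select_variant(detect_cpu_features(), VARIANTS)
print(dot.__name__, dot([1, 2, 3], [4, 5, 6]))
```

The point is that the selection cost is paid once, at load time, after which the program calls the chosen variant directly.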
I know the rules for whether or not to inline functions change depending on the processor type (x86, x64). And of course the pointer sizes will vary depending on whether it runs as 32-bit or 64-bit.