The x86-64 instruction set adds more registers and other improvements to help streamline executable code. However, in many applications the increased pointer size is a burden. The extra, unused bytes in every pointer clog up the cache and might even overflow RAM. GCC, for example, builds with the -m32
flag, and I assume this is the reason.
It's possible to load a 32-bit value and treat it as a pointer. This doesn't necessitate extra instructions, just load/compute the 32 bits and load from the resulting address. The trick won't be portable, though, as platforms have different memory maps. On Mac OS X, the entire low 4 GiB of address space is reserved. Still, for one program I wrote, hackishly adding 0x100000000L
to 32-bit "addresses" before use improved performance greatly over true 64-bit addresses, or compiling with -m32
.
Is there any fundamental impediment to having a 32-bit, x86-64 platform? I suppose that supporting such a chimera would add complexity to any operating system, and anyone wanting that last 20% should just Make it Work™, but it still seems that this would be the best fit for a variety of computationally intensive programs.
There is an ABI called "x32" for linux in development. It's a mix between x86_64 and ia32 similar to what you describe - 32 bit address space while using the full 64 bit register set. It needs a custom kernel, binutils and gcc.
Some SPEC runs indicate a performace improvement of about 30% in some benchmarks. See further information at https://sites.google.com/site/x32abi/
I do not expect it very hard to support such a model in the OS. About the only thing that needs to change for processes in this model is page management, pages must be allocated below the 4 GB point. The kernel too should allocate its buffers from the first 4 GBs of the virtual address space if it passes them to the application. The same applies to the loader that loads and starts applications. Other than that a 64-bit kernel should be able handle such apps w/o major modifications.
Compiler support shouldn't be a big issue either. It's mostly a matter of generating code that can use the extra CPU registers and their full 64 bits and adding proper REX prefixes whenever needed.
It's called "x86-32 emulation", or WOW64 on Windows (presumably something else on other OSes) and it's a hardware flag in the processor. No need for any user-mode tricks here.