I have compiled my code with specific flags (-Os, -O2, -march=native and their combinations) in order to produce a faster execution time.
My problem is that I don't always run on the same machine, because my lab has several different ones. Sometimes I run on macOS and sometimes on Linux (in both cases with different OS versions).
Is there a way to select which binary runs depending on the environment it will run in (cache size, CPU cores, and other properties of the specific machine)? In other words, how can I choose, at load time, the fastest binary (previously compiled with different size targets and instruction-set extensions) for the specific machine in use?
Thanks in advance.
Is there a reason you can't just recompile your source code on each machine? Compilers are already written and optimized exactly for this kind of stuff. Simply recompile your source code on that machine architecture and you'll have a binary that runs just fine on that machine.
You may prebuild a bunch of executables and choose one according to an environment variable or the output of something like uname. A better approach to the problem is to choose a toolchain that can perform JIT compilation, install-time optimization and/or runtime optimization, such as LLVM.
What you're talking about is called a fat binary (not FAT, the acronym). From Wikipedia [1]:
At a quick glance, there doesn't seem to be much support for it (see this question on Programmers Stack Exchange for more information). Apple implemented this briefly when transitioning from PowerPC to Intel, but it doesn't seem to have been explored much since then.
Technically, a fat binary is a single binary that can run on multiple architectures, but I imagine the premise would hold for a single binary that runs on multiple OSes. And it comes back to the point Bizkit made in his/her/zir answer: generally, you compile your source code ahead of time for the environment you're in.
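Short of a true fat binary, the prebuilt-executables suggestion above can be approximated with a small launcher. The following is only a sketch under some assumptions: the names `variant_path` and `launch_variant` and the `<base>-<sysname>-<machine>` naming scheme are made up for illustration, and `uname()` reports only the OS and architecture, so it cannot by itself distinguish instruction-set extensions or cache sizes.

```c
#include <stdio.h>
#include <string.h>
#include <sys/utsname.h>
#include <unistd.h>

/* Build "<base>-<sysname>-<machine>", e.g. "myprog-Linux-x86_64".
 * The naming scheme is a hypothetical convention, not a standard.
 * Returns 0 on success, -1 if the buffer is too small. */
int variant_path(char *out, size_t n, const char *base,
                 const char *sysname, const char *machine)
{
    int len = snprintf(out, n, "%s-%s-%s", base, sysname, machine);
    return (len < 0 || (size_t)len >= n) ? -1 : 0;
}

/* Query the running system and exec the matching prebuilt binary.
 * Returns only if something failed. */
int launch_variant(const char *base, char *const argv[])
{
    struct utsname u;
    char path[4096];

    if (uname(&u) != 0)
        return -1;
    if (variant_path(path, sizeof path, base, u.sysname, u.machine) != 0)
        return -1;

    execv(path, argv);   /* replaces this process on success */
    perror(path);        /* reached only if execv failed */
    return -1;
}
```

A launcher's `main` would then be little more than `return launch_variant("./bin/myprog", argv) ? 1 : 0;`, with one prebuilt executable per OS/architecture pair sitting in `./bin` (a hypothetical layout).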
If you want your code tuned for the cache size of the machine you run on, check out the way Automatically Tuned Linear Algebra Software (ATLAS) does it. When you compile it, it runs some tests to find what size to use for cache-blocking its loops, and puts that in a header file.
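The "put it in a header file" step might look roughly like the sketch below. Note that this is not how ATLAS itself works: ATLAS times real kernels, whereas this example substitutes a simple capacity heuristic (three square tiles of doubles must fit in the cache). The function names, the `BLOCK_SIZE` macro, and the idea of passing the cache size in as a parameter are all assumptions made for illustration.

```c
#include <stdio.h>
#include <stddef.h>

/* Largest block edge b such that three b-by-b tiles of doubles
 * (one each from A, B and C) fit in cache_bytes. Never returns
 * less than 1. This is a heuristic, not ATLAS's timing search. */
size_t pick_block_size(size_t cache_bytes)
{
    size_t b = 1;
    while (3 * (b + 1) * (b + 1) * sizeof(double) <= cache_bytes)
        b++;
    return b;
}

/* Write the chosen value to a header that the real build then includes.
 * Returns 0 on success, -1 on I/O failure. */
int write_tuning_header(const char *path, size_t cache_bytes)
{
    FILE *f = fopen(path, "w");
    if (!f)
        return -1;
    fprintf(f, "#define BLOCK_SIZE %zu\n", pick_block_size(cache_bytes));
    return fclose(f) == 0 ? 0 : -1;
}
```

The build would run this once per machine before compiling the main program, which then includes the generated header and uses `BLOCK_SIZE` as the tile size in its blocked loops.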