par2 has a small and fairly clean C++ codebase, which I think builds fine on GNU/Linux, OS X, and Windows (with MSVC++).
I'd like to incorporate an x86-64 asm version of the one function that takes nearly all the CPU time. (mailing list posts with more details. My implementation/benchmark here.)
Intrinsics would be the obvious solution, but gcc doesn't generate good enough code for getting one byte at a time from a 64-bit register for use as an index into a LUT. I might also take the time to schedule instructions so each uop cache line holds a multiple of 4 uops, since uop throughput is the bottleneck even when the input/output buffer is a decent size.
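To make the access pattern concrete, here is a minimal C++ sketch of "one byte at a time from a 64-bit register as a LUT index". It is not the actual par2 routine: the function name lut_map_bytes and the single 256-entry byte table are assumptions made for illustration only.

```cpp
#include <cstdint>

// Hypothetical helper (not from par2): translate each of the 8 bytes of a
// 64-bit word through a 256-entry lookup table and reassemble the result.
inline std::uint64_t lut_map_bytes(std::uint64_t word,
                                   const std::uint8_t (&lut)[256]) {
    std::uint64_t out = 0;
    for (int i = 0; i < 8; ++i) {
        // Extract byte i of the word; this is the step the compiler
        // needs to turn into cheap shift / zero-extend sequences.
        const std::uint8_t idx = static_cast<std::uint8_t>(word >> (8 * i));
        // Look it up and place the translated byte back in position i.
        out |= static_cast<std::uint64_t>(lut[idx]) << (8 * i);
    }
    return out;
}
```

How well the extract-and-index step compiles depends on the compiler and version, which is the motivation for hand-written asm in the first place.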
I'd prefer not to introduce a build-dependency on yasm, since many people have gcc installed, but not yasm.
Is there a way to write a function in asm in a separate file that gcc / clang and MSVC can assemble? The goals are:
- no extra software as a build-dep. (no YASM).
- only one version of each asm function. (no maintaining MASM & AT&T versions of the same code.)
Par2cmdline's build system is autoconf/automake for Unix, and an MSVC .sln for Windows.
I know the GNU assembler has a .intel_syntax noprefix directive, but that only changes instruction formats, not other assembler directives, e.g. .align 16 vs. align 16. My code is fairly simple and small, so it would be OK to work around the different directives with C-preprocessor #defines, if that can work.
I'm assuming that doing CPU detection and setting a function pointer based on the result shouldn't be a problem in C++, even if I have to use some #ifdef conditional compilation for that.
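A rough sketch of that dispatch, under some assumptions: the feature being tested (SSSE3) is only a placeholder, and process_generic, process_fast, and select_process are hypothetical names. In a real build process_fast would be the external asm routine; here it is a C++ stand-in so the example links by itself.

```cpp
#include <cstddef>
#include <cstdint>

#if defined(_MSC_VER)
#include <intrin.h>  // __cpuid on MSVC
#endif

using process_fn = void (*)(std::uint8_t* dst, const std::uint8_t* src,
                            std::size_t len);

// Portable fallback: XOR src into dst.
void process_generic(std::uint8_t* dst, const std::uint8_t* src,
                     std::size_t len) {
    for (std::size_t i = 0; i < len; ++i) dst[i] ^= src[i];
}

// Stand-in for the asm version (assumption: same signature and behaviour).
void process_fast(std::uint8_t* dst, const std::uint8_t* src,
                  std::size_t len) {
    process_generic(dst, src, len);
}

// Returns true if the CPU reports SSSE3 (placeholder feature for this sketch).
bool cpu_has_ssse3() {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    return __builtin_cpu_supports("ssse3") != 0;
#elif defined(_MSC_VER) && (defined(_M_X64) || defined(_M_IX86))
    int regs[4];
    __cpuid(regs, 1);
    return (regs[2] & (1 << 9)) != 0;  // CPUID leaf 1, ECX bit 9 = SSSE3
#else
    return false;  // non-x86 or unknown compiler: always use the fallback
#endif
}

// Pick the implementation once; callers keep the returned pointer.
process_fn select_process() {
    return cpu_has_ssse3() ? process_fast : process_generic;
}
```

A caller would typically do this once at startup, e.g. store the result of select_process() in a static function pointer and call through it in the hot loop.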
If there isn't a solution to what I'm hoping for, I'll probably introduce a build dependency on yasm and have a ./configure --no-asm option to disable asm speedups for people building on x86 without yasm available.
My preferred plan for handling the different calling conventions in the Windows and Linux ABIs was to use __attribute__((sysv_abi)) on my C prototypes for my asm functions. Then I only have to write the function prologue for the SysV ABI. Does MSVC have anything like that, that would put args into regs according to the SysV ABI for certain functions? (BTW, this tickled a compiler bug, so be careful with this idea if you want your code to work with current gcc.)
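For reference, the gcc/clang side of that plan might look like the sketch below. ASM_ABI and xor_block_sysv are hypothetical names, and the function body is a C++ stand-in for the asm routine so the example is self-contained. On compilers that are not gcc-compatible the macro expands to nothing, so this by itself does not answer the MSVC half of the question.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical macro: request the SysV calling convention where the
// compiler understands the attribute, and expand to nothing elsewhere.
#if defined(__GNUC__) && defined(__x86_64__)
#define ASM_ABI __attribute__((sysv_abi))
#else
#define ASM_ABI
#endif

// Prototype as the C++ code would see it. With the attribute, the first
// three integer args arrive in rdi, rsi, rdx, even when targeting Windows.
extern "C" ASM_ABI void xor_block_sysv(std::uint8_t* dst,
                                       const std::uint8_t* src,
                                       std::size_t len);

// Stand-in definition (assumption: the real one would live in a .s/.asm file).
extern "C" ASM_ABI void xor_block_sysv(std::uint8_t* dst,
                                       const std::uint8_t* src,
                                       std::size_t len) {
    for (std::size_t i = 0; i < len; ++i) dst[i] ^= src[i];
}
```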
While I have no good solution to remove the dependency on a particular assembler, I do have a suggestion on how to deal with the two different 64-bit calling conventions: Microsoft x64 versus SysV ABI.
The lowest common denominator is the Microsoft x64 calling convention, since it can only pass the first four values by register. So if you limit yourself to this and use macros to define the registers, you can easily make your code compile for both Unix (Linux/BSD/OSX) and Windows.
For example, look in the file strcat64.asm in Agner Fog's asmlib.
I don't think four registers is really a limitation, because if you're writing something in assembly it's because you want the best efficiency, in which case the function-calling overhead should be negligible compared to the function itself. Pushing/popping some values to/from the stack when calling the function, if you need to, should not make a difference in performance.