par2 has a small and fairly clean C++ codebase, which I think builds fine on GNU/Linux, OS X, and Windows (with MSVC++).
I'd like to incorporate an x86-64 asm version of the one function that takes nearly all the CPU time. (mailing list posts with more details. My implementation/benchmark here.)
Intrinsics would be the obvious solution, but gcc doesn't generate good enough code for getting one byte at a time from a 64-bit register for use as an index into a LUT. I might also take the time to schedule instructions so each uop cache line holds a multiple of 4 uops, since uop throughput is the bottleneck even when the input/output buffer is a decent size.
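For reference, the access pattern in question looks roughly like this in plain C++. This is only a sketch under stated assumptions: the name `lut_xor_bytes`, the 16-bit table type, and the XOR into the output are made up for illustration and are not par2's actual function. It also assumes a little-endian target (as on x86-64) and a length that is a multiple of 8.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical kernel, not par2's actual code: for each input byte,
// XOR the matching 16-bit table entry into the output.
// Assumes a little-endian target (true for x86-64) and n % 8 == 0.
void lut_xor_bytes(const std::uint16_t (&lut)[256],
                   const std::uint8_t* src, std::uint16_t* dst, std::size_t n)
{
    for (std::size_t i = 0; i < n; i += 8) {
        std::uint64_t word;
        std::memcpy(&word, src + i, sizeof word);  // one 64-bit load
        for (int b = 0; b < 8; ++b) {
            dst[i + b] ^= lut[word & 0xff];        // low byte is the table index
            word >>= 8;                            // bring the next byte down
        }
    }
}
```

The inner loop is where hand-written asm can use the byte sub-registers directly instead of the shift-and-mask sequence a compiler may emit.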
I'd prefer not to introduce a build-dependency on yasm, since many people have gcc installed, but not yasm.
Is there a way to write a function in asm in a separate file that gcc / clang and MSVC can assemble? The goals are:
- no extra software as a build-dep. (no YASM).
- only one version of each asm function. (no maintaining MASM & AT&T versions of the same code.)
Par2cmdline's build system is autoconf/automake for Unix, and an MSVC .sln for Windows.
I know the GNU assembler has a .intel_syntax noprefix directive, but that only changes the instruction format, not the other assembler directives, e.g. .align 16 vs. align 16. My code is fairly simple and small, so it would be OK to work around the different directives with C-preprocessor #defines, if that can work.
I'm assuming that doing CPU detection and setting a function pointer based on the result shouldn't be a problem in C++, even if I have to use some #ifdef conditional compilation for that.
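A minimal sketch of that dispatch, assuming SSE2 is the feature to test for (a placeholder; the real asm may need something else). The names `process_fn`, `process_generic`, `process_fast`, `cpu_has_sse2`, and `select_process` are hypothetical, and `process_fast` is a C++ stand-in for what would be an `extern "C"` declaration of the asm routine. It uses `__cpuid` from `<intrin.h>` on MSVC and `__builtin_cpu_supports` on gcc/clang.

```cpp
#include <cstddef>
#include <cstdint>

#if defined(_MSC_VER)
#include <intrin.h>   // __cpuid
#endif

using process_fn = void (*)(std::uint8_t* dst, const std::uint8_t* src, std::size_t n);

// Portable fallback (placeholder body: XOR src into dst).
static void process_generic(std::uint8_t* dst, const std::uint8_t* src, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i) dst[i] ^= src[i];
}

// Stand-in for the asm version so this sketch links on its own.
static void process_fast(std::uint8_t* dst, const std::uint8_t* src, std::size_t n)
{
    process_generic(dst, src, n);
}

static bool cpu_has_sse2()
{
#if defined(_MSC_VER) && (defined(_M_X64) || defined(_M_IX86))
    int regs[4];                          // EAX, EBX, ECX, EDX
    __cpuid(regs, 1);
    return (regs[3] & (1 << 26)) != 0;    // CPUID leaf 1, EDX bit 26 = SSE2
#elif defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    return __builtin_cpu_supports("sse2") != 0;
#else
    return false;                         // non-x86 or unknown compiler
#endif
}

// Pick the implementation once; callers keep the returned pointer.
process_fn select_process()
{
    return cpu_has_sse2() ? process_fast : process_generic;
}
```

Calling `select_process()` once at startup and storing the result keeps the feature check out of the hot loop.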
If there isn't a solution to what I'm hoping for, I'll probably introduce a build dependency on yasm and have a ./configure --no-asm option to disable asm speedups for people building on x86 without yasm available.
My preferred plan for handling the different calling conventions in the Windows and Linux ABIs was to use __attribute__((sysv_abi)) on the C prototypes for my asm functions. Then I only have to write the function prologue for the SysV ABI. Does MSVC have anything like that, that would put args into regs according to the SysV ABI for certain functions? (BTW, this tickled a compiler bug, so be careful with this idea if you want your code to work with current gcc.)
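Concretely, the prototype side of that plan could look like the sketch below. The macro `PAR2_ASM_ABI` and the function `sum_bytes_stub` are hypothetical names, and the function is defined in C++ here only so the example builds without an asm file. On gcc/clang for x86-64 the macro expands to the attribute; elsewhere it expands to nothing, which is exactly the gap this question is about.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical macro: force the SysV calling convention where the
// compiler supports the attribute (gcc/clang targeting x86-64).
#if defined(__GNUC__) && defined(__x86_64__)
#define PAR2_ASM_ABI __attribute__((sysv_abi))
#else
#define PAR2_ASM_ABI   /* no known equivalent assumed here */
#endif

// In a real build this would be only a declaration, with the body in asm.
// With sysv_abi in effect, p arrives in RDI and n in RSI even on Windows.
extern "C" PAR2_ASM_ABI std::uint64_t sum_bytes_stub(const std::uint8_t* p, std::size_t n)
{
    std::uint64_t total = 0;
    for (std::size_t i = 0; i < n; ++i) total += p[i];
    return total;
}
```

Because the attribute is part of the prototype, callers compiled by gcc or clang on Windows would marshal the arguments into the SysV registers automatically.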