Modern x86_64 Linux with glibc detects that the CPU supports the AVX extension and switches many string functions from the generic implementation to an AVX-optimized version (with the help of ifunc dispatchers).
This feature can be good for performance, but it prevents several tools such as valgrind (older libVEX versions, before valgrind-3.8) and gdb's "target record" (Reverse Execution) from working correctly (Ubuntu "Z" 17.04 beta, gdb 7.12.50.20170207-0ubuntu2, gcc 6.3.0-8ubuntu1 20170221, Ubuntu GLIBC 2.24-7ubuntu2):
$ cat a.c
#include <string.h>
#define N 1000
int main(){
    char src[N], dst[N];
    memcpy(dst, src, N);
    return 0;
}
$ gcc a.c -o a -fno-builtin
$ gdb -q ./a
Reading symbols from ./a...(no debugging symbols found)...done.
(gdb) start
Temporary breakpoint 1 at 0x724
Starting program: /home/user/src/a
Temporary breakpoint 1, 0x0000555555554724 in main ()
(gdb) record
(gdb) c
Continuing.
Process record does not support instruction 0xc5 at address 0x7ffff7b60d31.
Process record: failed to record execution log.
Program stopped.
__memmove_avx_unaligned_erms () at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:416
416 VMOVU (%rsi), %VEC(4)
(gdb) x/i $pc
=> 0x7ffff7b60d31 <__memmove_avx_unaligned_erms+529>: vmovdqu (%rsi),%ymm4
The error message "Process record does not support instruction 0xc5" comes from gdb's implementation of "target record", because AVX instructions are not supported by the record/replay engine (sometimes the problem is detected in the _dl_runtime_resolve_avx function): https://sourceware.org/ml/gdb/2016-08/msg00028.html "some AVX instructions are not supported by process record", https://bugs.launchpad.net/ubuntu/+source/gdb/+bug/1573786, https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=836802, https://bugzilla.redhat.com/show_bug.cgi?id=1136403
The solution proposed in https://sourceware.org/ml/gdb/2016-08/msg00028.html is "You can recompile libc (thus ld.so), or hack __init_cpu_features and thus __cpu_features at runtime (see e.g. strcmp)." or setting LD_BIND_NOW=1, but the recompiled glibc still has AVX, and ld bind-now doesn't help.
I heard that there are /etc/ld.so.nohwcap and LD_HWCAP_MASK configurations in glibc. Can they be used to disable ifunc dispatching to AVX-optimized string functions in glibc?
How does glibc (rtld?) detect AVX: using cpuid, with /proc/cpuinfo (probably not), or with the HWCAP aux vector entry (the command LD_SHOW_AUXV=1 /bin/echo | grep HWCAP gives AT_HWCAP: bfebfbff)?
Not the best or a complete solution, just the smallest bit-editing kludge that allows valgrind and gdb record to work for my task.
Lekensteyn asks:
I did a full rebuild of unmodified glibc, which is rather easy in Debian and Ubuntu: just sudo apt-get source glibc, sudo apt-get build-dep glibc and cd glibc-*/; dpkg-buildpackage -us -uc (see the manual for getting the ld.so without stripped debugging information).

Then I did binary (bit) patching of the output ld.so file, in the function used by __get_cpu_features. The target function was compiled from get_common_indeces of the source file sysdeps/x86/cpu-features.c under the name get_common_indeces.constprop.1 (it comes right after __get_cpu_features in the binary code). It has several cpuid instructions; the first one is cpuid eax=1 "Processor Info and Feature Bits", and later there is a check "jle 0x6" that jumps around the code for "cpuid eax=7 ecx=0 Extended Features", which is there just to get the AVX2 status. This logic was compiled from the if (cpu_features->max_cpuid >= 7) check in the source.

The cpu_features->max_cpuid field was filled in by init_cpu_features of the same file, in the line __cpuid (0, cpu_features->max_cpuid, ebx, ecx, edx);. It was easier to disable the if statement by replacing the jle after cmp 0x6 with jg (byte 0x7e to 0x7f). (Actually this binary patch was reapplied manually to the __get_cpu_features function of the real system ld-linux.so.2: the first jle before mov 7 eax; xor ecx,ecx; cpuid was changed into jg.)

The recompiled package and the modified ld.so were not installed into the system; I used the command-line syntax ld.so ./my_program (or mv ld.so /some/short/path.so and patchelf --set-interpreter /some/short/path.so ./my_program).

Other possible solutions:
- change the if (cpu_features->max_cpuid >= 7) check in glibc and recompile

Yes: setting LD_HWCAP_MASK=0 will make glibc pretend that none of the CPU capabilities are available. Setting the mask to 0 is likely to trigger an error; you will likely need to figure out the precise bit that controls AVX and mask just that bit.
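
To see what value such a mask would be applied against, the aux-vector entry quoted in the question can also be read from inside a process. The following is a minimal sketch, assuming Linux with glibc 2.16 or later (for getauxval); the helper names bit_is_set and print_hwcap are hypothetical, and nothing here establishes which bit of AT_HWCAP, if any, corresponds to AVX.

```c
#include <stdio.h>
#include <sys/auxv.h>   /* getauxval, AT_HWCAP (glibc >= 2.16) */

/* Hypothetical helper: returns 1 if bit `idx` of `mask` is set, else 0.
   `idx` must be smaller than the width of unsigned long. */
int bit_is_set(unsigned long mask, unsigned idx)
{
    return (int)((mask >> idx) & 1UL);
}

/* Hypothetical helper: prints the AT_HWCAP word of the current process
   and lists every bit that is set in it. */
void print_hwcap(void)
{
    unsigned long hwcap = getauxval(AT_HWCAP);

    printf("AT_HWCAP: %lx\n", hwcap);
    for (unsigned i = 0; i < 8 * sizeof hwcap; i++)
        if (bit_is_set(hwcap, i))
            printf("  bit %u is set\n", i);
}
```

Calling print_hwcap() from main should report the same AT_HWCAP value as the LD_SHOW_AUXV=1 command in the question, which gives a starting point for experimenting with individual mask bits.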
There does not seem to be a straightforward runtime method to patch feature detection. This detection happens rather early in the dynamic linker (ld.so).
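
The kind of cpuid-based detection discussed in this thread can be approximated in user code. This is only a sketch, assuming an x86 target and the <cpuid.h> header shipped with GCC and Clang; the function names are hypothetical and this is not glibc's actual code. It checks only the CPUID feature bits (leaf 1 ECX bit 28 for AVX, leaf 7 subleaf 0 EBX bit 5 for AVX2) and leaves out the operating-system support checks that a complete detection also needs.

```c
#include <cpuid.h>   /* GCC/Clang wrappers around the x86 CPUID instruction */

/* Highest basic CPUID leaf, i.e. what cpuid(eax=0) returns in eax. */
unsigned cpu_max_leaf(void)
{
    return __get_cpuid_max(0, 0);
}

/* CPUID leaf 1, ECX bit 28: the CPU advertises AVX. */
int cpu_has_avx(void)
{
    unsigned eax, ebx, ecx, edx;

    if (cpu_max_leaf() < 1)
        return 0;
    __cpuid(1, eax, ebx, ecx, edx);
    return (int)((ecx >> 28) & 1u);
}

/* CPUID leaf 7, subleaf 0, EBX bit 5: the CPU advertises AVX2. */
int cpu_has_avx2(void)
{
    unsigned eax, ebx, ecx, edx;

    /* Same kind of guard as the max_cpuid >= 7 check: if the highest
       leaf is reported as lower than 7, leaf 7 is never queried. */
    if (cpu_max_leaf() < 7)
        return 0;
    __cpuid_count(7, 0, eax, ebx, ecx, edx);
    return (int)((ebx >> 5) & 1u);
}
```

The guard in cpu_has_avx2 illustrates why a faked or skipped cpuid(eax=0) result matters: when the reported maximum leaf is below 7, the AVX2 query is never made.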
Binary patching the linker seems the easiest method at the moment. @osgx described one method where a jump is overwritten. Another approach is just to fake the cpuid result. Normally cpuid(eax=0) returns the highest supported function in eax while the manufacturer IDs are returned in registers ebx, ecx and edx. We have this snippet in glibc 2.25 sysdeps/x86/cpu-features.c:

The __cpuid line translates to these instructions in /lib/ld-linux-x86-64.so.2 (/lib/ld-2.25.so):

So rather than patching branches, we could just as well change the cpuid into a nop instruction, which would result in invocation of the last else branch (as the registers will not contain "GenuineIntel"). Since initially eax=0, cpu_features->max_cpuid will also be 0 and the if (cpu_features->max_cpuid >= 7) will also be bypassed.

Binary patching cpuid(eax=0) by nop can be done with this utility (works for both x86 and x86-64):

That was the easy part. Now, I did not want to replace the system-wide dynamic linker, but execute only one particular program with this linker. Sure, that can be done with ./ld-linux-x86-64-patched.so.2 ./a, but the naive gdb invocations failed to set breakpoints:

A manual workaround is described in "How to debug program with custom elf interpreter?" It works, but it is unfortunately a manual action using add-symbol-file. It should be possible to automate it a bit using GDB Catchpoints though.

An alternative approach that does not involve binary patching is LD_PRELOADing a library that defines custom routines for memcpy, memmove, etc. These will then take precedence over the glibc routines. The full list of functions is available in sysdeps/x86_64/multiarch/ifunc-impl-list.c. Current HEAD has more symbols than the glibc 2.25 release; the complete list can be produced with grep -Po 'IFUNC_IMPL \(i, name, \K[^,]+' sysdeps/x86_64/multiarch/ifunc-impl-list.c.
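
The LD_PRELOAD approach described above can be sketched for two of those symbols. This is a minimal illustration, not a tuned implementation: it copies byte by byte, it covers only memcpy and memmove, and the file and library names in the usage note are hypothetical. Calls made inside glibc itself may still reach the AVX versions, so a real shim would likely need more of the listed functions.

```c
#include <stddef.h>
#include <stdint.h>

/* Plain byte-wise copy with no vector instructions written by hand.
   The volatile accesses keep the compiler from recognising the loop
   and replacing it with a call to memcpy, which would recurse. */
void *memcpy(void *restrict dst, const void *restrict src, size_t n)
{
    volatile unsigned char *d = dst;
    const volatile unsigned char *s = src;

    for (size_t i = 0; i < n; i++)
        d[i] = s[i];
    return dst;
}

/* Overlap-safe copy: copy forwards when dst starts at or before src,
   backwards otherwise, so source bytes are read before being overwritten. */
void *memmove(void *dst, const void *src, size_t n)
{
    volatile unsigned char *d = dst;
    const volatile unsigned char *s = src;

    if ((uintptr_t)dst <= (uintptr_t)src) {
        for (size_t i = 0; i < n; i++)
            d[i] = s[i];
    } else {
        for (size_t i = n; i > 0; i--)
            d[i - 1] = s[i - 1];
    }
    return dst;
}
```

Assuming the file is saved as plainmem.c, it could be built with gcc -shared -fPIC -o libplainmem.so plainmem.c and used as LD_PRELOAD=./libplainmem.so ./a.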