I found mtrace by Dr. Clements. Although it is useful, it does not work properly in my situation. I intend to use the recorded trace to understand memory access patterns in different scenarios.
Can someone share related experience? Any suggestions would be appreciated.
0313 Updated: I'm trying to use qemu-mtrace to boot Ubuntu 16.04 with linux-mtrace (3.8.0), but it only shows several error messages and terminates. I hope some tool is able to log every access.
$ ./qemu-system-x86_64 -mtrace-enable -mtrace-file mtrace.out -hda ubuntu.img -m 1024
Error: mtrace_entry_ascope (exit, syscall:xx) with no stack tag!
mtrace_entry_register: mtrace_host_addr failed (10)
mtrace_inst_exec: bad call 140734947607728
Aborted (core dumped)
There is the perf mem tool, implemented for some modern x86/EM64T CPUs (probably Intel-only; Ivy Bridge and newer desktop/server CPUs). The man page of perf mem is at http://man7.org/linux/man-pages/man1/perf-mem.1.html, with the same text in the kernel docs directory: http://lxr.free-electrons.com/source/tools/perf/Documentation/perf-mem.txt. The text is incomplete; the best documentation is the source: tools/perf/builtin-mem.c and, partially, tools/perf/builtin-report.c. There are no details in https://perf.wiki.kernel.org/index.php/Tutorial.

Unlike qemu-mtrace, it does not log every memory access, only every Nth access, where N is something like 10000 or 100000. In exchange, it runs at native speed with low overhead. Use perf mem record ./program to record the access pattern; try adding -a or -C cpulist for system-wide sampling or for sampling on specific CPU cores. There is no way to log (trace) each and every memory access from inside the system: the tool would have to write its records to memory and would then have to log those writes as well, which is infinite recursion with finite memory. There are, however, very costly proprietary, system-specific external tracing solutions such as JTAG or an SDRAM sniffer ($5k or more).

The perf mem tools were added around 2013 (Linux kernel 3.10). Searching for perf mem on LWN gives several results: https://lwn.net/Articles/531766/. Physical address sampling support was added: https://lwn.net/Articles/555890/ (perf mem --phys-addr -t load rec). There is also the somewhat related perf c2c tool from 2016, "to track down cacheline contention": https://lwn.net/Articles/704125/, with examples at https://joemario.github.io/blog/2016/09/01/c2c-blog/.

Some random slides on perf mem:
Some info on decoding perf mem -D report: perf mem -D report (answered by the same user as in this answer)
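The raw dump printed by perf mem -D report can be post-processed to get a rough picture of the access pattern. Below is a minimal Python sketch, not a definitive parser. It assumes whitespace-separated columns in the order given by the header that builtin-mem.c prints (PID, TID, IP, ADDR, LOCAL WEIGHT, DSRC, SYMBOL); the layout may differ between perf versions, for example when --phys-addr adds a column. The sample rows, the Sample type and both function names are made up for illustration.

```python
from collections import Counter
from typing import Iterable, List, NamedTuple


class Sample(NamedTuple):
    """One sampled memory access (hypothetical container, not a perf type)."""
    pid: int
    tid: int
    ip: int
    addr: int
    weight: int
    dsrc: int
    symbol: str


def parse_dump(lines: Iterable[str]) -> List[Sample]:
    """Parse dump rows, skipping comments, blank lines and malformed rows."""
    samples = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(None, 6)  # the symbol column may contain spaces
        if len(fields) < 7:
            continue
        pid, tid, ip, addr, weight, dsrc, symbol = fields
        try:
            samples.append(Sample(int(pid), int(tid), int(ip, 16),
                                  int(addr, 16), int(weight),
                                  int(dsrc, 16), symbol))
        except ValueError:
            continue  # assumed layout did not match this row
    return samples


def cacheline_histogram(samples: Iterable[Sample], line_size: int = 64) -> Counter:
    """Count samples per cache line, keyed by the line's base address."""
    hist = Counter()
    for s in samples:
        hist[s.addr // line_size * line_size] += 1
    return hist


# Made-up rows in the assumed layout; real output comes from
# `perf mem -D report` run on a perf.data file.
EXAMPLE_DUMP = """\
# PID, TID, IP, ADDR, LOCAL WEIGHT, DSRC, SYMBOL
 2878  2878 0x0000000000400a3c 0x00007ffd1c2a1040     7 0x68100142 /tmp/program:main
 2878  2878 0x0000000000400a50 0x00007ffd1c2a1078     9 0x68100142 /tmp/program:main
 2878  2878 0x0000000000400a61 0x0000000000601080   212 0x68100842 /tmp/program:fill
"""

if __name__ == "__main__":
    parsed = parse_dump(EXAMPLE_DUMP.splitlines())
    for base, count in cacheline_histogram(parsed).most_common():
        print(hex(base), count)
```

Because perf mem only samples every Nth access, the counts are relative weights of hot cache lines, not exact access totals.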
There is also sorting to get some basic stats: perf mem rep --sort=mem - http://thread.gmane.org/gmane.linux.kernel.perf.user/1438

Other tools: there is the (slow) cachegrind emulator, based on valgrind, for simulating cache memory for userspace programs; see "7.2 Simulating CPU Caches" of https://lwn.net/Articles/257209/. There should also be something for low-level (slowest) models related to DRAMsim/DRAMsim2: http://eng.umd.edu/~blj/dramsim/