For example, to monitor all mkdir
calls made, the best I could come up with was:
#!/bin/sh
set -eux
d=debug/tracing
mkdir -p debug
if ! mountpoint -q debug; then
mount -t debugfs nodev debug
fi
# Stop tracing.
echo 0 > "${d}/tracing_on"
# Clear previous traces.
echo > "${d}/trace"
# Enable tracing mkdir
echo sys_enter_mkdir > "${d}/set_event"
# Set tracer type.
echo function > "${d}/current_tracer"
# Filter only sys_mkdir as a workaround.
echo SyS_mkdir > "${d}/set_ftrace_filter"
# Start tracing.
echo 1 > "${d}/tracing_on"
# Generate two mkdir calls.
rm -rf /tmp/a
rm -rf /tmp/b
mkdir /tmp/a
mkdir /tmp/b
# View the trace.
cat "${d}/trace"
# Stop tracing.
echo 0 > "${d}/tracing_on"
umount debug
And then after running with sudo
it gives:
# tracer: function
#
# entries-in-buffer/entries-written: 4/4 #P:16
#
# _-----=> irqs-off
# / _----=> need-resched
# | / _---=> hardirq/softirq
# || / _--=> preempt-depth
# ||| / delay
# TASK-PID CPU# |||| TIMESTAMP FUNCTION
# | | | |||| | |
mkdir-31254 [015] .... 2010985.576760: sys_mkdir(pathname: 7ffc54b32c77, mode: 1ff)
mkdir-31254 [015] .... 2010985.576763: SyS_mkdir <-tracesys_phase2
mkdir-31255 [007] .... 2010985.578363: sys_mkdir(pathname: 7fff02d90c77, mode: 1ff)
mkdir-31255 [007] .... 2010985.578365: SyS_mkdir <-tracesys_phase2
My problem with this is that it output two lines for each syscall:
sys_mkdir
which is the event that I want
SyS_mkdir
which is the filtered function workaround, which I don't want to see
If I instead try to do:
echo > "${d}/set_ftrace_filter"
or don't touch that file at at all, then it shows a ton of functions and makes it hard to fint the syscall at all.
If there a nicer way to disable regular functions, and keep just syscall events?
I could use just SyS_mkdir
and disable the syscall event I guess, but it feels cleaner if I could use the more specific event? Also:
- the event shows arguments, which is nicer.
- syscall function names change across kernel versions. E.g., it is already
__x64_sys_mkdir
instead of SyS_mkdir
on Linux v4.18.
Related:
- How to trace the write system call in the Linux kernel?
Tested on Ubuntu 18.04, Linux kernel 4.15.
In addition, it's worth mention another concise way to gain such info. One can do something like:
stap -e 'probe syscall.mkdir { printf("%s[%d] -> %s(%s)\n", execname(), pid(), name, argstr) }'
The output:
systemd-journal[318] -> mkdir("/var/log/journal/c8d2562a041649cdbfd1ac5e24dbe0db", 0755)
systemd-journal[318] -> mkdir("/var/log/journal/c8d2562a041649cdbfd1ac5e24dbe0db", 0755)
mkdir[4870] -> mkdir("wtf", 0777)
...
Another way:
stap -e 'probe kernel.function("sys_mkdir") { printf("%s[%d] (%s)\n", execname(), pid(), $$parms) }'
The output:
systemd-journal[318] (pathname=0x55b74f7ab8b0 mode=0x1ed)
systemd-journal[318] (pathname=0x55b74f7ab8b0 mode=0x1ed)
mkdir[8532] (pathname=0x7ffcf30af761 mode=0x1ff)
...
You can customize the output as you like.
P.S. Systemtap is based on kprobes. Architecture doc will help to understand its internals.
More about the SystemTap.
Use the nop
tracer
As proposed by sruffell, all we have to do is to use the nop
tracer instead of function
, and that will disable the function traces, but not the events.
Run with sudo
:
#!/bin/sh
set -eux
d=debug/tracing
mkdir -p debug
if ! mountpoint -q debug; then
mount -t debugfs nodev debug
fi
# Stop tracing.
echo 0 > "${d}/tracing_on"
# Clear previous traces.
echo > "${d}/trace"
# Find the tracer name.
cat "${d}/available_tracers"
# Disable tracing functions, show only system call events.
echo nop > "${d}/current_tracer"
# Find the event name with.
grep mkdir "${d}/available_events"
# Enable tracing mkdir.
# Both statements below seem to do the exact same thing,
# just with different interfaces.
# https://www.kernel.org/doc/html/v4.18/trace/events.html
echo sys_enter_mkdir > "${d}/set_event"
# echo 1 > "${d}/events/syscalls/sys_enter_mkdir/enable"
# Start tracing.
echo 1 > "${d}/tracing_on"
# Generate two mkdir calls by two different processes.
rm -rf /tmp/a /tmp/b
mkdir /tmp/a
mkdir /tmp/b
# View the trace.
cat "${d}/trace"
# Stop tracing.
echo 0 > "${d}/tracing_on"
umount debug
This gives the desired output:
mkdir-26064 [007] .... 2014370.909743: sys_mkdir(pathname: 7fffbd461c77, mode: 1ff)
mkdir-26065 [014] .... 2014370.911615: sys_mkdir(pathname: 7ffea53bac77, mode: 1ff)
Alternative not best solution
This also works, but definitely less nice, replace:
echo SyS_mkdir > "${d}/set_ftrace_filter"
with:
echo '*' > "${d}/set_ftrace_notrace"
This turns off all functions as mentioned in the docs: https://www.kernel.org/doc/html/v4.18/trace/ftrace.html#the-file-system
set_ftrace_notrace:
This has an effect opposite to that of set_ftrace_filter. Any function that is added here will not be traced. If a function exists in both set_ftrace_filter and set_ftrace_notrace, the function will not be traced.