How to trace just system call events with ftrace w

2019-01-19 07:44发布

问题:

For example, to monitor all mkdir calls made, the best I could come up with was:

#!/bin/sh
set -eux

d=debug/tracing

mkdir -p debug
if ! mountpoint -q debug; then
  mount -t debugfs nodev debug
fi

# Stop tracing.
echo 0 > "${d}/tracing_on"

# Clear previous traces.
echo > "${d}/trace"

# Enable tracing mkdir
echo sys_enter_mkdir > "${d}/set_event"

# Set tracer type.
echo function > "${d}/current_tracer"

# Filter only sys_mkdir as a workaround.
echo SyS_mkdir > "${d}/set_ftrace_filter"

# Start tracing.
echo 1 > "${d}/tracing_on"

# Generate two mkdir calls.
rm -rf /tmp/a
rm -rf /tmp/b
mkdir /tmp/a
mkdir /tmp/b

# View the trace.
cat "${d}/trace"

# Stop tracing.
echo 0 > "${d}/tracing_on"

umount debug

And then after running with sudo it gives:

# tracer: function
#
# entries-in-buffer/entries-written: 4/4   #P:16
#
#                              _-----=> irqs-off
#                             / _----=> need-resched
#                            | / _---=> hardirq/softirq
#                            || / _--=> preempt-depth
#                            ||| /     delay
#           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
#              | |       |   ||||       |         |
           mkdir-31254 [015] .... 2010985.576760: sys_mkdir(pathname: 7ffc54b32c77, mode: 1ff)
           mkdir-31254 [015] .... 2010985.576763: SyS_mkdir <-tracesys_phase2
           mkdir-31255 [007] .... 2010985.578363: sys_mkdir(pathname: 7fff02d90c77, mode: 1ff)
           mkdir-31255 [007] .... 2010985.578365: SyS_mkdir <-tracesys_phase2

My problem with this is that it output two lines for each syscall:

  • sys_mkdir which is the event that I want
  • SyS_mkdir which is the filtered function workaround, which I don't want to see

If I instead try to do:

echo > "${d}/set_ftrace_filter"

or don't touch that file at at all, then it shows a ton of functions and makes it hard to fint the syscall at all.

If there a nicer way to disable regular functions, and keep just syscall events?

I could use just SyS_mkdir and disable the syscall event I guess, but it feels cleaner if I could use the more specific event? Also:

  • the event shows arguments, which is nicer.
  • syscall function names change across kernel versions. E.g., it is already __x64_sys_mkdir instead of SyS_mkdir on Linux v4.18.

Related:

  • How to trace the write system call in the Linux kernel?

Tested on Ubuntu 18.04, Linux kernel 4.15.

回答1:

In addition, it's worth mention another concise way to gain such info. One can do something like:

stap -e 'probe syscall.mkdir { printf("%s[%d] -> %s(%s)\n", execname(), pid(), name, argstr) }'

The output:

systemd-journal[318] -> mkdir("/var/log/journal/c8d2562a041649cdbfd1ac5e24dbe0db", 0755)
systemd-journal[318] -> mkdir("/var/log/journal/c8d2562a041649cdbfd1ac5e24dbe0db", 0755)
mkdir[4870] -> mkdir("wtf", 0777)
...

Another way:

stap -e 'probe kernel.function("sys_mkdir") { printf("%s[%d] (%s)\n", execname(), pid(), $$parms) }'

The output:

systemd-journal[318] (pathname=0x55b74f7ab8b0 mode=0x1ed)
systemd-journal[318] (pathname=0x55b74f7ab8b0 mode=0x1ed)
mkdir[8532] (pathname=0x7ffcf30af761 mode=0x1ff)
...

You can customize the output as you like.

P.S. Systemtap is based on kprobes. Architecture doc will help to understand its internals.

More about the SystemTap.



回答2:

Use the nop tracer

As proposed by sruffell, all we have to do is to use the nop tracer instead of function, and that will disable the function traces, but not the events.

Run with sudo:

#!/bin/sh
set -eux

d=debug/tracing

mkdir -p debug
if ! mountpoint -q debug; then
  mount -t debugfs nodev debug
fi

# Stop tracing.
echo 0 > "${d}/tracing_on"

# Clear previous traces.
echo > "${d}/trace"

# Find the tracer name.
cat "${d}/available_tracers"

# Disable tracing functions, show only system call events.
echo nop > "${d}/current_tracer"

# Find the event name with.
grep mkdir "${d}/available_events"

# Enable tracing mkdir.
# Both statements below seem to do the exact same thing,
# just with different interfaces.
# https://www.kernel.org/doc/html/v4.18/trace/events.html
echo sys_enter_mkdir > "${d}/set_event"
# echo 1 > "${d}/events/syscalls/sys_enter_mkdir/enable"

# Start tracing.
echo 1 > "${d}/tracing_on"

# Generate two mkdir calls by two different processes.
rm -rf /tmp/a /tmp/b
mkdir /tmp/a
mkdir /tmp/b

# View the trace.
cat "${d}/trace"

# Stop tracing.
echo 0 > "${d}/tracing_on"

umount debug

This gives the desired output:

   mkdir-26064 [007] .... 2014370.909743: sys_mkdir(pathname: 7fffbd461c77, mode: 1ff)
   mkdir-26065 [014] .... 2014370.911615: sys_mkdir(pathname: 7ffea53bac77, mode: 1ff)

Alternative not best solution

This also works, but definitely less nice, replace:

echo SyS_mkdir > "${d}/set_ftrace_filter"

with:

echo '*' > "${d}/set_ftrace_notrace"

This turns off all functions as mentioned in the docs: https://www.kernel.org/doc/html/v4.18/trace/ftrace.html#the-file-system

set_ftrace_notrace:

This has an effect opposite to that of set_ftrace_filter. Any function that is added here will not be traced. If a function exists in both set_ftrace_filter and set_ftrace_notrace, the function will not be traced.