While I was working on this question, I came across a possible idea that uses ptrace, but I'm unable to get a proper understanding of how ptrace interacts with threads.
Suppose I have a given, multithreaded main process, and I want to attach to a specific thread in it (perhaps from a forked child).
Can I attach to a specific thread? (The manuals diverge on this question.)
If so, does that mean that single-stepping only steps through that one thread's instructions? Does it stop all the process's threads?
If so, do all the other threads remain stopped while I call PTRACE_SYSCALL or PTRACE_SINGLESTEP, or do all threads continue? Is there a way to step forward only in one single thread but guarantee that the other threads remain stopped?
Basically, I want to synchronise the original program by forcing all threads to stop, and then only execute a small set of single-threaded instructions by single-stepping the one traced thread.
My personal attempts so far look a bit like this:
pid_t target = syscall(SYS_gettid); // get the calling thread's ID
pid_t pid = fork();
if (pid > 0)
{
    waitpid(pid, NULL, 0); // synchronise main process
    important_instruction();
}
else if (pid == 0)
{
    ptrace(PTRACE_ATTACH, target, NULL, NULL); // does this work?
    // cancel parent's "waitpid" call, e.g. with a signal
    // single-step to execute "important_instruction()" above
    ptrace(PTRACE_DETACH, target, NULL, NULL); // parent's threads resume?
    _Exit(0);
}
However, I'm not sure, and can't find suitable references, that this is concurrently correct and that important_instruction() is guaranteed to be executed only when all other threads are stopped. I also understand that there may be race conditions when the parent receives signals from elsewhere, and I heard that I should use PTRACE_SEIZE instead, but that doesn't seem to exist everywhere.
Any clarification or references would be greatly appreciated!
Yes, at least on current kernels.
Yes. It does not stop the other threads, only the attached one.
Yes. Send SIGSTOP to the process (use waitpid(PID,,WUNTRACED) to wait for the process to be stopped), then PTRACE_ATTACH to every thread in the process. Send SIGCONT (using waitpid(PID,,WCONTINUED) to wait for the process to continue).
Since all threads were stopped when you attached, and attaching stops the thread, all threads stay stopped after the SIGCONT signal is delivered. You can single-step the threads in any order you prefer.
I found this interesting enough to whip up a test case. (Okay, actually I suspect nobody will take my word for it anyway, so I decided it's better to show proof you can duplicate on your own instead.)
My system seems to follow the man 2 ptrace as described in the Linux man-pages project, and Kerrisk seems to be pretty good at maintaining them in sync with kernel behaviour. In general, I much prefer kernel.org sources wrt. the Linux kernel to other sources.
Summary:
Attaching to the process itself (TID==PID) stops only the original thread, not all threads.
Attaching to a specific thread (using TIDs from /proc/PID/task/) does stop that thread. (In other words, the thread with TID == PID is not special.)
Sending a SIGSTOP to the process will stop all threads, but ptrace() still works absolutely fine.
If you sent a SIGSTOP to the process, do not call ptrace(PTRACE_CONT, TID) before detaching. PTRACE_CONT seems to interfere with the SIGCONT signal.
You can first send a SIGSTOP, then PTRACE_ATTACH, then send SIGCONT, without any issues; the thread will stay stopped (due to the ptrace). In other words, PTRACE_ATTACH and PTRACE_DETACH mix well with SIGSTOP and SIGCONT, without any side effects I could see.
SIGSTOP and SIGCONT affect the entire process, even if you try using tgkill() (or pthread_kill()) to send the signal to a specific thread.
To stop and continue a specific thread, PTRACE_ATTACH it; to stop and continue all threads of a process, send SIGSTOP and SIGCONT signals to the process, respectively.
Personally, I believe this validates the approach I suggested in that other question.
Here is the ugly test code you can compile and run to test it for yourself, traces.c:
Compile and run using e.g.
The output is a dump of the child process counters (each one incremented in a separate thread, including the original thread which uses the final counter). Compare the counters across the short wait. For example:
As you can see above, only the initial thread (whose TID == PID), which uses the final counter, is stopped. The same happens for the other three threads, too, which use the first three counters in order:
The next two cases examine the SIGCONT/SIGSTOP wrt. the entire process:
As you can see, sending SIGSTOP will stop all threads, but does not interfere with ptrace(). Similarly, after SIGCONT, the threads continue running as normal.
The final two cases examine the effects of using tgkill() to send the SIGSTOP/SIGCONT to a specific thread (the one that corresponds to the first counter), while attaching to another thread:
Unfortunately, but as expected, the disposition (stopped/running) is process-wide, not thread-specific, as you can see above. This means that to stop specific threads and let the other threads run normally, you need to separately PTRACE_ATTACH to the threads you wish to stop.
To prove all my statements above, you may have to add test cases; I ended up having quite a few copies of the code, all slightly edited, to test it all, and I'm not sure I picked the most complete set. I'd be happy to expand the test program, if you find omissions.
Questions?
I wrote a second test case. I had to add a separate answer, since it was too long to fit into the first one with example output included.
First, here is tracer.c:
tracer.c executes the specified command, waiting for the command to receive a SIGSTOP signal. (tracer.c does not send it itself; you can either have the tracee stop itself, or send the signal externally.)
When the command has stopped, tracer.c attaches a ptrace to every thread, and single-steps one of the threads a fixed number of steps (SINGLESTEPS compile-time constant), showing the pertinent register state for each thread.
After that, it detaches from the command, and sends it a SIGCONT signal to let it continue its operation normally.
Here is a simple test program, worker.c, I used for testing:
Compile both using e.g.
and run either in a separate terminal, or in the background, using e.g.
The tracer shows the PID of the worker:
At this point, the child is running normally. The action starts when you send a SIGSTOP to the child. The tracer detects it, does the desired tracing, then detaches and lets the child continue normally:
You can repeat the above as many times as you wish. Note that I picked the SIGSTOP signal as the trigger, because this way tracer.c is also useful as a basis for generating complex multithreaded core dumps per request (as the multithreaded process can simply trigger it by sending itself a SIGSTOP).
The disassembly of the worker() function that the threads are all spinning in, in the above example:
Now, this test program only shows how to stop a process, attach to all of its threads, single-step one of the threads a desired number of instructions, and then let all the threads continue normally; it does not yet prove that the same applies for letting specific threads continue normally (via PTRACE_CONT). However, the detail I describe below indicates, to me, that the same approach should work fine for PTRACE_CONT.
The main problem or surprise I encountered while writing the above test programs was the necessity of the
loop, especially for the ESRCH case (the others I only added due to the ptrace man page description).
You see, most ptrace commands are only allowed when the task is stopped. However, the task is not stopped when it is still completing e.g. a single-step command. Thus, using the above loop -- perhaps adding a millisecond nanosleep or similar to avoid wasting CPU -- makes sure the previous ptrace command has completed (and thus the task stopped) before we try to supply the new one.
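A retry loop of the kind described above could look something like the sketch below. It is not the loop from tracer.c: the function name ptrace_retry and the max_tries bound are assumptions added for illustration (the bound is there so that a tracee that has really exited cannot make the tracer spin forever), and only the ESRCH case is retried.

```c
#include <errno.h>
#include <sched.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>

/* Hypothetical helper: issue a ptrace request, retrying while the kernel
 * reports ESRCH, which in this situation may just mean that the tracee
 * has not reached its stopped state yet.
 * `max_tries` is an assumption of this sketch, not part of the answer.
 * Returns the ptrace() result; on -1, errno holds the last error. */
static long ptrace_retry(int request, pid_t tid, void *addr, void *data,
                         int max_tries)
{
    long result;

    do {
        errno = 0;
        result = ptrace(request, tid, addr, data);
        if (result != -1L || errno != ESRCH)
            break;              /* success, or an error we do not retry */
        sched_yield();          /* give the tracee a chance to stop */
    } while (--max_tries > 0);

    return result;
}
```

A call such as ptrace_retry(PTRACE_SINGLESTEP, tid, NULL, NULL, 1000) would then replace a bare ptrace() call; calling waitpid() on the tracee between requests is the more conventional way to know that it has stopped.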
Kerrek SB, I do believe at least some of the troubles you've had with your test programs are due to this issue? To me, personally, it was a kind of a D'oh! moment to realize that of course this is necessary, as ptracing is inherently asynchronous, not synchronous.
(This asynchronicity is also the cause for the SIGCONT-PTRACE_CONT interaction I mentioned above. I do believe with proper handling using the loop shown above, that interaction is no longer a problem -- and is actually quite understandable.)
Adding to the comments to this answer:
The Linux kernel uses a set of task state flags in the task_struct structure (see include/linux/sched.h for definition) to keep track of the state of each task. The userspace-facing side of ptrace() is defined in kernel/ptrace.c.
When PTRACE_SINGLESTEP or PTRACE_CONT is called, kernel/ptrace.c:ptrace_continue() handles most of the details. It finishes by calling wake_up_state(child, __TASK_TRACED) (kernel/sched/core.c::try_to_wake_up(child, __TASK_TRACED, 0)).
When a process is stopped via SIGSTOP signal, all tasks will be stopped, and end up in the "stopped, not traced" state.
Attaching to every task (via PTRACE_ATTACH or PTRACE_SEIZE, see kernel/ptrace.c:ptrace_attach()) modifies the task state. However, ptrace state bits (see include/linux/ptrace.h:PT_ constants) are separate from the task runnable state bits (see include/linux/sched.h:TASK_ constants).
After attaching to the tasks, and sending the process a SIGCONT signal, the stopped state is not immediately modified (I believe), since the task is also being traced. Doing PTRACE_SINGLESTEP or PTRACE_CONT ends up in kernel/sched/core.c::try_to_wake_up(child, __TASK_TRACED, 0), which updates the task state, and moves the task to the run queue.
Now, the complicated part, for which I haven't yet found the code path, is how the task state gets updated in the kernel when the task is next scheduled. My tests indicate that with single-stepping (which is yet another task state flag), only the task state gets updated, with the single-step flag cleared. It seems that PTRACE_CONT is not as reliable; I believe it is because the single-step flag "forces" that task state change. Perhaps there is a "race condition" wrt. the continue signal delivery and state change?
(Further edit: the kernel developers definitely expect wait() to be called, see for example this thread.)
In other words, after noticing that the process has stopped (note that you can use /proc/PID/stat or /proc/PID/status if the process is not a child, and not yet attached to), I believe the following procedure is the most robust one:
After the above, all tasks should be attached and in the expected state, so that e.g. PTRACE_CONT works without further tricks.
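For the remark about checking /proc/PID/stat when the process is not a child, one possible way to read the state is sketched below. The helper name proc_state is made up for this example. It relies on the documented layout of /proc/PID/stat, where the state letter follows the parenthesised command name; according to proc(5), 'T' means stopped on a signal and 't' means tracing stop.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical helper: return the state letter of `pid` as reported in
 * /proc/PID/stat (e.g. 'R' running, 'S' sleeping, 'T' stopped,
 * 't' tracing stop), or 0 if it cannot be determined. */
static char proc_state(pid_t pid)
{
    char path[64];
    char buf[512];
    char *p;
    size_t n;
    FILE *f;

    snprintf(path, sizeof path, "/proc/%ld/stat", (long)pid);
    f = fopen(path, "r");
    if (!f)
        return 0;
    n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';

    /* The command name is parenthesised and may itself contain spaces
     * or ')', so look for the LAST ')' rather than the first. */
    p = strrchr(buf, ')');
    if (!p || p[1] != ' ' || p[2] == '\0')
        return 0;
    return p[2];
}
```

A tracer that is not the parent could poll proc_state(pid) until it returns 'T' before attaching, instead of using waitpid() with WUNTRACED.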
If the behaviour changes in future kernels -- I do believe the interaction between the STOP/CONT signals and ptracing is something that might change; at least a question to the LKML developers about this behaviour would be warranted! --, the above procedure will still work robustly. (Erring on the side of caution, by using a loop to PTRACE_SINGLESTEP a few times, might also be a good idea.)
The difference to PTRACE_CONT is that if the behaviour changes in the future, the initial PTRACE_CONT might actually continue the process, causing the ptrace() calls that follow it to fail. With PTRACE_SINGLESTEP, the process will stop, allowing further ptrace() calls to succeed.
Questions?
Each thread in the process is traced individually (and each can be potentially traced by a different tracing process, or be untraced). When you call ptrace attach, you are always attaching to just a single thread. Only that thread will be stopped - the other threads will continue running as they were.
Recent versions of the ptrace() man page make this very clear:
Single-stepping affects only the thread that you direct it at. If the other threads are running they continue running, and if they are in tracing stop they stay in tracing stop. (This means that if the thread you are single-stepping tries to acquire a mutex or similar synchronisation resource that is held by another non-running thread, it will not be able to acquire that mutex).
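To illustrate stepping just one thread, here is a minimal sketch that single-steps an already attached and stopped thread a given number of times, waiting for each step to finish before issuing the next. The function name single_step is an assumption of this example; the other threads are not touched at all, so they keep whatever state (running or tracing stop) they already had.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Hypothetical helper: single-step thread `tid` up to `steps` times.
 * Assumes `tid` is already attached to by the caller and stopped.
 * Returns the number of steps completed, or -1 on error. */
static long single_step(pid_t tid, long steps)
{
    long done;
    int status;

    for (done = 0; done < steps; done++) {
        if (ptrace(PTRACE_SINGLESTEP, tid, NULL, NULL) == -1)
            return -1;
        /* __WALL: wait for this task whether or not it is a clone child. */
        if (waitpid(tid, &status, __WALL) == -1)
            return -1;
        if (!WIFSTOPPED(status))
            break;              /* the thread exited or was killed */
    }
    return done;
}
```

Waiting after every step is what keeps the next ptrace() request from being issued while the thread is still running.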
If you want to stop all the threads of the process while you single-step one thread, you will need to attach to all of the threads. There is the added complication that if the process is running while you're trying to attach to it, new threads could be created while you're enumerating them.
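Enumerating the threads can be done by reading /proc/PID/task/, for example as in the sketch below (the helper name list_tids is made up here). To deal with threads being created during enumeration, one option is to call it again after attaching and repeat until no new thread IDs appear; that strategy is a suggestion of this example, not something the man page prescribes.

```c
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical helper: store up to `max` thread IDs of process `pid`
 * in `tids`. Returns the total number of threads found (which may be
 * larger than `max`), or -1 if /proc/PID/task cannot be opened. */
static long list_tids(pid_t pid, pid_t *tids, size_t max)
{
    char path[64];
    struct dirent *ent;
    size_t n = 0;
    DIR *dir;

    snprintf(path, sizeof path, "/proc/%ld/task", (long)pid);
    dir = opendir(path);
    if (!dir)
        return -1;

    while ((ent = readdir(dir)) != NULL) {
        char *end;
        long tid = strtol(ent->d_name, &end, 10);

        if (end == ent->d_name || *end != '\0')
            continue;           /* skips "." and ".." */
        if (n < max)
            tids[n] = (pid_t)tid;
        n++;
    }
    closedir(dir);
    return (long)n;
}
```

If the return value is larger than max, the caller's array was too small and the call should be repeated with a bigger one.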
Yes. It traces the process, and all threads of this process are stopped. Imagine if it weren't so: how could you see the different threads in your IDE?
From the manual:
Example code to attach:
So yes, you are attached to a thread, and yes, it stops all the threads of the process.
Maybe see here: http://www.secretmango.com/jimb/Whitepapers/ptrace/ptrace.html