A colleague once told me that when everything else has failed, the last resort for debugging on Linux is strace.
I tried to learn the science behind this strange tool, but I am not a system admin guru and I didn’t really get results.
So,
- What is it exactly and what does it do?
- How and in which cases should it be used?
- How should the output be understood and processed?
In brief, in simple words, how does this stuff work?
strace is a good tool for learning which system calls (requests to the kernel) your program makes. It also reports the calls that failed, along with the error value associated with each failure. Not all failures are bugs. For example, code that searches for a file may get an ENOENT (No such file or directory) error, and that may be an acceptable scenario in the logic of the code.
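To illustrate the point about acceptable failures, here is a minimal Python sketch of a program that probes several candidate paths. The function name find_config and the idea of a candidate list are assumptions made for the example, not something from a real program. Under strace, each missing candidate would still appear as a failed open/openat call with ENOENT, even though the program is working as intended.

```python
import errno
import os


def find_config(candidates):
    """Return the first path in candidates that can be opened, or None."""
    for path in candidates:
        try:
            fd = os.open(path, os.O_RDONLY)
        except OSError as exc:
            if exc.errno == errno.ENOENT:
                # A missing candidate is expected here; the trace would
                # still show this call failing with ENOENT.
                continue
            # Any other error (for example EACCES) is a real problem.
            raise
        os.close(fd)
        return path
    return None
```

When reading a trace of such a program, the ENOENT lines before the first successful open are noise, not the bug.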
One good use case for strace is debugging race conditions during temporary file creation. For example, a program that creates files by appending the process ID (PID) to some predetermined string may run into problems in multi-threaded scenarios, because all threads of a process share the same PID. [Using PID+TID (process ID + thread ID), or better, a library function such as mkstemp, will fix this.]
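The difference between the two naming schemes can be sketched in Python as follows. The prefix myapp- and both function names are hypothetical; tempfile.mkstemp is Python's wrapper around the same idea as the C mkstemp function.

```python
import os
import tempfile


def pid_based_name(prefix="/tmp/myapp-"):
    # Every thread in a process sees the same PID, so two threads calling
    # this get the same name and can clobber each other's file.
    return f"{prefix}{os.getpid()}"


def safe_temp_file(prefix="myapp-"):
    # mkstemp picks a unique name and creates the file with
    # O_CREAT | O_EXCL, so an existing file is never silently reused.
    fd, path = tempfile.mkstemp(prefix=prefix)
    return fd, path
```

In a trace, the PID-based scheme shows up as several threads opening the same path, while the mkstemp scheme shows a distinct path per call.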
It is also good for debugging crashes. You may find this (my) article on strace and debugging crashes useful.
Strace stands out as a tool for investigating production systems where you can't afford to run these programs under a debugger. In particular, we have used strace in the following two situations:
For an example of an analysis using strace, see my answer to this question.
strace lists all system calls made by the process it's applied to. If you don't know what system calls are, you won't be able to get much mileage from it.
Nevertheless, if your problem involves files or paths or environment values, running strace on the problematic program and redirecting the output to a file and then grepping that file for your path/file/env string may help you see what your program is actually attempting to do, as distinct from what you expected it to.
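The workflow described above (trace to a file, then search the file) could look roughly like this in Python. Both function names are assumptions made for illustration. The sketch relies on strace's -o option, which writes the trace to a file, and -f, which follows child processes.

```python
import shutil
import subprocess


def strace_to_file(command, log_path):
    """Run command (a list of arguments) under strace, tracing to log_path."""
    if shutil.which("strace") is None:
        raise RuntimeError("strace is not installed on this machine")
    # -f follows child processes; -o sends the trace to a file
    # instead of stderr, so it does not mix with the program's output.
    return subprocess.run(["strace", "-f", "-o", log_path, *command]).returncode


def grep_trace(lines, needle):
    """Return the trace lines that mention needle (a path, file, or variable)."""
    return [line.rstrip("\n") for line in lines if needle in line]
```

A possible use, assuming a hypothetical program ./myprog that fails to find its configuration: call strace_to_file(["./myprog"], "trace.log"), then pass the opened log file to grep_trace together with the file name you expected the program to read.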
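strace -tfp PID attaches to the running process with the given PID (-p), follows any child processes it creates (-f), and prefixes each line with the time of day (-t), so we can debug or monitor the status of our process.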
Strace is a tool that tells you how your application interacts with your operating system.
It does this by telling you what OS system calls your application uses and with what parameters it calls them.
So, for instance, you can see which files your program tries to open and whether each call succeeds.
You can debug all sorts of problems with this tool. For instance, if an application says that it cannot find a library that you know you have installed, strace will tell you where the application is looking for that file.
And that is just the tip of the iceberg.
I liked some of the answers that say strace checks how you interact with your operating system. This is exactly what we can see: the system calls. If you compare strace and ltrace, the difference is more obvious: ltrace, on the other hand, traces library function calls.
Although I checked the manuals several times, I haven't found the origin of the name strace, but it is most likely "system-call trace".
There are three bigger notes to make about strace.
Note 1: Both of these tools, strace and ltrace, use the ptrace system call. So the ptrace system call is effectively how strace works.
Note 2: There are different parameters you can use with strace, since strace can be very verbose. I like to experiment with -c, which prints a summary of the calls. Based on the -c output you can select one system call, for example with -e trace=open, and see only that call. This can be interesting if you are examining which files are opened by the command you are tracing. And of course, you can use grep for the same purpose, but note that strace writes to stderr, so you need to redirect like this: 2>&1 | grep etc, to see which config files are referenced when the command is issued.
Note 3: I find this a very important note. You are not limited to a specific architecture. strace will blow your mind, since it can trace binaries of different architectures.
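The options from Note 2 can be combined as in the following Python sketch, which only builds the argument list and does not run anything. The helper name strace_argv is an assumption made for the example. The -c option and the comma-separated -e trace= list are standard strace options. On current systems, file opens often appear as openat rather than open, so it can be worth listing both.

```python
def strace_argv(command, summary=False, syscalls=None):
    """Build an strace argument list for command (a list of arguments)."""
    argv = ["strace"]
    if summary:
        # -c prints a per-syscall count/time summary instead of every call.
        argv.append("-c")
    if syscalls:
        # -e trace=a,b restricts the output to the named system calls.
        argv += ["-e", "trace=" + ",".join(syscalls)]
    return argv + list(command)
```

For example, strace_argv(["ls"], syscalls=["open", "openat"]) yields the arguments for a run that shows only file-open calls made by ls.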