strace -c does not show correct seconds

2019-08-03 12:15发布

问题:

I have got results for strace -c on RHEL 7 and RHEL 6 for this command:

strace -c /bin/sleep 20

and I don't understand why the seconds column for nanosleep is equal to 0. I had expected it to be 20.

  0.00    0.000000           0         1           nanosleep

Here is a full strace report:

$ strace -c /bin/sleep 20 
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
100.00    0.000019           1        15        12 open
  0.00    0.000000           0         1           read
  0.00    0.000000           0         5           close
  0.00    0.000000           0         8         6 stat
  0.00    0.000000           0         3           fstat
  0.00    0.000000           0         9           mmap
  0.00    0.000000           0         3           mprotect
  0.00    0.000000           0         1           munmap
  0.00    0.000000           0         3           brk
  0.00    0.000000           0         1         1 access
  0.00    0.000000           0         1           nanosleep
  0.00    0.000000           0         1           execve
  0.00    0.000000           0         1           arch_prctl
------ ----------- ----------- --------- --------- ----------------
100.00    0.000019                    52        19 total

And there is a call to nanosleep in a detailed report:

nanosleep({20, 0}, NULL)                = 0

So seconds must be 20, not 0. What do you think?

回答1:

From the manual page of strace(1):

-c On Linux, this attempts to show system time (CPU time spent running in the kernel)

I think that:

when a process calls nanosleep(), it asks the kernel to be suspended for a period of time. The kernel sets up a few things (like some flag, a timer, a timestamp...), suspends the calling process, and goes to do something else. strace(1) reports the time spent by the kernel to do this, not the time the process stays suspended.

May be that this -c strace option can be thought as "-cost": how much time this syscall costs?



回答2:

In order to understand this question I run strace for strace -c /bin/sleep:

This is how it looked:

$ strace -T -o syscalls.txt -v strace -c /bin/sleep 20
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 31.54    0.000429          29        15        12 open
 13.68    0.000186          23         8           mmap
 13.60    0.000185          46         4           mprotect
 11.62    0.000158          20         8         6 stat
  6.91    0.000094          19         5           close
  5.96    0.000081          81         1           munmap
  4.63    0.000063          16         4           brk
  3.38    0.000046          46         1           arch_prctl
  3.16    0.000043          43         1           nanosleep
  2.21    0.000030          30         1           read
  1.47    0.000020          20         1         1 access
  1.32    0.000018           6         3           fstat
  0.51    0.000007           7         1           execve
------ ----------- ----------- --------- --------- ----------------
100.00    0.001360                    53        19 total

Below is some lines from sycalls.txt related to the nanosleep syscall:

ptrace(PTRACE_SYSCALL, 6498, 0, SIG_0)  = 0 <0.000028>
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0 <0.000017>
wait4(-1, [{WIFSTOPPED(s) && WSTOPSIG(s) == 133}], __WALL, {ru_utime={0, 0}, ru_stime={0, 3706}, ru_maxrss=616, ru_ixrss=0, ru_idrss=0, ru_isrss=0, ru_minflt=205, ru_majflt=0, ru_nswap=0, ru_inblock=0, ru_oublock=0, ru_msgsnd=0, ru_msgrcv=0, ru_nsignals=0, ru_nvcsw=108, ru_nivcsw=1}) = 6498 <20.000423>
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_TRAPPED, si_pid=6498, si_status=SIGTRAP, si_utime=0, si_stime=0} ---

So, nanosleep itself lasts for 20 seconds as it is shown in the end of the line: <20.000423>. However wait4 returns for it:

{ru_utime={0, 0}, ru_stime={0, 3706} 

So it takes 3 microseconds to do nanosleep according to the report. So the seconds column is likely to mean (user_time + sys_time + some-unclear-overhead) spent by OS to handle a syscall. It is not meant to be wall time for a syscall.



标签: linux strace