A running bash script is hung somewhere. Can I fin

Posted 2020-05-23 08:56

Question:

E.g. does the bash debugger support attaching to existing processes and examining the current state?

Or can I easily find out by looking at the bash process entries in /proc? Is there a convenient tool to give line numbers in active files?

I don't want to have to kill and restart the process.

This is on Linux - Ubuntu 10.04.

Answer 1:

I recently found myself in a similar position. I had a shell script that was not identifiable through other means (such as arguments, etc.)

There are ways to find out a lot more about a running process than you would expect.

Use lsof -p $pid to see what files are open, which may give you some clues. Note that some files, while "deleted", can still be kept open by the script. As long as the script doesn't close the file, it can still read from and write to it, and the file still takes up space on the file system.
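
If lsof is not installed, much the same information can be read from /proc directly. A minimal sketch, assuming a Linux /proc file system; the function name list_open_files is made up for illustration:

```shell
#!/bin/sh
# list_open_files is a hypothetical helper, not a standard tool.
# It prints every open file descriptor of a process and the file it points to.
list_open_files() {
    pid=$1
    for fd in /proc/"$pid"/fd/*; do
        # Each entry is a symlink to the open file. The target of a file that
        # was unlinked while still open ends in " (deleted)".
        printf '%s -> %s\n' "${fd##*/}" "$(readlink "$fd")"
    done
}

# Example: inspect the current shell itself.
list_open_files "$$"
```

Reading the fd directory of a process owned by another user normally requires root.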

Use strace to trace the system calls the script makes as it runs. The shell reads the script file as it goes, so you can see some of the commands as they are read, before they are executed. Look for read calls in the output of this command:

strace -p $pid -s 1024

The -s 1024 option makes strace print strings up to 1024 characters long (by default, strace truncates strings after 32 characters).

Examine the directory /proc/$pid to see details about the script. In particular, /proc/$pid/environ gives you the process environment, with entries separated by null bytes. To read this "file" properly, use a command that splits on the nulls and prints one variable per line:

xargs -0 -n 1 < /proc/$pid/environ

You can pipe that into less or save it to a file. There is also /proc/$pid/cmdline, but it may give you only the shell name (-bash, for instance).
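
A few other entries under /proc/$pid are often worth a look when a script hangs. The sketch below gathers them in one place; describe_process is a hypothetical helper name, and it assumes the Linux /proc layout:

```shell
#!/bin/sh
# describe_process is a made-up helper for illustration.
describe_process() {
    pid=$1
    # cmdline is null-separated, like environ; join the arguments with spaces.
    printf 'cmdline: %s\n' "$(tr '\0' ' ' < /proc/"$pid"/cmdline)"
    # The State line shows whether the process is running (R), sleeping (S),
    # or in uninterruptible sleep (D).
    grep '^State:' /proc/"$pid"/status
    # cwd is a symlink to the current working directory of the process.
    printf 'cwd: %s\n' "$(readlink /proc/"$pid"/cwd)"
}

# Example: describe the current shell.
describe_process "$$"
```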



Answer 2:

There is no real solution. But in most cases a script is waiting for a child process to terminate (pidof needs -x to match a script rather than a binary):

ps --ppid $(pidof -x yourscript)
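
To see a little more about those children, ps can also print their state and the kernel function they are sleeping in. A sketch, assuming the procps version of ps found on most Linux distributions; show_children is a made-up helper name:

```shell
#!/bin/sh
# show_children is a hypothetical helper, not a standard tool.
show_children() {
    # stat  = process state (S sleeping, D uninterruptible sleep, Z zombie, ...)
    # wchan = kernel function the process is sleeping in, if known
    # The trailing "=" on each column suppresses the header line.
    ps -o pid= -o stat= -o wchan= -o args= --ppid "$1"
}

# Example: start a child of the current shell and list it.
sleep 30 &
child_pid=$!
sleep 1                  # give the child time to exec sleep
show_children "$$"
kill "$child_pid"
```

A child that stays in state D, or that has children of its own, is usually the next thing to inspect.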

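You could also set up signal handlers in your shell script to toggle the printing of commands. This has to be in place before the script is started, and bash runs a trap only once the command it is currently waiting for has finished: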

#!/bin/bash

trap 'set -x' USR1   # USR1: start printing commands as they execute
trap 'set +x' USR2   # USR2: stop printing commands

while true; do
    sleep 1
done

Then use the following to switch tracing on and off:

kill -USR1 $(pidof -x yourscript)
kill -USR2 $(pidof -x yourscript)
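
Since the question asks about line numbers, it may help to combine this with PS4, which bash expands in front of every traced command. The sketch below is a self-contained demo under assumptions: the worker script, its path, and the log file are all made up for illustration, and the timing relies on short sleeps:

```shell
#!/bin/sh
demo_dir=$(mktemp -d)

# A hypothetical long-running script with the two signal handlers installed.
cat > "$demo_dir/worker.sh" <<'EOF'
#!/bin/bash
# Prefix each traced command with the script name and line number.
PS4='+ ${BASH_SOURCE}:${LINENO}: '
trap 'set -x' USR1
trap 'set +x' USR2
while true; do
    sleep 0.2
done
EOF

# Run it in the background and collect the trace (bash writes it to stderr).
bash "$demo_dir/worker.sh" 2> "$demo_dir/trace.log" &
worker_pid=$!

sleep 1                      # give the worker time to install its traps
kill -USR1 "$worker_pid"     # switch tracing on
sleep 1                      # let a few loop iterations run
kill -USR2 "$worker_pid"     # switch tracing off again
sleep 1
kill "$worker_pid"           # end the demo

cat "$demo_dir/trace.log"
```

Each traced line in the log then names the script and the line being executed, which shows where the loop is spending its time.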


Answer 3:

Use pstree to show which Linux command or executable your script is calling. For example, 21156 is the PID of my hanging script:

ocfs2cts1:~ # pstree -pl 21156
activate_discon(21156)───mpirun(15146)─┬─fillup_contig_b(15149)───sudo(15231)───chmod(15232)
                                       ├─ssh(15148)
                                       └─{mpirun}(15147)

So I know it is hanging at the chmod command. Then show the kernel stack trace of that process (reading /proc/PID/stack generally requires root):

ocfs2cts1:~ # cat /proc/15232/stack 
[<ffffffffa05377ef>] __ocfs2_cluster_lock.isra.39+0x1bf/0x620 [ocfs2]
[<ffffffffa053856d>] ocfs2_inode_lock_full_nested+0x12d/0x840 [ocfs2]
[<ffffffffa0538dbb>] ocfs2_inode_lock_atime+0xcb/0x170 [ocfs2]
[<ffffffffa0531e61>] ocfs2_readdir+0x41/0x1b0 [ocfs2]
[<ffffffff8120d03c>] iterate_dir+0x9c/0x110
[<ffffffff8120d453>] SyS_getdents+0x83/0xf0
[<ffffffff815e126e>] entry_SYSCALL_64_fastpath+0x12/0x6d
[<ffffffffffffffff>] 0xffffffffffffffff

Oh, boy, it's likely a deadlock bug...
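
Processes stuck inside the kernel like this are often in uninterruptible sleep, shown as state D by ps, so listing them is a quick way to find candidates before reading their stacks. A sketch, assuming procps ps; list_blocked is a made-up helper name:

```shell
#!/bin/sh
# list_blocked is a hypothetical helper, not a standard tool.
list_blocked() {
    # Keep the header line plus every process whose state starts with D.
    # wchan:32 widens the column so kernel function names are not cut short.
    ps -eo pid,stat,wchan:32,args | awk 'NR == 1 || $2 ~ /^D/'
}

list_blocked
```

A process that shows up here on every run, with the same wchan, is a good candidate for the /proc/PID/stack check above.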