I'm looking for a way of processing a shell script to determine:
- which commands, scripts or functions are called in the script.
- which files are accessed by the script (for reading or writing).
It doesn't need to recurse down through the dependencies, just list what it runs directly. I could probably write something that does this myself but it must have been done before ... I'm just not finding it.
You can use 'strace' to run a script and see everything the script and its subprocesses do, including looking for and opening files. For example:
$ cat foo.sh
#!/usr/bin/env bash
touch /tmp/foon
$ chmod +x foo.sh
$ strace -f -e execve,access,open,stat -o foo.trace ./foo.sh
$ cat foo.trace
32176 execve("./foo.sh", ["./foo.sh"], [/* 42 vars */]) = 0
32176 access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
32176 access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
32176 open("/usr/local/lib/tls/x86_64/libc.so.6", O_RDONLY) = -1 ENOENT (No such file or directory)
...
32176 execve("/bin/bash", ["bash", "./foo.sh"], [/* 42 vars */]) = 0
...
32177 execve("/usr/bin/touch", ["touch", "/tmp/foon"], [/* 41 vars */]) = 0
32177 open("/tmp/foon", O_WRONLY|O_CREAT|O_NOCTTY|O_NONBLOCK, 0666) = 3
...
32176 --- SIGCHLD (Child exited) @ 0 (0) ---
$
I've trimmed a lot of the other activity going on there (opening system libraries; looking up locale data; and much more). Check out 'man strace' for details on what the options mean; -f, -o, and -e are the ones I use most often.
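To boil a trace like this down to the two lists the question asks for, something along these lines can work. It is only a rough sketch: `list_commands` and `list_files` are names made up for illustration, it assumes the `-f -o` log format shown above (PID prefix, one syscall per line), it only considers successful `execve` and `open`/`openat` calls, and it guesses read versus write from the open flags.

```shell
#!/bin/sh
# Sketch: summarise an strace log written with -f -o (format as shown above).
# list_commands and list_files are illustrative names, not part of strace.

# Print each program that was successfully exec'd, one per line.
list_commands() {
    sed -n 's/^[0-9]* *execve("\([^"]*\)".*) = 0$/\1/p' "$1" | sort -u
}

# Print "r <path>" or "w <path>" for each successful open()/openat().
# A call is counted as a write if its flags include O_WRONLY or O_RDWR.
list_files() {
    grep -E '^[0-9]+ +(open|openat)\(' "$1" |
    grep -E '\) = [0-9]+$' |
    awk '{
        mode = ($0 ~ /O_WRONLY|O_RDWR/) ? "w" : "r"
        # The path is the first double-quoted string on the line.
        if (match($0, /"[^"]*"/))
            print mode, substr($0, RSTART + 1, RLENGTH - 2)
    }' | sort -u
}
```

After sourcing these, `list_commands foo.trace` and `list_files foo.trace` give the two summaries. On newer systems the C library typically issues `openat` rather than `open`, so you may need to add `openat` to the `-e` list for the file summary to be complete.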
It will be a feature of the Loker project, which I am currently developing. For now, the parser is almost complete, and you could implement a reasonable approximation of what you want on top of it. In general, however, this task is very complex, because the name of a command may be the result of variable expansion, field splitting, etc.
If you describe what you need this for and what kind of scripts you are going to parse, I will be able to say how much of your needs Loker can satisfy right now.
As an alternative, some versions of bash have a --rpm-requires option, which does something similar.
You cannot do this reliably in such a dynamic language: a static analysis tool will miss a number of dependencies. Consider the following code:
#!/bin/sh
func_foo() { echo 'foo running' ; }
func_bar() { echo 'bar running' ; }
# etc...
printf 'foo, bar,...? ' # for testing
read RPC
# rpc_is_in_rpclist "$RPC" || die "invalid call"
printf 'Calling func_%s\n' "$RPC"
func_"$RPC"
This is not a contrived example; I recently added a more elaborate version of this, with parameters, to a production environment.
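The commented-out guard in that script refers to two helpers that are not shown. One hypothetical way to fill them in is sketched below; the names `rpc_is_in_rpclist` and `die` come from the comment, but their bodies, the `RPC_LIST` variable and the `dispatch` wrapper are assumptions for illustration only.

```shell
#!/bin/sh
# Hypothetical helpers for the example above. The bodies are a guess at
# what such a guard might look like, not the production code.

RPC_LIST="foo bar"   # assumed allowlist of callable suffixes

func_foo() { echo 'foo running' ; }
func_bar() { echo 'bar running' ; }

# Succeed only if $1 is exactly one of the words in RPC_LIST.
rpc_is_in_rpclist() {
    for name in $RPC_LIST; do   # unquoted on purpose: split into words
        [ "$name" = "$1" ] && return 0
    done
    return 1
}

# Print a message to stderr and abort.
die() {
    echo "$*" >&2
    exit 1
}

# Validate the name, then call func_<name>.
dispatch() {
    rpc_is_in_rpclist "$1" || die "invalid call"
    func_"$1"
}
```

Even with the guard in place, a scanner reading the source still only sees `func_"$1"`. To know which functions are reachable it would have to work out the contents of `RPC_LIST`, which in a real script could itself be computed at run time.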
If you really need static analysis, then you should not be using a dynamic language in the first place; the two are simply incompatible. The same holds for any functional language, where functions are passed as arguments: static analysis cannot practically predict the values of those arguments.
Even in a very static and simple language like Java, you can still dodge static analysis if you try hard and use reflection. However, reflection is cumbersome by design and not in wide use, so static analysis of Java is very useful in practice: see Eclipse, etc.