In a script you must include a #! on the first line, followed by the path to the program that will execute the script (e.g. sh, perl).
As far as I know, the # character denotes the start of a comment, and that line is supposed to be ignored by the program executing the script. It would seem that this first line is at some point read by something in order for the script to be executed by the proper program.
Could somebody please shed more light on the workings of the #!?
I'm really curious about this, so the more in-depth the answer the better.
Recommended reading:
- The UNIX FAQ: Why do some scripts start with #! ... ?
- The #! magic, details about the shebang/hash-bang mechanism on various Unix flavours
- Wikipedia: Shebang
The Unix kernel's program loader is responsible for doing this. When exec() is called, it asks the kernel to load the program from the file named by its argument. The kernel then checks the first 16 bits (two bytes) of the file to see what executable format it has. If it finds that these bytes are #!, it uses the rest of the first line of the file to find which program it should launch, and it passes the name of the file it was asked to launch (the script) as an argument to the interpreter program, followed by any arguments that were given to the script.
The interpreter then runs as normal, and treats the #! line as a comment.
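A quick way to see the argument passing described above is to name a program that simply prints its arguments as the interpreter. This is only a sketch, assuming /bin/echo exists and that the temporary directory allows executing files; the file name show-args is made up for the demo.

```shell
# Hypothetical demo: /bin/echo acts as the "interpreter", so it prints
# whatever arguments the kernel hands to it.
demo_dir=$(mktemp -d)
printf '#!/bin/echo interpreter-arg\nthis line is never executed\n' > "$demo_dir/show-args"
chmod +x "$demo_dir/show-args"

# Effectively runs: /bin/echo interpreter-arg <path to show-args> one two
shebang_output=$("$demo_dir/show-args" one two)
printf '%s\n' "$shebang_output"

rm -r "$demo_dir"
```

The printed line should start with the argument taken from the #! line, followed by the path of the script and then the arguments typed on the command line.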
Short story: The shebang (#!) line is read by the operating system's program loader, not by the shell (e.g. sh, bash, etc.). While it formally looks like a comment, the fact that it's the very first two bytes of a file marks the whole file as a text file and as a script. The script will be passed to the executable mentioned on the first line after the shebang. Voilà!
Slightly longer story: Imagine you have your script, foo.sh, with the executable bit (x) set. This file contains e.g. the following:
#!/bin/sh
# some script commands follow...:
# *snip*
Now, on your shell, you type:
> ./foo.sh
Edit: Please also read the comments below after or before you read the following! As it turns out, I was mistaken. It's apparently not the shell that passes the script to the target interpreter, but the operating system (kernel) itself.
Remember that you type this inside the shell process (let's assume this is the program /bin/sh). Therefore, that input will have to be processed by that program. It interprets this line as a command, since it discovers that the very first thing entered on the line is the name of a file that actually exists and which has the executable bit(s) set.
/bin/sh then starts reading the file's contents and discovers the shebang (#!) right at the very beginning of the file. To the shell, this is a token ("magic number") by which it knows that the file contains a script.
Now, how does it know which programming language the script is written in? After all, you can execute Bash scripts, Perl scripts, Python scripts, ... All the shell knows so far is that it is looking at a script file (which is not a binary file, but a text file). Thus it reads the next input up to the first line break (which will result in /bin/sh, compare with the above). This is the interpreter to which the script will be passed for execution. (In this particular case, the target interpreter is the shell itself, so it doesn't have to invoke a new shell for the script; it simply processes the rest of the script file itself.)
If the script was destined for e.g. /bin/perl, all that the Perl interpreter would (optionally) have to do is check whether the shebang line really mentions the Perl interpreter. If not, the Perl interpreter would know that it cannot execute this script. If indeed the Perl interpreter is mentioned in the shebang line, it reads the rest of the script file and executes it.
The Linux kernel exec system call uses the initial bytes #! to identify file type
When you run the following in Bash:
./something
on Linux, this calls the exec system call with the path ./something.
This line gets executed in the kernel on the file passed to exec: https://github.com/torvalds/linux/blob/v4.8/fs/binfmt_script.c#L25
if ((bprm->buf[0] != '#') || (bprm->buf[1] != '!'))
It reads the very first bytes of the file, and compares them to #!.
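If the file does start with #!, then the rest of the line is parsed by the Linux kernel, which makes another exec call. For a script whose first line is #!/usr/bin/env python, it executes /usr/bin/env with python as the first argument and the path of the current file as the next one: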
/usr/bin/env python /path/to/script.py
and this works for any scripting language that uses # as a comment character.
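The parsing step can be imitated in user space. The function below is a rough sketch, not the kernel's actual logic: shebang_command is a made-up helper that reads the first line of a file, checks for the #! bytes, and prints the command line that would result. It ignores details such as the kernel's limit on the length of that line.

```shell
# shebang_command SCRIPT [ARGS...]
# Hypothetical helper: print the command line that the #! line of SCRIPT
# would produce. The body runs in a subshell so no variables leak out.
shebang_command() (
    script=$1
    shift
    # Read only the first line (also accept a file without a final newline).
    IFS= read -r first_line < "$script" || [ -n "$first_line" ] || exit 1
    case $first_line in
        '#!'*) ;;        # magic bytes found: treat the file as a script
        *) exit 1 ;;     # anything else is not handled here
    esac
    interpreter_line=${first_line#??}
    # Drop any whitespace between "#!" and the interpreter path.
    interpreter_line=${interpreter_line#"${interpreter_line%%[![:space:]]*}"}
    # Interpreter (plus its optional argument), then the script, then its args.
    set -- "$interpreter_line" "$script" "$@"
    printf '%s\n' "$*"
)
```

For a file /path/to/script.py that starts with #!/usr/bin/env python, calling shebang_command /path/to/script.py prints the same command line as shown above.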
And yes, you can make an infinite loop with:
printf '#!/a\n' | sudo tee /a
sudo chmod +x /a
/a
Bash recognizes the error:
-bash: /a: /a: bad interpreter: Too many levels of symbolic links
#! is human readable, but that is not necessary.
Had the file started with different bytes, then the exec system call would use a different handler. The other most important built-in handler is for ELF executable files: https://github.com/torvalds/linux/blob/v4.8/fs/binfmt_elf.c#L1305 which checks for the bytes 7f 45 4c 46 (which also happen to be human readable as .ELF). That handler reads the ELF file, puts it into memory correctly, and starts a new process with it. See also: How does kernel get an executable binary file running under linux?
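To look at these magic bytes yourself, something like the following can help. It is only an illustration of the idea of dispatching on the first bytes: the function names are made up, and the kernel's real handler selection is more involved than a three-way case statement.

```shell
# file_magic_hex FILE: print the first four bytes of FILE as lowercase hex,
# e.g. "7f454c46" for an ELF file.
file_magic_hex() {
    od -An -tx1 -N4 "$1" | tr -d ' \n'
}

# classify_executable FILE: hypothetical helper that names the format
# suggested by the first bytes of FILE.
classify_executable() {
    case "$(file_magic_hex "$1")" in
        2321*)    echo script ;;   # 0x23 0x21 is "#!"
        7f454c46) echo elf ;;      # 0x7f followed by "ELF"
        *)        echo other ;;
    esac
}
```

On a typical Linux system, classify_executable /bin/ls would be expected to report elf, while a shell script that starts with #! reports script.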
Finally, you can add your own shebang handlers with the binfmt_misc mechanism. For example, you can add a custom handler for .jar files. This mechanism even supports handlers by file extension. Another application is to transparently run executables of a different architecture with QEMU.
I don't think POSIX specifies shebangs however: https://unix.stackexchange.com/a/346214/32558