When I use the function getenv() from the Standard C Library, my program inherits the environment variables from its parent.
Example:
$ export FOO=42
$ <<< 'int main() {printf("%s\n", getenv("FOO"));}' gcc -w -xc - && ./a.exe
42
In libc, the environ variable is declared in environ.c. I expected it to be empty at execution, but I get 42.
Going a bit further, getenv can be simplified as follows:
char * getenv (const char *name)
{
  size_t len = strlen (name);
  char **ep;
  uint16_t name_start;
  name_start = *(const uint16_t *) name;
  len -= 2;
  name += 2;
  for (ep = __environ; *ep != NULL; ++ep)
    {
      uint16_t ep_start = *(uint16_t *) *ep;
      if (name_start == ep_start && !strncmp (*ep + 2, name, len)
          && (*ep)[len + 2] == '=')
        return &(*ep)[len + 3];
    }
  return NULL;
}
libc_hidden_def (getenv)
Here I will just get the content of the __environ variable. However, I never initialized it.
So I am confused, because environ is supposed to be NULL unless my main function is not the real entry point of my program. Perhaps gcc is tricking me by adding an _init function that is part of the standard C library.
Where is environ initialized?
There is no mystery here.
First, the shell forks. The forked process obviously has the same environment. Then a new program is executed in the child. The syscall in question is execve, which among other things accepts a pointer to an environment. So the environment that is set after exec'ing a binary depends entirely on the code that did the exec.
All of this can easily be seen by running strace.
EDIT: since the question was edited to ask about environ:
When you execute a dynamically linked binary, the very first userspace code to run comes from the loader. Among other things, the loader sets up variables like argc, argv and environ, and only then calls main() from the binary.
Once more, the sources for all this are freely available. While glibc's sources are rather hard to read due to atrocious formatting, the BSD ones are easy and conceptually equivalent enough.
http://code.metager.de/source/xref/freebsd/libexec/rtld-elf/rtld.c#389
The parent process that starts your program (your shell) defines FOO. The newly created process receives a copy of the environment from its parent.
The environment variables are passed down from the parent process as a third argument to main. The easiest way to discover this is to read the documentation for the system call execve, particularly its description of the envp argument.
The C library copies the envp argument into the environ global variable somewhere in its startup code, before it calls main: for instance, GNU libc does this in _init and musl libc does it in __init_libc. (You may find musl libc's code easier to trace through than GNU libc's.) Conversely, if you start a program using one of the exec wrapper functions that don't take an explicit environment vector, the C library supplies environ as the third argument to execve. Inheritance of environment variables is thus strictly a user-space convention. As far as the kernel is concerned, each program receives two argument vectors, and it doesn't care what's in them.
(Note that three-argument main is an extension to the C language. The C standard only specifies int main(void) and int main(int argc, char **argv), but it permits implementations to define additional forms (C11 Annex J.5.1, Environment Arguments). The three-argument main has been how environment variables work since Unix V7 if not longer, and is documented by Microsoft too; see "What should main() return in C and C++?".)
Under Linux, when a program starts, it has its arguments and environment variables stored on the stack. For C programs, the code that executes before
main looks at this, builds the argv and envp arrays of pointers, and then calls main with these values (and argc).
When a program calls execvpe to turn into a new program (often after calling fork), an envp is passed in, along with an argv. The kernel copies the data these point to onto the new program's stack.
When any of the other exec functions is called, glibc passes the current program's environ as the new program's envp to execvpe (or directly to the execve system call).
The question is really: how does the shell run commands?
The answer is by creating a new process, probably using fork() and execl(), which creates a process with the same environment as the current process.
You can, however, create a new process with a custom environment using execvpe()/execle().
But in any normal situation that wouldn't be necessary, especially since many programs expect some environment variables, such as PATH, to be defined. Normally, a child process inherits the environment variables from the environment where it is invoked.