It seems to me that Linux has it easy with /proc/self/exe. But I'd like to know if there is a convenient way to find the current application's directory in C/C++ with cross-platform interfaces. I've seen some projects mucking around with argv[0], but it doesn't seem entirely reliable.
If you ever had to support, say, Mac OS X, which doesn't have /proc/, what would you have done? Use #ifdefs to isolate the platform-specific code (NSBundle, for example)? Or try to deduce the executable's path from argv[0], $PATH and whatnot, risking finding bugs in edge cases?
Making this work reliably across platforms requires using #ifdef statements.
The below code finds the executable's path in Windows, Linux, MacOS, Solaris or FreeBSD (although FreeBSD is untested). It uses boost>=1.55.0 to simplify the code but it's easy enough to remove if you want. Just use defines like _MSC_VER and __linux as the OS and compiler require.
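As a rough illustration of the same #ifdef approach, here is a minimal plain-C sketch without Boost. It is not the Boost-based code the answer describes. get_executable_path is a hypothetical name, and each branch assumes the platform's documented call: GetModuleFileNameA() on Windows, _NSGetExecutablePath() on macOS, sysctl() with KERN_PROC_PATHNAME on FreeBSD, getexecname() on Solaris, and readlink() on /proc/self/exe elsewhere. Treat the non-Linux branches as untested.

```c
/* Sketch: full path of the running executable, one #ifdef branch per platform.
   get_executable_path() is a hypothetical helper name. */
#if defined(__linux__)
#define _POSIX_C_SOURCE 200809L   /* expose readlink() in strict C modes */
#endif
#include <stddef.h>
#include <string.h>

#if defined(_WIN32)
#include <windows.h>
#elif defined(__APPLE__)
#include <mach-o/dyld.h>
#include <stdint.h>
#elif defined(__FreeBSD__)
#include <sys/types.h>
#include <sys/sysctl.h>
#elif defined(__sun)
#include <stdlib.h>
#else /* Linux and other systems that provide /proc/self/exe */
#include <unistd.h>
#endif

/* Writes a NUL-terminated path into buf; returns 0 on success, -1 on failure. */
int get_executable_path(char *buf, size_t size)
{
    if (buf == NULL || size == 0)
        return -1;
#if defined(_WIN32)
    DWORD n = GetModuleFileNameA(NULL, buf, (DWORD)size);
    return (n == 0 || n >= size) ? -1 : 0;        /* n == size: truncated */
#elif defined(__APPLE__)
    uint32_t len = (uint32_t)size;
    return _NSGetExecutablePath(buf, &len) == 0 ? 0 : -1;  /* -1: buffer too small */
#elif defined(__FreeBSD__)
    int mib[4] = { CTL_KERN, KERN_PROC, KERN_PROC_PATHNAME, -1 };
    size_t len = size;
    return sysctl(mib, 4, buf, &len, NULL, 0) == 0 ? 0 : -1;
#elif defined(__sun)
    const char *p = getexecname();                /* may be relative */
    if (p == NULL || strlen(p) >= size)
        return -1;
    strcpy(buf, p);
    return 0;
#else
    ssize_t n = readlink("/proc/self/exe", buf, size - 1);
    if (n < 0 || (size_t)n >= size - 1)           /* error, or possibly truncated */
        return -1;
    buf[n] = '\0';                                /* readlink() does not terminate */
    return 0;
#endif
}
```

On macOS the returned path may still contain symlinks or "..", and Solaris' getexecname() may return a path relative to the startup directory, so a realpath() pass afterwards may be worthwhile.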
The above version returns full paths including the executable name. If instead you want the path without the executable name,
#include <boost/filesystem.hpp>
and change the return statement to:

AFAIK, there is no such way. There is also an ambiguity: what would you like to get as the answer if the same executable has multiple hard links "pointing" to it? (Hard links don't actually "point"; they are the same file, just at another place in the FS hierarchy.) Once execve() successfully executes a new binary, all information about its arguments is lost.
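To make the hard-link point concrete: hard links to one file share the same device and inode numbers, so stat() cannot tell you which name was used to reach the executable. A minimal POSIX sketch (same_file is a hypothetical helper name):

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/stat.h>

/* Returns 1 if both paths name the same file (same device and inode),
   0 if they differ, -1 if either cannot be stat()ed.
   stat() follows symlinks, so a symlink and its target also compare equal. */
int same_file(const char *a, const char *b)
{
    struct stat sa, sb;

    if (stat(a, &sa) != 0 || stat(b, &sb) != 0)
        return -1;
    return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}
```

The same comparison could also serve to check whether two different location methods agree on the same executable.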
Check out the whereami library from Gregory Pakosz (which has just a single C file); it allows you to get the full path to the current executable on a variety of platforms. It is currently available as a repository on GitHub.
A more portable way to get the path name of the executable image:
ps can give you the path of the executable, given the process ID. ps is also a POSIX utility, so it should be portable.
So if the process ID is 249297, then this command gives you the path name only.
Explanation of arguments:
-p selects the given process
-o comm displays the command name (-o cmd selects the whole command line)
--no-heading suppresses the heading line, so only the output is shown (this is a procps/Linux option; the POSIX way is an empty header, -o comm=)
A C program can run this via popen.
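A minimal sketch of the popen() approach, assuming a POSIX system with ps on the search path. It uses -o comm= (an empty header, which POSIX defines as suppressing the header line) instead of --no-heading, and command_name_via_ps is a hypothetical name. Be aware that with Linux procps, comm is only the executable's base name, truncated to 15 characters, rather than a full path.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Runs `ps -p PID -o comm=` and copies the first output line into out.
   Returns 0 on success, -1 on failure (ps missing, unknown pid, ...). */
int command_name_via_ps(pid_t pid, char *out, size_t size)
{
    char cmd[64];
    FILE *fp;
    int ok = 0;

    if (out == NULL || size == 0)
        return -1;
    snprintf(cmd, sizeof cmd, "ps -p %ld -o comm=", (long)pid);
    fp = popen(cmd, "r");
    if (fp == NULL)
        return -1;
    if (fgets(out, (int)size, fp) != NULL) {
        out[strcspn(out, "\n")] = '\0';   /* strip the trailing newline */
        ok = out[0] != '\0';
    }
    if (pclose(fp) != 0)                  /* nonzero wait status: ps failed */
        ok = 0;
    return ok ? 0 : -1;
}
```

For the current process, pass getpid(); the 64-byte command buffer is ample for any pid_t printed as a long.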
An alternative on Linux to using either /proc/self/exe or argv[0] is using the information the kernel passes in the ELF auxiliary vector, made available by glibc as such:

Note that getauxval is a glibc extension, and to be robust you should check that it doesn't return NULL (indicating that the AT_EXECFN entry is absent), but I don't think this is ever actually a problem on Linux.

The use of /proc/self/exe is non-portable and unreliable. On my Ubuntu 12.04 system, you must be root to read/follow the symlink. This will make the Boost example, and probably the whereami() solutions posted, fail.

This post is very long, but it discusses the actual issues and presents code which actually works, along with validation against a test suite.
The best way to find your program is to retrace the same steps the system uses. This is done by using argv[0] resolved against the file system root, pwd, and the PATH environment variable, while considering symlinks and pathname canonicalization. This is from memory, but I have done this in the past successfully and tested it in a variety of different situations. It is not guaranteed to work, but if it doesn't, you probably have much bigger problems, and it is more reliable overall than any of the other methods discussed. There are situations on a Unix-compatible system in which proper handling of argv[0] will not get you to your program, but then you are executing in a certifiably broken environment. It is also fairly portable to all Unix-derived systems since around 1970, and even some non-Unix-derived systems, as it basically relies on standard libc functionality and standard command line functionality. It should work on Linux (all versions), Android, Chrome OS, Minix, original Bell Labs Unix, FreeBSD, NetBSD, OpenBSD, BSD x.x, SunOS, Solaris, SYSV, HPUX, Concentrix, SCO, Darwin, AIX, OS X, Nextstep, etc. And with a little modification, probably VMS, VM/CMS, DOS/Windows, ReactOS, OS/2, etc. If a program was launched directly from a GUI environment, it should have set argv[0] to an absolute path.

Understand that almost every shell on every Unix-compatible operating system that has ever been released basically finds programs the same way and sets up the operating environment almost the same way (with some optional extras). And any other program that launches a program is expected to create the same environment (argv, environment strings, etc.) for that program as if it were run from a shell, with some optional extras. A program or user can set up an environment that deviates from this convention for other subordinate programs that it launches, but if it does, this is a bug, and the program has no reasonable expectation that the subordinate program or its subordinates will function correctly.
Possible values of argv[0] include:
/path/to/executable (absolute path)
../bin/executable (relative to pwd)
bin/executable (relative to pwd)
./foo (relative to pwd)
executable (basename, find in PATH)
bin//executable (relative to pwd, non-canonical)
src/../bin/executable (relative to pwd, non-canonical, backtracking)
bin/./echoargc (relative to pwd, non-canonical)

Values you should not see:
~/bin/executable (rewritten before your program runs)
~user/bin/executable (rewritten before your program runs)
alias (rewritten before your program runs)
$shellvariable (rewritten before your program runs)
*foo* (wildcard, rewritten before your program runs, not very useful)
?foo? (wildcard, rewritten before your program runs, not very useful)

In addition, these may contain non-canonical path names and multiple layers of symbolic links. In some cases, there may be multiple hard links to the same program. For example, /bin/ls, /bin/ps, /bin/chmod, /bin/rm, etc. may be hard links to /bin/busybox.

To find yourself, follow the steps below:
Save pwd, PATH, and argv[0] on entry to your program (or initialization of your library) as they may change later.
Optional: particularly for non-Unix systems, separate out but don't discard the pathname host/user/drive prefix part, if present; the part which often precedes a colon or follows an initial "//".
If argv[0] is an absolute path, use that as a starting point. An absolute path probably starts with "/", but on some non-Unix systems it might start with "\" or a drive letter or name prefix followed by a colon.
Else if argv[0] is a relative path (contains "/" or "\" but doesn't start with it, such as "../../bin/foo"), then combine pwd+"/"+argv[0] (use the present working directory from when the program started, not the current one).
Else if argv[0] is a plain basename (no slashes), then combine it with each entry in the PATH environment variable in turn, try those, and use the first one which succeeds.
Optional: Else try the very platform-specific /proc/self/exe, /proc/curproc/file (BSD), (char *)getauxval(AT_EXECFN), and dlgetname(...) if present. You might even try these before the argv[0]-based methods, if they are available and you don't encounter permission issues. In the somewhat unlikely event (when you consider all versions of all systems) that they are present and don't fail, they might be more authoritative.
Optional: check for a path name passed in using a command line parameter.
Optional: check for a pathname in the environment explicitly passed in by your wrapper script, if any.
Optional: as a last resort, try the environment variable "_". It might point to a different program entirely, such as the user's shell.
Resolve symlinks; there may be multiple layers. There is the possibility of infinite loops, though if they exist your program probably won't get invoked.
Canonicalize the filename by resolving substrings like "/foo/../bar/" to "/bar/". Note that this may change the meaning if you cross a network mount point, so canonicalization is not always a good thing. On a network server, ".." in a symlink may be used to traverse a path to another file in the server context instead of on the client. In this case, you probably want the client context, so canonicalization is OK. Also convert patterns like "/./" to "/" and "//" to "/". In shell, readlink --canonicalize will resolve multiple symlinks and canonicalize the name. Chase may do something similar but isn't installed. realpath() or canonicalize_file_name(), if present, may help.
If realpath() doesn't exist at compile time, you might borrow a copy from a permissively licensed library distribution and compile it in yourself rather than reinventing the wheel. Fix the potential buffer overflow (pass in the size of the output buffer; think strncpy() vs strcpy()) if you will be using a buffer smaller than PATH_MAX. It may be easier just to use a renamed private copy rather than testing whether it exists. Permissive license copy from android/darwin/bsd: https://android.googlesource.com/platform/bionic/+/f077784/libc/upstream-freebsd/lib/libc/stdlib/realpath.c
Be aware that multiple attempts may be successful or partially successful, and they might not all point to the same executable, so consider verifying your executable; however, you may not have read permission, and if you can't read it, don't treat that as a failure. Or verify something in proximity to your executable, such as the "../lib/" directory you are trying to find. You may have multiple versions (packaged and locally compiled versions, local and network versions, local and USB-drive portable versions, etc.), and there is a small possibility that you might get two incompatible results from different methods of locating. And "_" may simply point to the wrong program.
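The argv[0] steps above can be sketched in C roughly as follows. This is a minimal POSIX-only illustration under stated assumptions, not the author's tested program: it handles only "/" separators and a ":"-separated PATH, uses access(X_OK) as the existence check, and relies on realpath() for symlink resolution and canonicalization. The names save_startup_state, find_self_from_argv0 and find_self_from_auxv are hypothetical, and the getauxval(AT_EXECFN) shortcut applies to Linux only.

```c
#define _XOPEN_SOURCE 700          /* realpath(), getcwd(), access() */
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#if defined(__linux__)
#include <sys/auxv.h>              /* getauxval(), AT_EXECFN */
#endif

#ifndef PATH_MAX
#define PATH_MAX 4096
#endif

/* Values captured on entry, before anything can chdir() or change PATH. */
static char saved_cwd[PATH_MAX];
static char saved_path[8192];
static char saved_argv0[PATH_MAX];

void save_startup_state(const char *argv0)
{
    const char *p = getenv("PATH");
    if (getcwd(saved_cwd, sizeof saved_cwd) == NULL)
        saved_cwd[0] = '\0';
    snprintf(saved_path, sizeof saved_path, "%s", p ? p : "");
    snprintf(saved_argv0, sizeof saved_argv0, "%s", argv0 ? argv0 : "");
}

/* Accepts an executable candidate and canonicalizes it (symlinks, "..",
   "//") into out, which must hold PATH_MAX bytes. Returns 1 on success. */
static int accept_candidate(const char *candidate, char *out)
{
    return access(candidate, X_OK) == 0 && realpath(candidate, out) != NULL;
}

/* Returns 0 and fills out (PATH_MAX bytes) on success, -1 otherwise. */
int find_self_from_argv0(char *out)
{
    char candidate[PATH_MAX];
    const char *dir = saved_path;

    if (saved_argv0[0] == '\0')
        return -1;
    if (saved_argv0[0] == '/')                  /* absolute path */
        return accept_candidate(saved_argv0, out) ? 0 : -1;
    if (strchr(saved_argv0, '/') != NULL) {     /* relative to the startup cwd */
        int n = snprintf(candidate, sizeof candidate, "%s/%s", saved_cwd, saved_argv0);
        return (n > 0 && (size_t)n < sizeof candidate && accept_candidate(candidate, out)) ? 0 : -1;
    }
    for (;;) {                                  /* plain basename: search PATH in order */
        const char *end = strchr(dir, ':');
        size_t len = end ? (size_t)(end - dir) : strlen(dir);
        /* A zero-length PATH entry means the current (here: startup) directory. */
        int n = len ? snprintf(candidate, sizeof candidate, "%.*s/%s", (int)len, dir, saved_argv0)
                    : snprintf(candidate, sizeof candidate, "%s/%s", saved_cwd, saved_argv0);
        if (n > 0 && (size_t)n < sizeof candidate && accept_candidate(candidate, out))
            return 0;
        if (end == NULL)
            return -1;
        dir = end + 1;
    }
}

/* Optional Linux shortcut: the pathname given to execve(). It may be
   relative, so call this before the program changes directory. */
int find_self_from_auxv(char *out)
{
#if defined(__linux__)
    const char *fn = (const char *)getauxval(AT_EXECFN);
    if (fn != NULL && realpath(fn, out) != NULL)
        return 0;
#endif
    (void)out;
    return -1;
}
```

A program would call save_startup_state(argv[0]) first thing in main(), and could try find_self_from_auxv() before falling back to find_self_from_argv0(). Windows drive prefixes, "\" separators and ";"-separated PATH values are deliberately not handled in this sketch.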
A program using execve can deliberately set argv[0] to be incompatible with the actual path used to load the program, and can corrupt PATH, "_", pwd, etc., though there isn't generally much reason to do so. This could have security implications if you have vulnerable code that ignores the fact that your execution environment can be changed in a variety of ways including, but not limited to, this one (chroot, FUSE filesystems, hard links, etc.). It is also possible for shell commands to set PATH but fail to export it.

You don't necessarily need to code for non-Unix systems, but it would be a good idea to be aware of some of the peculiarities so you can write the code in such a way that it isn't as hard for someone to port later. Be aware that some systems (DEC VMS, DOS, URLs, etc.) might have drive names or other prefixes which end with a colon, such as "C:\", "sys$drive:[foo]bar", and "file:///foo/bar/baz". Old DEC VMS systems use "[" and "]" to enclose the directory portion of the path, though this may have changed if your program is compiled in a POSIX environment. Some systems, such as VMS, may have a file version (separated by a semicolon at the end). Some systems use two consecutive slashes, as in "//drive/path/to/file", or "user@host:/path/to/file" (scp command), or "file://hostname/path/to/file" (URL). In some cases (DOS, Windows), PATH might use different separator characters: ";" instead of ":" between entries, and "\" instead of "/" as the path separator. In csh/tcsh there is "path" (delimited with spaces) and "PATH" (delimited with colons), but your program should receive PATH, so you don't need to worry about path. DOS and some other systems can have relative paths that start with a drive prefix: C:foo.exe refers to foo.exe in the current directory on drive C, so you do need to look up the current directory on C: and use that for pwd.
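The point about execve() is easy to demonstrate: the caller chooses argv[0] independently of the file the kernel loads. A minimal POSIX sketch (run_with_fake_argv0 is a hypothetical name, and it assumes /bin/sh exists) that starts /bin/sh under the name "not-a-shell":

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Starts /bin/sh with argv[0] set to an arbitrary string. The kernel loads
   /bin/sh, but the child only sees "not-a-shell" as its own name.
   Returns the child's exit status, or -1 on error. */
int run_with_fake_argv0(void)
{
    char *const args[] = { "not-a-shell", "-c", "echo \"my argv[0] is $0\"", NULL };
    pid_t pid;
    int status;

    fflush(stdout);               /* avoid duplicating buffered output in the child */
    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execv("/bin/sh", args);   /* path to load and argv[0] are independent */
        _exit(127);               /* only reached if execv() failed */
    }
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

With a typical /bin/sh, the child should print "my argv[0] is not-a-shell" even though the file actually executed is /bin/sh, so any argv[0]-based search in that child would be chasing a name unrelated to the loaded file.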
An example of symlinks and wrappers on my system:
Note that user bill posted a link above to a program at HP that handles the three basic cases of argv[0]. It needs some changes, though: replace strcat() and strcpy() with strncat() and strncpy(). Even though the variables are declared with length PATHMAX, an input value of length PATHMAX-1 plus the length of the concatenated strings is > PATHMAX, and an input value of length PATHMAX would be unterminated. (Keep in mind that strncpy() does not NUL-terminate the destination when the source is too long, so the result must be terminated explicitly; snprintf() avoids that trap.)

So, if you combine both the HP code and the realpath code and fix both to be resistant to buffer overflows, then you should have something which can properly interpret argv[0].

The following illustrates actual values of argv[0] for various ways of invoking the same program on Ubuntu 12.04. And yes, the program was accidentally named echoargc instead of echoargv. This was done using a script for clean copying, but doing it manually in a shell gets the same results (except that aliases don't work in a script unless you explicitly enable them). These examples illustrate that the techniques described in this post should work in a wide range of circumstances, and why some of the steps are necessary.
EDIT: Now, the program that prints argv[0] has been updated to actually find itself.
And here is the output which demonstrates that in every one of the previous tests it actually did find itself.
The two GUI launches described above also correctly find the program.
There is one potential pitfall. The access() function checks permissions using the real user ID rather than the effective user ID, so in a setuid program it tests with the invoking user's permissions, not the elevated ones. If the program can be found as the elevated user but not as the regular user, these tests could fail, although it is unlikely the program could actually be executed under those circumstances. One could use euidaccess() instead. It is possible, however, that it might then find a program earlier on the path that the actual user could not access.
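euidaccess() is a GNU extension; POSIX.1-2008 offers faccessat() with the AT_EACCESS flag, which checks with the effective IDs in the same way. A minimal sketch of a candidate check that can do either (is_executable_file is a hypothetical helper; the stat() regular-file test is an extra assumption, since directories can also carry the execute bit):

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>      /* AT_FDCWD, AT_EACCESS */
#include <sys/stat.h>
#include <unistd.h>     /* faccessat(), X_OK */

/* Returns 1 if path names a regular file that is executable, 0 otherwise.
   use_effective_ids == 0 behaves like access() (real UID/GID);
   use_effective_ids != 0 checks with the effective IDs, like euidaccess(). */
int is_executable_file(const char *path, int use_effective_ids)
{
    struct stat st;

    if (stat(path, &st) != 0 || !S_ISREG(st.st_mode))
        return 0;
    return faccessat(AT_FDCWD, path, X_OK, use_effective_ids ? AT_EACCESS : 0) == 0;
}
```

In a program that is not setuid or setgid, both modes give the same answer; they only diverge when the real and effective IDs differ.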