How do Perl Cwd::cwd and Cwd::getcwd functions dif

2019-04-20 07:05发布

The question

What is the difference between Cwd::cwd and Cwd::getcwd in Perl, generally, without regard to any specific platform? Why does Perl have both? What is the intended use, which one should I use in which scenarios? (Example use cases will be appreciated.) Does it matter? (Assuming I don’t mix them.) Does choice of either one affect portability in any way? Which one is more commonly used in modules?

Even if I interpret the manual is saying that except for corner cases cwd is `pwd` and getcwd just calls getcwd from unistd.h, what is the actual difference? This works only on POSIX systems, anyway.

I can always read the implementation but that tells me nothing about the meaning of those functions. Implementation details may change, not so defined meaning. (Otherwise a breaking change occurs, which is serious business.)

What does the manual say

Quoting Perl’s Cwd module manpage:

Each of these functions are called without arguments and return the absolute path of the current working directory.

  • getcwd

    my $cwd = getcwd();

    Returns the current working directory.

    Exposes the POSIX function getcwd(3) or re-implements it if it's not available.

  • cwd

    my $cwd = cwd();

    The cwd() is the most natural form for the current architecture. For most systems it is identical to `pwd` (but without the trailing line terminator).

And in the Notes section:

  • Actually, on Mac OS, the getcwd(), fastgetcwd() and fastcwd() functions are all aliases for the cwd() function, which, on Mac OS, calls `pwd`. Likewise, the abs_path() function is an alias for fast_abs_path()

OK, I know that on Mac OS1 there is no difference between getcwd() and cwd() as both actually boil down to `pwd`. But what on other platforms? (I’m especially interested in Debian Linux.)


1 Classic Mac OS, not OS X. $^O values are MacOS and darwin for Mac OS and OS X, respectively. Thanks, @tobyink and @ikegami.

And a little meta-question: How to avoid asking similar questions for other modules with very similar functions? Is there a universal way of discovering the difference, other than digging through the implementation? (Currently, I think that if the documentation is not clear about intended use and differences, I have to ask someone more experienced or read the implementation myself.)

1条回答
Fickle 薄情
2楼-- · 2019-04-20 07:18

Generally speaking

I think the idea is that cwd() always resolves to the external, OS-specific way of getting the current working directory. That is, running pwd on Linux, command /c cd on DOS, /usr/bin/fullpath -t in QNX, and so on — all examples are from actual Cwd.pm. The getcwd() is supposed to use the POSIX system call if it is available, and falls back to the cwd() if not.

Why we have both? In the current implementation I believe exporting just getcwd() would be enough for most of systems, but who knows why the logic of “if syscall is available, use it, else run cwd()” can fail on some system (e.g. on MorphOS in Perl 5.6.1).

On Linux

On Linux, cwd() will run `/bin/pwd` (will actually execute the binary and get its output), while getcwd() will issue getcwd(2) system call.

Actual effect inspected via strace

One can use strace(1) to see that in action:

Using cwd():

$ strace -f perl -MCwd -e 'cwd(); ' 2>&1 | grep execve
execve("/usr/bin/perl", ["perl", "-MCwd", "-e", "cwd(); "], [/* 27 vars */]) = 0
[pid 31276] execve("/bin/pwd", ["/bin/pwd"], [/* 27 vars */] <unfinished ...>
[pid 31276] <... execve resumed> )      = 0

Using getcwd():

$ strace -f perl -MCwd -e 'getcwd(); ' 2>&1 | grep execve
execve("/usr/bin/perl", ["perl", "-MCwd", "-e", "getcwd(); "], [/* 27 vars */]) = 0

Reading Cwd.pm source

You can take a look at the sources (Cwd.pm, e.g. in CPAN) and see that for Linux cwd() call is mapped to _backtick_pwd which, as the name suggests, calls the pwd in backticks.

Here is a snippet from Cwd.pm, with my comments:

unless ($METHOD_MAP{$^O}{cwd} or defined &cwd) {
    ...
    # some logic to find the pwd binary here, $found_pwd_cmd is set to 1 on Linux
    ...
    if( $os eq 'MacOS' || $found_pwd_cmd )
    {
        *cwd = \&_backtick_pwd;  # on Linux we actually go here
    }
    else {
        *cwd = \&getcwd;
    }
}

Performance benchmark

Finally, the difference between two is that cwd(), which calls another binary, must be slower. We can make some kind of a performance test:

$ time perl -MCwd -e 'for (1..10000) { cwd(); }'

real    0m7.177s
user    0m0.380s
sys     0m1.440s

Now compare it with the system call:

$ time perl -MCwd -e 'for (1..10000) { getcwd(); }'

real    0m0.018s
user    0m0.009s
sys     0m0.008s

Discussion, choice

But as you don't usually query the current working directory too often, both options will work — unless you cannot spawn any more processes for some reason related to ulimit, out of memory situation, etc.

Finally, as for selecting which one to use: for Linux, I would always use getcwd(). I suppose you will need to make your tests and select which function to use if you are going to write a portable piece of code that will run on some really strange platform (here, of course, Linux, OS X, and Windows are not in the list of strange platforms).

查看更多
登录 后发表回答