Looking into the behavior in this question, I was surprised to see that perl lstat()s every path matching a glob pattern:
$ mkdir dir
$ touch dir/{foo,bar,baz}.txt
$ strace -e trace=lstat perl -E 'say $^V; <dir/b*>'
v5.10.1
lstat("dir/baz.txt", {st_mode=S_IFREG|0664, st_size=0, ...}) = 0
lstat("dir/bar.txt", {st_mode=S_IFREG|0664, st_size=0, ...}) = 0
I see the same behavior on my Linux system with glob(pattern) and <pattern>, and with later versions of perl.
My expectation was that the globbing would simply opendir/readdir under the hood, and that it would not need to inspect the actual pathnames it was searching.
What is the purpose of this lstat? Does it affect the glob()s return?
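For comparison, the opendir/readdir approach the question expected can be sketched in C as below. This is only an illustration, not how Perl's File::Glob is implemented: `list_matches` is a hypothetical helper, and it handles a single directory level with `fnmatch` rather than a full glob pattern. It never stats the individual entries.

```c
#include <dirent.h>
#include <fnmatch.h>
#include <stdio.h>

/* Hypothetical helper: print and count the entries of `dir` whose names
 * match the shell pattern `pat`, using only opendir/readdir/fnmatch.
 * No per-entry stat or lstat call is made.
 * Returns the number of matches, or -1 if the directory cannot be opened. */
int list_matches(const char *dir, const char *pat)
{
    DIR *d = opendir(dir);
    if (d == NULL)
        return -1;

    int n = 0;
    struct dirent *e;
    while ((e = readdir(d)) != NULL) {
        /* FNM_PERIOD: a leading '.' must be matched explicitly, as in the shell */
        if (fnmatch(pat, e->d_name, FNM_PERIOD) == 0) {
            printf("%s/%s\n", dir, e->d_name);
            n++;
        }
    }
    closedir(d);
    return n;
}
```

Calling `list_matches("dir", "b*")` on the directory from the question would list the two matching files in directory order (unsorted), without any lstat on them.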
This strange behavior has been noticed before on PerlMonks. It turns out that glob calls lstat to support its GLOB_MARK flag, which has the effect that:
Each pathname that is a directory that matches the pattern has a slash appended.
To find out whether a directory entry refers to a subdirectory, you need to stat it. This is apparently done even when the flag is not given.
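The effect of GLOB_MARK can be seen with a small C sketch. Note the assumptions: this calls the system's POSIX glob(3), not the bsd_glob.c bundled with Perl, and `count_marked_dirs` is a hypothetical helper written for this illustration.

```c
#include <glob.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: expand `pattern` with GLOB_MARK, print each result,
 * and return how many results carry the trailing '/' that marks a directory.
 * Returns 0 when nothing matches and -1 on any other glob error. */
int count_marked_dirs(const char *pattern)
{
    glob_t g;
    int rc = glob(pattern, GLOB_MARK, NULL, &g);
    int dirs = -1;

    if (rc == 0) {
        dirs = 0;
        for (size_t i = 0; i < g.gl_pathc; i++) {
            const char *p = g.gl_pathv[i];
            size_t len = strlen(p);
            if (len > 0 && p[len - 1] == '/')
                dirs++;            /* glob had to stat this entry to know */
            puts(p);
        }
    } else if (rc == GLOB_NOMATCH) {
        dirs = 0;
    }
    globfree(&g);
    return dirs;
}
```

Running `count_marked_dirs("/*")` under strace should show the stat-family calls that let glob decide which results get the slash.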
I was wondering the same thing: "What is the purpose of this lstat? Does it affect the glob()s return?"
Within glob2() in bsd_glob.c, I noticed a g_stat call inside an if branch that requires the GLOB_MARK flag to be set. I also noticed a call to g_lstat just before it that is not guarded by any flag check. Both sit inside the if branch that is taken when the end of the pattern is reached.
If I remove these 2 lines in the glob2 function in perl-5.12.4/ext/File-Glob/bsd_glob.c
- if (g_lstat(pathbuf, &sb, pglob))
- return(0);
the only perl test (make test) that fails is test 5 in ext/File-Glob/t/basic.t with:
not ok 5
# Failed test at ../ext/File-Glob/t/basic.t line 92.
# Structures begin differing at:
# $got->[0] = 'asdfasdf'
# $expected->[0] = Does not exist
Test 5 in t/basic.t is
# check nonexistent checks
# should return an empty list
# XXX since errfunc is NULL on win32, this test is not valid there
@a = bsd_glob("asdfasdf", 0);
SKIP: {
skip $^O, 1 if $^O eq 'MSWin32' || $^O eq 'NetWare';
is_deeply(\@a, []);
}
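The same "nonexistent pattern returns nothing" rule that test 5 checks can be observed with the system's POSIX glob(3), along with the GLOB_NOCHECK flag that overrides it. This is a sketch under assumptions: it exercises the C library's glob rather than Perl's bundled bsd_glob.c, and `glob_count` is a hypothetical helper.

```c
#include <glob.h>
#include <stdio.h>

/* Hypothetical helper: run glob(3) with the given flags, print every path
 * it produced, and return the number of paths.
 * Returns 0 for GLOB_NOMATCH and -1 for any other error. */
int glob_count(const char *pattern, int flags)
{
    glob_t g;
    int rc = glob(pattern, flags, NULL, &g);
    int n = -1;

    if (rc == 0) {
        n = (int)g.gl_pathc;
        for (size_t i = 0; i < g.gl_pathc; i++)
            puts(g.gl_pathv[i]);
    } else if (rc == GLOB_NOMATCH) {
        n = 0;                     /* literal name that does not exist */
    }
    globfree(&g);
    return n;
}
```

With a name that does not exist, `glob_count(name, 0)` yields no paths, while `glob_count(name, GLOB_NOCHECK)` yields one: the pattern itself, returned unchanged. Deciding that a literal name does not exist is what requires the existence check in the first place.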
If I replace the 2 lines removed with:
+ if (!((pglob->gl_flags & GLOB_NOCHECK) ||
+ ((pglob->gl_flags & GLOB_NOMAGIC) &&
+ !(pglob->gl_flags & GLOB_MAGCHAR)))){
+ if (g_lstat(pathbuf, &sb, pglob))
+ return(0);
+ }
I don't see any failures from "make test" for perl-5.12.4 on Linux x86_64 (RHEL 6.3, 2.6.32-358.11.1.el6.x86_64), and when using:
strace -fe trace=lstat perl -e 'use File::Glob q{:glob};
print scalar bsd_glob(q{/var/log/*},GLOB_NOCHECK)'
I no longer see the lstat calls for each file in the dir.
I don't mean to suggest that the perl tests for glob (File-Glob) are comprehensive (they are not), or that a change such as this will not break existing behaviour (it seems likely that it would). As far as I can tell, the code with this (g_l)stat call already existed in original-bsd/lib/libc/gen/glob.c 24 years ago, in 1990.
Also see:
- Chapter 6, "Benchmarking Perl", of "Mastering Perl" by brian d foy, Randal L. Schwartz, which contains a section on comparing code, where code using glob() and opendir() is compared.
- "future globs (was "UNIX mindset...")" in comp.unix.wizards from Dick Dunn in 1991.
- Usenet newsgroup mod.sources "'Globbing' library routine (glob)" from Guido van Rossum in July 1986 - I don't see a reference to "stat" in this code.