Why does glob lstat matching entries?

Posted 2019-04-07 16:43

Question:

While looking into the behavior described in this question, I was surprised to see that perl calls lstat() on every path matching a glob pattern:

$ mkdir dir
$ touch dir/{foo,bar,baz}.txt  
$ strace -e trace=lstat perl -E 'say $^V; <dir/b*>' 
v5.10.1
lstat("dir/baz.txt", {st_mode=S_IFREG|0664, st_size=0, ...}) = 0
lstat("dir/bar.txt", {st_mode=S_IFREG|0664, st_size=0, ...}) = 0

I see the same behavior on my Linux system with glob(pattern) and <pattern>, and with later versions of perl.

My expectation was that the globbing would simply opendir/readdir under the hood, and that it would not need to inspect the actual pathnames it was searching.
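
For illustration, the readdir-only matching described above could look roughly like the C sketch below. This is not what perl or bsd_glob.c does; the helper name match_dir_entries is made up for this example, and fnmatch(3) stands in for the pattern matcher. Run under strace, a loop like this would issue no stat or lstat calls of its own.

```c
#include <dirent.h>
#include <fnmatch.h>
#include <stdio.h>

/* Hypothetical helper: print and count the entries of `dir` whose names
 * match `pattern`, using only opendir/readdir/fnmatch. Nothing here calls
 * stat or lstat. Returns -1 if the directory cannot be opened. */
int match_dir_entries(const char *dir, const char *pattern)
{
    DIR *d = opendir(dir);
    if (d == NULL)
        return -1;

    int count = 0;
    struct dirent *ent;
    while ((ent = readdir(d)) != NULL) {
        /* FNM_PERIOD: a leading dot must be matched explicitly, as in the shell. */
        if (fnmatch(pattern, ent->d_name, FNM_PERIOD) == 0) {
            printf("%s/%s\n", dir, ent->d_name);
            count++;
        }
    }
    closedir(d);
    return count;
}
```

Calling match_dir_entries("dir", "b*") on the directory created above would list dir/bar.txt and dir/baz.txt, in whatever order readdir returns them.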

What is the purpose of this lstat? Does it affect what glob() returns?

Answer 1:

This strange behavior has been noticed before on PerlMonks. It turns out that glob calls lstat to support its GLOB_MARK flag, which has the effect that:

Each pathname that is a directory that matches the pattern has a slash appended.

To find out whether a directory entry refers to a subdir, you need to stat it. This is apparently done even when the flag is not given.
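
As a rough illustration of what GLOB_MARK does, here is a small C sketch against the system's POSIX glob(3). It is not Perl's bundled bsd_glob.c, although the flag is documented with the same meaning there. The helper name count_marked_dirs is made up for this example.

```c
#include <glob.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: expand `pattern` with GLOB_MARK, print every match,
 * and return how many matches are directories (they end in '/').
 * Returns -1 on any glob() failure, including "no match". */
int count_marked_dirs(const char *pattern)
{
    glob_t g;
    memset(&g, 0, sizeof g);          /* keeps globfree() safe on failure */

    if (glob(pattern, GLOB_MARK, NULL, &g) != 0) {
        globfree(&g);
        return -1;
    }

    int dirs = 0;
    for (size_t i = 0; i < g.gl_pathc; i++) {
        const char *path = g.gl_pathv[i];
        size_t len = strlen(path);
        if (len > 0 && path[len - 1] == '/')
            dirs++;                   /* glob() had to stat this entry to know */
        puts(path);
    }
    globfree(&g);
    return dirs;
}
```

The trailing slash is the only visible effect of the flag, and the library cannot produce it from the directory listing alone, which is why a stat call per match is needed when the flag is set.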



Answer 2:

I was wondering the same thing: what is the purpose of this lstat, and does it affect what glob() returns?

Within glob2() in bsd_glob.c, I noticed a g_stat call inside an if branch that requires the GLOB_MARK flag to be set. I also noticed a call to g_lstat just before it that is not guarded by any flag check. Both sit inside the if branch that is taken when the end of the pattern is reached. If I remove these 2 lines from the glob2 function in perl-5.12.4/ext/File-Glob/bsd_glob.c

- if (g_lstat(pathbuf, &sb, pglob))
-     return(0);

the only perl test (make test) that fails is test 5 in ext/File-Glob/t/basic.t with:

not ok 5
#   Failed test at ../ext/File-Glob/t/basic.t line 92.
#     Structures begin differing at:
#          $got->[0] = 'asdfasdf'
#     $expected->[0] = Does not exist

Test 5 in t/basic.t is

# check nonexistent checks
# should return an empty list
# XXX since errfunc is NULL on win32, this test is not valid there
@a = bsd_glob("asdfasdf", 0);
SKIP: {
    skip $^O, 1 if $^O eq 'MSWin32' || $^O eq 'NetWare';
    is_deeply(\@a, []);
}
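
The behavior that test checks can be reproduced outside perl with the system's POSIX glob(3), which treats a literal pattern the same way: a name with no wildcards that does not exist yields no match, unless GLOB_NOCHECK asks for the pattern to be returned anyway. This is a sketch under that assumption, not the File::Glob code itself, and the helper name glob_result_count is made up.

```c
#include <glob.h>
#include <stdio.h>

/* Hypothetical helper: number of paths glob() returns for `pattern` with
 * the given flags. Returns 0 when glob() reports GLOB_NOMATCH and -1 on
 * any other error. */
int glob_result_count(const char *pattern, int flags)
{
    glob_t g;
    int rc = glob(pattern, flags, NULL, &g);
    if (rc == GLOB_NOMATCH)
        return 0;
    if (rc != 0)
        return -1;

    int n = (int)g.gl_pathc;
    for (size_t i = 0; i < g.gl_pathc; i++)
        puts(g.gl_pathv[i]);
    globfree(&g);
    return n;
}
```

For a path that does not exist, glob_result_count(path, 0) gives 0, while glob_result_count(path, GLOB_NOCHECK) gives 1 because the pattern itself is returned. Telling those two cases apart for a wildcard-free pattern requires checking that the name exists, which is what the lstat does.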

If I replace the 2 lines removed with:

+   if (!((pglob->gl_flags & GLOB_NOCHECK) ||
+         ((pglob->gl_flags & GLOB_NOMAGIC) &&
+          !(pglob->gl_flags & GLOB_MAGCHAR)))){
+     if (g_lstat(pathbuf, &sb, pglob))
+       return(0);
+   }

I don't see any failures from "make test" for perl-5.12.4 on Linux x86_64 (RHEL6.3 2.6.32-358.11.1.el6.x86_64), and when using:

strace -fe trace=lstat perl -e 'use File::Glob q{:glob};
                               print scalar bsd_glob(q{/var/log/*},GLOB_NOCHECK)'

I no longer see the lstat calls for each file in the dir. I don't mean to suggest that the perl tests for glob (File-Glob) are comprehensive (they are not), or that a change such as this will not break existing behaviour (this seems likely). As far as I can tell the code with this (g_l)stat call existed in original-bsd/lib/libc/gen/glob.c 24 years ago in 1990.
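
To make the guard in the replacement lines above easier to read, here it is restated as a standalone C predicate. The flag values and the SKETCH_ names are placeholders invented for this example; the real constants are defined in bsd_glob.h and are not reproduced here.

```c
#include <stdbool.h>

/* Placeholder flag bits for illustration only (not the real values). */
enum {
    SKETCH_NOCHECK = 1 << 0,  /* stands in for GLOB_NOCHECK */
    SKETCH_NOMAGIC = 1 << 1,  /* stands in for GLOB_NOMAGIC */
    SKETCH_MAGCHAR = 1 << 2   /* stands in for GLOB_MAGCHAR */
};

/* Mirrors the condition in the patch: returns true when the lstat-based
 * existence check would still run at the end of the pattern. */
bool existence_check_runs(int flags)
{
    bool nocheck = (flags & SKETCH_NOCHECK) != 0;
    bool nomagic_literal = (flags & SKETCH_NOMAGIC) != 0 &&
                           (flags & SKETCH_MAGCHAR) == 0;
    return !(nocheck || nomagic_literal);
}
```

With no flags the check still runs, which keeps test 5 passing. With GLOB_NOCHECK it is skipped for every candidate path, which matches the strace result above.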

Also see:

  • Chapter 6. Benchmarking Perl of "Mastering Perl" By brian d foy, Randal L. Schwartz contains a section on comparing code where code using glob() and opendir() is compared.
  • "future globs (was "UNIX mindset...")" in comp.unix.wizards from Dick Dunn in 1991.
  • Usenet newsgroup mod.sources "'Globbing' library routine (glob)" from Guido van Rossum in July 1986 - I don't see a reference to "stat" in this code.


Tags: linux perl glob