Should perl's File::Glob always be post-filter

Posted 2019-04-29 07:44

Question:

The output of the following minimal example shows that, on my Linux machine, File::Glob has the unexpected side effect of turning a string with the UTF-8 flag set into one without it:

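#!/usr/bin/perl

use utf8;      # the source file itself is UTF-8, so the literal below is a character string

use strict;
use warnings;

my $x = "påminnelser";
my $y = glob $x;   # no wildcards, so glob returns the pattern itself

# utf8::is_utf8 prints 1 when the internal UTF-8 flag is set, nothing otherwise
print "x=",utf8::is_utf8($x),"=\n";
print "y=",utf8::is_utf8($y),"=\n";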

This causes incorrect behavior in my program. On Linux, it looks like I can fix it by calling utf8::decode() on the result of glob. Is this the right way to fix it? Is this a bug in File::Glob? Will my fix produce correct results on other systems, such as Windows?

Answer 1:

Encoding handling for functions that deal with file names is currently on Perl's todo list, under "Unicode in Filenames". The problem is that some popular operating systems (e.g. Linux) have no notion of a file name encoding (other than the current locale settings, and relying on those is broken by design), so a portable solution in Perl is not easy.

My advice is to avoid non-ASCII file names altogether.
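
Where non-ASCII names cannot be avoided, one possible workaround is to treat glob as a byte-oriented interface: encode the pattern to bytes before the call and decode each result afterwards. The following is only a minimal sketch. It assumes that file names on the system are stored as UTF-8 bytes, which is common on Linux but not guaranteed, and may not hold on Windows. The helper name glob_decoded and the variable $FS_ENCODING are made up for illustration and are not part of File::Glob.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;                          # string literals in this file are UTF-8
use Encode qw(encode decode);

# Assumption: the file system stores names as UTF-8 bytes.
# Change this if your system uses a different encoding.
my $FS_ENCODING = 'UTF-8';

# Hypothetical helper: takes a character-string pattern and
# returns the matches as character strings.
sub glob_decoded {
    my ($pattern) = @_;
    my $bytes = encode($FS_ENCODING, $pattern);    # characters -> bytes
    # glob works on bytes; decode every result back to characters
    return map { decode($FS_ENCODING, $_) } glob $bytes;
}

my ($y) = glob_decoded("påminnelser");
print "y is ", (utf8::is_utf8($y) ? "" : "not "), "flagged as UTF-8\n";
```

Encoding the pattern explicitly, rather than only decoding the result, keeps the bytes passed to the file system independent of how Perl happens to store the string internally.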