Could File::Find::Rule be patched to automatically

Posted 2019-02-25 15:00

Suppose I have a file named æ (Unicode: U+00E6, UTF-8: 0xC3 0xA6) in the current directory.

Then, I would like to use File::Find::Rule to locate it:

use feature qw(say);
use open qw( :std :utf8 );
use strict;
use utf8;
use warnings;

use File::Find::Rule;

my $fn = 'æ';
my @files = File::Find::Rule->new->name($fn)->in('.');
say $_ for @files;

The output is empty, so apparently this did not work.

If I try to encode the filename first:

use Encode;

my $fn = 'æ';
my $fn_utf8 = Encode::encode('UTF-8', $fn, Encode::FB_CROAK | Encode::LEAVE_SRC);
my @files = File::Find::Rule->new->name($fn_utf8)->in('.');
say $_ for @files;

The output is:

Ã¦

So it found the file, but the returned filename is still a UTF-8 byte string rather than a decoded Perl character string. To fix this, I can decode the result, replacing the last line with:

say Encode::decode('UTF-8', $_, Encode::FB_CROAK) for @files;

The question is: could (or should) File::Find::Rule do both the encoding and the decoding automatically, so that my original program would work and I would not have to worry about encoding and decoding at all?

(For example, could File::Find::Rule use I18N::Langinfo to determine that the current locale's codeset is UTF-8?)

1 answer
混吃等死
#2 · 2019-02-25 15:28

Yeah, I wish. If there were a major Perl project I'd work on, this would be it.

The issue is that there could be badly encoded file names, including file names in a different encoding than the one expected. So the first thing needed is a way to round-trip badly encoded file names through a decode-encode cycle without losing information. I think Python uses lone surrogate code points to represent the undecodable bytes.
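
For reference, the Python mechanism is the "surrogateescape" error handler (PEP 383): on decode, each undecodable byte 0x80-0xFF is mapped to a lone surrogate U+DC80-U+DCFF, and on encode those surrogates are mapped back to the original bytes. Below is a minimal sketch of that round trip, written in Python only because that is where the handler exists. The helper names decode_name and encode_name are made up for illustration, and nothing here is part of File::Find::Rule or Perl's Encode.

```python
# Round-trip file names through decode/encode without losing bad bytes,
# using Python's built-in "surrogateescape" error handler (PEP 383).
# decode_name / encode_name are hypothetical helper names.

def decode_name(raw: bytes) -> str:
    """Decode a file name; undecodable bytes become lone surrogates."""
    return raw.decode("utf-8", errors="surrogateescape")

def encode_name(name: str) -> bytes:
    """Encode a file name; lone surrogates turn back into the original bytes."""
    return name.encode("utf-8", errors="surrogateescape")

good = b"\xc3\xa6"  # 'æ' correctly encoded as UTF-8
bad = b"\xe6"       # 'æ' encoded as Latin-1, which is invalid UTF-8

decoded_good = decode_name(good)
decoded_bad = decode_name(bad)

# ascii() is used because a lone surrogate cannot be printed to a UTF-8 terminal.
print(ascii(decoded_good))  # '\xe6'
print(ascii(decoded_bad))   # '\udce6'

# Both names survive the round trip byte for byte.
print(encode_name(decoded_good) == good)  # True
print(encode_name(decoded_bad) == bad)    # True
```

The point of the design is that a well-encoded name decodes to the expected character string, while a badly encoded one decodes to something that cannot be mistaken for real text but still encodes back to the exact bytes on disk.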

Any such change would also have to be opt-in through a pragma to preserve backwards compatibility.
