Perl Japanese to English filename replacement

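I am putting together a Perl script that replaces Japanese file names with English file names. There are still a couple of things I do not understand very well.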

I have the following configuration. Client OS:

Windows XP Japanese

Notepad++, installed

Server:

Red Hat Enterprise Linux Server release 6.2

Perl v5.10.1

VIM: VIM version 7.2.411

xterm: ASTEC-X version 6.0

CSH: tcsh 6.17.00 (Astron)

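The source of these files is Japanese .csv files generated on Windows. I have seen posts about using utf8 and encoding conversion in Perl, and I would like to understand better why I did not need anything mentioned in those other threads.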

Here is my script, which works. My questions are below.

#!/usr/bin/perl
my $work_dir = "/nas1_home4/fsomeguy/someplace";
opendir(DIR, $work_dir) or die "Cannot open directory";
my @files = readdir(DIR);
foreach (@files) 
{
    my $original_file = $_; 
    s/機/-machine_/; # replace 機 with -machine_
    my $new_file = $_;
    if ($new_file ne $original_file)
    {
        print "Rename " . $original_file . " to " . $new_file . "\n";
        rename("${work_dir}/${original_file}", "${work_dir}/${new_file}") or  print "Warning: rename failed because: $!\n";
    }
}

Questions:

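1) Why is utf8 not needed in this example? In what kind of example would I need it? (use utf8; discussion: "use utf8 gives me 'Wide character in print'".) However, if I add use utf8, this script does not work.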

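2) Why are encoding operations not needed in this example?
I actually wrote the script on Windows using Notepad++ (pasting the Japanese characters from the Windows XP Japanese Explorer into my script). In xterm and VIM, the characters display as garbage. But I did not have to do any of the encoding handling discussed in "How can I convert Japanese characters to Unicode in Perl?".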

Thanks.

Update 1

Testing a simple localization sample in Perl: replacing Japanese text in a file name and in file contents

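In Windows XP, copy the character 南 from the .csv data file to the clipboard, then use it as both the file name (i.e. 南.txt) and the file content (南). In Notepad++, reading the file with encoding UTF-8 shows x93xEC; reading it with Shift_JIS displays 南.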
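The x93xEC shown by Notepad++ matches the Shift_JIS byte sequence for 南. As a rough check, the sketch below prints the bytes that Encode produces for this character in both encodings; the helper name hex_bytes is made up for illustration and is not part of the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;                    # the literal 南 below is read as one character
use Encode qw(encode);

# Hypothetical helper: encode a character string and list its bytes in hex.
sub hex_bytes {
    my ($encoding, $string) = @_;
    my $octets = encode($encoding, $string);
    return join ' ', map { sprintf '%02X', $_ } unpack 'C*', $octets;
}

print "shiftjis: ", hex_bytes('shiftjis', '南'), "\n";   # 93 EC
print "UTF-8:    ", hex_bytes('UTF-8',    '南'), "\n";   # E5 8D 97
```

The two byte sequences share nothing, which is why a file written as Shift_JIS cannot be read correctly as UTF-8.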

Script:

The following Perl script, south.pl, is run on the Linux server with Perl 5.10.

#!/usr/bin/perl
use feature qw(say);

use strict;
use warnings;
use utf8;
use Encode qw(decode encode);

my $user_dir="/usr/frank";
my $work_dir = "${user_dir}/test_south";

# forward declare the function prototypes
sub fileProcess;

opendir(DIR, ${work_dir}) or die "Cannot open directory " . ${work_dir};

# readdir OPTION 1 - shift_jis
#my @files = map { Encode::decode("shift_jis", $_); } readdir DIR; # Note filename    could not be decoded as shift_jis
#binmode(STDOUT,":encoding(shift_jis)");                    

# readdir OPTION 2 - utf8
my @files = map { Encode::decode("utf8", $_); } readdir DIR; # Note filename could be decoded as utf8
binmode(STDOUT,":encoding(utf8)");                           # setting display to output utf8

say @files;                                 

# rename the files, then process the file contents (both subs use the file-scoped @files and $work_dir)
fileNameTranslate();
fileProcess();

closedir(DIR);

exit;

sub fileNameTranslate
{

    foreach (@files) 
    {
        my $original_file = $_; 
        #print "original_file: " . "$original_file" . "\n";     
        s/南/south/;     

        my $new_file = $_;
        # print "new_file: " . "$_" . "\n";

        if ($new_file ne $original_file)
        {
            print "Rename " . $original_file . " to \n\t" . $new_file . "\n";
            rename("${work_dir}/${original_file}", "${work_dir}/${new_file}") or print "Warning: rename failed because: $!\n";
        }
    }
}

sub fileProcess
{

    #   file process OPTION 3, open file as shift_jis, the search and replace would work
    #   open (IN1,  "<:encoding(shift_jis)", "${work_dir}/south.txt") or die "Error: south.txt\n";
    #   open (OUT1, "+>:encoding(shift_jis)" , "${work_dir}/south1.txt") or die "Error: south1.txt\n";  

    #   file process OPTION 4, open file as utf8, the search and replace would not work
    open (IN1,  "<:encoding(utf8)", "${work_dir}/south.txt") or die "Error: south.txt\n";
    open (OUT1, "+>:encoding(utf8)" , "${work_dir}/south1.txt") or die "Error: south1.txt\n";   

    while (<IN1>)
    {
        print $_ . "\n";
        chomp;

        s/南/south/g;


        print OUT1 "$_\n";
    }

    close IN1;
    close OUT1; 
}

Results:

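(BAD) Uncomment options 1 and 3 (comment out options 2 and 4). Setup: readdir encoding, shift_jis; file open encoding, shift_jis. Result: filename replacement failed. Error: utf8 "\x93" does not map to Unicode at .//south.pl line 68. \x93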

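(BAD) Uncomment options 2 and 4 (comment out options 1 and 3). Setup: readdir encoding, utf8; file open encoding, utf8. Result: filename replacement worked and south.txt was generated, but the south1.txt content replacement failed; it has the content \x93 (). Error: "\x{FFFD}" does not map to shiftjis at .//south.pl line 25 ... -AO =（Bx的{} FFFD .TXT？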

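(GOOD) Uncomment options 2 and 3 (comment out options 1 and 4). Setup: readdir encoding, utf8; file open encoding, shift_jis. Result: filename replacement worked and south.txt was generated; the south1.txt content replacement worked, and it has the content south.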

Conclusion:

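I had to use two different encodings for this example to work properly: utf8 for readdir, and shift_jis for file processing, because the content of the csv file is Shift_JIS encoded.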
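The content half of that conclusion can be reproduced in isolation. The sketch below is not the original south.pl: it writes the two Shift_JIS bytes for 南 to a temporary file, then reads them back through an :encoding(shiftjis) layer so that the substitution runs on characters. The sub name replace_in_sjis_file and the temporary file are assumptions made for this illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;                          # the 南 in the regex is a character, not bytes
use File::Temp qw(tempfile);

# Hypothetical helper: read a Shift_JIS text file and return its lines
# with every 南 replaced by "south".
sub replace_in_sjis_file {
    my ($path) = @_;
    open(my $in, '<:encoding(shiftjis)', $path)
        or die "Cannot open $path: $!";
    my @lines;
    while (my $line = <$in>) {
        chomp $line;
        $line =~ s/南/south/g;     # works because the layer decoded the bytes
        push @lines, $line;
    }
    close $in;
    return @lines;
}

# Build a sample file containing the raw Shift_JIS bytes 0x93 0xEC for 南.
my ($fh, $sample) = tempfile(UNLINK => 1);
binmode $fh;                       # write the bytes untouched
print {$fh} "\x93\xEC\n";
close $fh;

print join("\n", replace_in_sjis_file($sample)), "\n";   # prints "south"
```

Opening the same file with :encoding(utf8) instead would not yield 南, since 0x93 0xEC is not a valid UTF-8 sequence, which is consistent with the failure seen in option 4.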

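Your script is totally unaware of Unicode. It treats all strings as sequences of bytes. Fortunately, the bytes that encode the file names are the same bytes that encode the Japanese characters in your source file. If you tell Perl to use utf8, it will interpret the Japanese characters in your script as characters, but not the ones coming from the file system, so there will be a mismatch.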
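One way to avoid the mismatch is to decode the names coming from the file system, substitute on characters, and encode again before calling rename. The following is a minimal sketch of that idea, assuming the file system stores names as UTF-8 (as the readdir test in the question suggests); the sub translate_name is a hypothetical helper, not something from the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;                              # literals in this file are characters
use Encode qw(decode encode);

# Hypothetical helper: take a raw file name (bytes, as readdir returns it),
# replace 南 with "south", and return the new name as bytes for rename().
sub translate_name {
    my ($raw_name, $fs_encoding) = @_;
    my $name = decode($fs_encoding, $raw_name);   # bytes -> characters
    $name =~ s/南/south/;                          # character vs character
    return encode($fs_encoding, $name);           # characters -> bytes
}

# "\xE5\x8D\x97" is the UTF-8 byte sequence for 南.
my $raw = "\xE5\x8D\x97.txt";

# With "use utf8" in effect, the undecoded bytes do not match the pattern.
print "raw bytes match: ", ($raw =~ /南/ ? "yes" : "no"), "\n";   # no

# After decoding, the substitution succeeds.
print translate_name($raw, 'UTF-8'), "\n";                        # south.txt
```

In a real script the returned byte string would be passed to rename together with the original raw name, so both arguments stay in the file system's own encoding.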