I put together a Perl script that replaces Japanese file names with English file names. It works, but there are still a couple of things that I don’t quite understand.
I have the following configuration.
Client OS:
Windows XP Japan
Notepad++, installed
Server:
Red Hat Enterprise Linux Server release 6.2
Perl v5.10.1
VIM : VIM version 7.2.411
Xterm : ASTEC-X version 6.0
CSH: tcsh 6.17.00 (Astron)
The source files are Japanese .csv files generated on Windows. I saw posts about using utf8 and encoding conversion in Perl, and I hope to understand better why I didn’t need anything mentioned in those other threads.
Here is my script, which worked. My questions are below.
#!/usr/bin/perl
my $work_dir = "/nas1_home4/fsomeguy/someplace";
opendir(DIR, $work_dir) or die "Cannot open directory $work_dir: $!";
my @files = readdir(DIR);
closedir(DIR);
foreach (@files)
{
    my $original_file = $_;
    s/機/-machine_/; # replace 機 with -machine_
    my $new_file = $_;
    # only rename when the substitution actually changed the name
    if ($new_file ne $original_file)
    {
        print "Rename " . $original_file . " to " . $new_file . "\n";
        rename("${work_dir}/${original_file}", "${work_dir}/${new_file}") or print "Warning: rename failed because: $!\n";
    }
}
Questions:
1) Why isn’t use utf8 required in this sample? In what kinds of examples would I need it? use utf8; was discussed in another thread (use utf8 gives me 'Wide character in print'), but if I add use utf8 to this script, it stops working.
2) Why isn’t encoding manipulation required in this sample?
I actually wrote the script on Windows using Notepad++ (pasting the Japanese characters from Windows XP Japan’s Explorer into my script). In Xterm and VIM, the characters show up as garbage. But I didn’t have to deal with encoding manipulation either, which was discussed in the thread "How can I convert japanese characters to unicode in Perl?".
Thanks.
UPDATES 1
Testing a simple localization sample in Perl for filename and file text replacement in Japanese
In Windows XP, copy the 南 character from a .csv data file to the clipboard, then use it as both the file name (i.e. 南.txt) and the file content (南). In Notepad++, reading the file with the encoding set to UTF-8 shows x93xEC; reading it as SHIFT_JIS displays 南.
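To see why the same character looks different under each encoding, here is a minimal sketch that prints the bytes 南 (U+5357) is encoded to in Shift_JIS and in UTF-8. The helper name hex_bytes is hypothetical and is not part of south.pl. The character is written as an escape so that the encoding this snippet is saved in does not affect the result.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode);

# hex_bytes is a hypothetical helper for illustration only.
# It encodes a character string and returns its bytes as hex pairs.
sub hex_bytes {
    my ($encoding, $text) = @_;
    my $bytes = encode($encoding, $text);
    return join " ", map { sprintf "%02X", ord($_) } split //, $bytes;
}

my $south = "\x{5357}";    # 南, written as a code point escape

for my $encoding ("shift_jis", "UTF-8") {
    printf "%-9s => %s\n", $encoding, hex_bytes($encoding, $south);
}
```

Comparing the two lines of output against what Notepad++ shows should indicate which encoding the .csv content was saved in.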
Script:
Use the following Perl script, south.pl, which runs on a Linux server with Perl 5.10:
#!/usr/bin/perl
use feature qw(say);
use strict;
use warnings;
use utf8;
use Encode qw(decode encode);
my $user_dir="/usr/frank";
my $work_dir = "${user_dir}/test_south";
# forward declare the function prototypes
sub fileProcess;
opendir(DIR, ${work_dir}) or die "Cannot open directory " . ${work_dir};
# readdir OPTION 1 - shift_jis
#my @files = map { Encode::decode("shift_jis", $_); } readdir DIR; # Note filename could not be decoded as shift_jis
#binmode(STDOUT,":encoding(shift_jis)");
# readdir OPTION 2 - utf8
my @files = map { Encode::decode("utf8", $_); } readdir DIR; # Note filename could be decoded as utf8
binmode(STDOUT,":encoding(utf8)"); # setting display to output utf8
say @files;
# rename the files, then rewrite the file content (both subs use the file-scoped @files and $work_dir)
fileNameTranslate();
fileProcess();
closedir(DIR);
exit;
sub fileNameTranslate
{
foreach (@files)
{
my $original_file = $_;
#print "original_file: " . "$original_file" . "\n";
s/南/south/;
my $new_file = $_;
# print "new_file: " . "$_" . "\n";
if ($new_file ne $original_file)
{
print "Rename " . $original_file . " to \n\t" . $new_file . "\n";
rename("${work_dir}/${original_file}", "${work_dir}/${new_file}") or print "Warning: rename failed because: $!\n";
}
}
}
sub fileProcess
{
# file process OPTION 3, open file as shift_jis, the search and replace would work
# open (IN1, "<:encoding(shift_jis)", "${work_dir}/south.txt") or die "Error: south.txt\n";
# open (OUT1, "+>:encoding(shift_jis)" , "${work_dir}/south1.txt") or die "Error: south1.txt\n";
# file process OPTION 4, open file as utf8, the search and replace would not work
open (IN1, "<:encoding(utf8)", "${work_dir}/south.txt") or die "Error: south.txt\n";
open (OUT1, "+>:encoding(utf8)" , "${work_dir}/south1.txt") or die "Error: south1.txt\n";
while (<IN1>)
{
print $_ . "\n";
chomp;
s/南/south/g;
print OUT1 "$_\n";
}
close IN1;
close OUT1;
}
Result:
(BAD) Uncomment Options 1 and 3 (comment out Options 2 and 4)
Setup: readdir encoding SHIFT_JIS; file open encoding SHIFT_JIS
Result: file name replacement failed.
Error: utf8 "\x93" does not map to Unicode at .//south.pl line 68. \x93
(BAD) Uncomment Options 2 and 4 (comment out Options 1 and 3)
Setup: readdir encoding utf8; file open encoding utf8
Result: file name replacement worked, south.txt generated. But south1.txt file content replacement failed; it has the content \x93 ().
Error: "\x{fffd}" does not map to shiftjis at .//south.pl line 25. ... -Ao?= (Bx{fffd}.txt
(GOOD) Uncomment Options 2 and 3 (comment out Options 1 and 4)
Setup: readdir encoding utf8; file open encoding SHIFT_JIS
Result: file name replacement worked, south.txt generated. south1.txt file content replacement worked; it has the content south.
Conclusion:
I had to use two different encodings for this example to work properly: the names returned by readdir are decoded as utf8, and the file content is processed as SHIFT_JIS, since the content of the .csv file was SHIFT_JIS encoded.
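One way to check which encoding a given byte string is in, rather than finding out by trial and error, is to attempt a strict decode and see which one succeeds. This is only a rough heuristic sketch: guess_encoding is a hypothetical helper, it tries just the two encodings from this test, and plain ASCII input will always be reported as UTF-8.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);

# guess_encoding is a hypothetical helper for illustration only.
# It returns the first candidate encoding that decodes the bytes
# without error, or nothing if neither candidate fits.
sub guess_encoding {
    my ($bytes) = @_;
    for my $encoding ("UTF-8", "shift_jis") {
        my $copy = $bytes;    # decode with a CHECK value may modify its input
        my $ok = eval { decode($encoding, $copy, Encode::FB_CROAK()); 1 };
        return $encoding if $ok;
    }
    return;
}

# The two byte sequences for 南 seen in this test
for my $bytes ("\x93\xEC", "\xE5\x8D\x97") {
    my $hex   = join " ", map { sprintf "%02X", ord($_) } split //, $bytes;
    my $guess = guess_encoding($bytes);
    print "$hex looks like ", (defined $guess ? $guess : "unknown"), "\n";
}
```

Running the same check on a name returned by readdir and on a line read raw from the .csv file would show whether the two really differ, as the results above suggest.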