I would like to print out the contents of an associative array.
For this I'm using Data::Dumper.
For example, if the hash is called %w, I write:
print OUT Dumper(\%w);
The problem is that some words, such as "récente", are printed out as "r\x{e9}cente".
If I just write:
print OUT %w;
there is no problem: "récente" is printed out as "récente".
All text files used by the script are UTF-8 encoded.
I also use the utf8 pragma and always specify the character encoding when opening files, for example:
open( IN, '<', $file_in);
binmode(IN,":utf8");
I'm pretty sure the problem is related to Data::Dumper. Is there a way to fix this, or another way to print out the contents of an associative array?
Thank you.
This is intentional. The output of Data::Dumper is meant to reproduce the same data structure when evaluated as Perl code (with eval). To limit the effect of character encodings, non-ASCII characters are dumped using escapes. In addition, it's sensible to set $Data::Dumper::Useqq = 1
so that any unprintable characters are dumped using escapes as well.
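To see why the escapes are useful, here is a minimal sketch that dumps a hash with Useqq and Sortkeys enabled and then evaluates the dump to get an equivalent hash back. The sample hash %w is an assumption standing in for the real data.

```perl
use strict;
use warnings;
use utf8;                           # the sample literal below is UTF-8 source text
use Data::Dumper;

my %w = ('récente' => 1);           # sample data standing in for the real hash

$Data::Dumper::Useqq    = 1;        # double-quoted strings, unprintables escaped
$Data::Dumper::Sortkeys = 1;        # stable key order

my $dump = Dumper(\%w);             # "$VAR1 = {...};" with é written as an escape such as \x{e9}
print $dump;                        # pure ASCII, safe for any ASCII-compatible output encoding

# Evaluating the dump as Perl code rebuilds an equivalent hash.
my $VAR1;                           # the dump assigns to $VAR1
eval $dump;
die "eval failed: $@" if $@;
print exists $VAR1->{'récente'} ? "round trip OK\n" : "round trip failed\n";
```

Because the dump is plain ASCII, it reads the same no matter which encoding layer (if any) the output handle has.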
Data::Dumper isn't really meant as a way to display data structures. If you have specific formatting requirements, just write the necessary code yourself. For example:
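use strict;
use warnings;
use utf8;                      # the source below contains non-ASCII literals
use feature 'say';
my $filename = 'hash.txt';     # example output file name
open my $out, ">:utf8", $filename or die "Can't open $filename: $!";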
my %hash = (
bárewørdş => '–Uni·code–',
);
say { $out } "{";
for my $key (sort keys %hash) {
say { $out } " $key: $hash{$key}";
}
say { $out } "}";
produces
{
bárewørdş: –Uni·code–
}
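The loop above covers a flat hash. If the data is nested, the same idea can be applied recursively. This is only a sketch: format_data is a hypothetical helper, and the indented "key: value" / "- item" layout is an arbitrary choice, not a standard format.

```perl
use strict;
use warnings;
use utf8;                               # the sample literals below are UTF-8 source text

# Hypothetical helper: renders nested hashes and arrays as indented text,
# leaving non-ASCII characters as they are (no quoting, no escaping).
sub format_data {
    my ($data, $depth) = @_;
    $depth //= 0;
    my $pad = '  ' x $depth;            # two spaces per nesting level
    my $out = '';
    if (ref $data eq 'HASH') {
        for my $key (sort keys %$data) {
            my $v = $data->{$key};
            $out .= ref $v ? "$pad$key:\n" . format_data($v, $depth + 1)
                           : "$pad$key: " . ($v // 'undef') . "\n";
        }
    }
    elsif (ref $data eq 'ARRAY') {
        for my $v (@$data) {
            $out .= ref $v ? "$pad-\n" . format_data($v, $depth + 1)
                           : "$pad- " . ($v // 'undef') . "\n";
        }
    }
    else {
        $out .= $pad . ($data // 'undef') . "\n";
    }
    return $out;
}

binmode STDOUT, ':encoding(UTF-8)';     # encode wide characters on output

my %w = (
    'récente' => 1,
    'liste'   => [ 'été', 'hiver' ],
);
print format_data(\%w);
```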
You can also use Data::Dumper::AutoEncode.
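use utf8;
use Data::Dumper::AutoEncode;          # exports eDumper()
my $hash_ref = { 'récente' => 1 };     # example data; use your own reference, e.g. \%w
warn eDumper($hash_ref);               # strings are encoded (UTF-8 by default) before dumping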
The module is available on CPAN and can be installed with: cpan Data::Dumper::AutoEncode
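If installing a module is not an option, a cruder technique is to post-process the Dumper output and turn the \x{...} escapes back into characters. This is a different technique, not a reimplementation of Data::Dumper::AutoEncode; the helper name dumper_unescaped is hypothetical, the result is meant for reading only, and the output handle still needs an encoding layer.

```perl
use strict;
use warnings;
use utf8;                               # the sample literal below is UTF-8 source text
use Data::Dumper;

# Hypothetical helper: dumps with Useqq, then rewrites each \x{...} escape
# as the character it stands for. For reading only: the result is no longer
# guaranteed to eval back to the same data.
sub dumper_unescaped {
    local $Data::Dumper::Useqq    = 1;  # double-quoted strings with \x{...} escapes
    local $Data::Dumper::Sortkeys = 1;
    my $dump = Dumper(@_);
    # Match "\\" (an escaped backslash, kept as is) or "\x{hex}" (decoded),
    # scanning left to right so an escaped backslash is never misread.
    $dump =~ s/\\(\\|x\{([0-9a-fA-F]+)\})/defined $2 ? chr(hex $2) : '\\\\'/ge;
    return $dump;
}

binmode STDOUT, ':encoding(UTF-8)';     # the result contains wide characters

my %w = ('récente' => 1);
print dumper_unescaped(\%w);
```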
This works for me:
use strict;
use warnings;
use Data::Dumper;
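$Data::Dumper::Useperl = 1;    # use the pure-Perl implementation, which calls qquote
binmode STDOUT, ":utf8";
{
    no warnings 'redefine';
    # Replace Data::Dumper's internal string quoter with one that
    # emits single-quoted strings without \x{...} escapes.
    sub Data::Dumper::qquote {
        (my $s = shift) =~ s/([\\'])/\\$1/g;   # escape \ and ' so the output stays valid Perl
        return "'$s'";
    }
}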
my $s = "rcente\x{3a3}";
my %w = ($s=>12);
print Dumper(\%w), "\n";
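Redefining Data::Dumper::qquote this way affects every Dumper call in the program, and qquote is an internal function rather than a documented API. The sketch below confines the change to a single call by localizing the glob. dump_readable is a hypothetical name, and the approach still relies on those internals.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical wrapper: applies a qquote override only while this one
# Dumper() call runs, instead of redefining it for the whole program.
sub dump_readable {
    local $Data::Dumper::Useperl  = 1;  # the pure-Perl dumper is the one that calls qquote
    local $Data::Dumper::Sortkeys = 1;  # stable key order
    no warnings 'redefine';
    local *Data::Dumper::qquote = sub {
        (my $s = shift) =~ s/([\\'])/\\$1/g;  # escape \ and ' so the output is still valid Perl
        return "'$s'";
    };
    return Dumper(@_);
}

binmode STDOUT, ':encoding(UTF-8)';     # wide characters are printed raw, so encode them

my %w = ("r\x{e9}cente\x{3a3}" => 12);
print dump_readable(\%w);
```

Once dump_readable returns, qquote is restored, so other Dumper calls keep their normal escaping.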
Data::Dumper is a debugging tool. It is telling you exactly what the string contains without making the output susceptible to encoding errors. That's not a problem; it's a feature. What it emitted ("r\x{e9}cente") is a readable enough representation of the string you had, whose characters are 72 E9 63 65 6E 74 65 in hex.
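To check what a string actually contains, independent of how any output layer renders it, you can list its code points in hex. A small sketch; code_points is a hypothetical helper name.

```perl
use strict;
use warnings;
use utf8;                               # 'récente' below is 7 characters, not 8 bytes

# Hypothetical helper: hex code point of each character, space-separated.
sub code_points {
    my ($s) = @_;
    return join ' ', map { sprintf '%02X', ord($_) } split //, $s;
}

print code_points('récente'), "\n";     # 72 E9 63 65 6E 74 65
```

The E9 here is the same value Data::Dumper shows as \x{e9}.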