I pass a UTF-8 encoded string from my command line into a Perl program:
> ./test.pl --string='ḷet ūs try ṭhiñgs'
which seems to recognize the string correctly:
use utf8;
use Getopt::Long;
use Data::Dumper;
GetOptions(
'string=s' => \$string,
) or die;
print Dumper($string);
print Dumper(utf8::is_utf8($string));
print Dumper(utf8::valid($string));
prints
$VAR1 = 'ḷet ūs try ṭhiñgs';
$VAR1 = '';
$VAR1 = 1;
When I store this string in a hash and call encode_json on it, the string appears to be encoded a second time, whereas to_json seems to work (if I read the output correctly):
my %a = ( 'nāme' => $string ); # Note the Unicode character
print Dumper(\%a);
print Dumper(encode_json(\%a));
print Dumper(to_json(\%a));
prints
$VAR1 = {
"n\x{101}me" => 'ḷet ūs try ṭhiñgs'
};
$VAR1 = '{"nāme":"ḷet Å«s try á¹hiñgs"}';
$VAR1 = "{\"n\x{101}me\":\"\x{e1}\x{b8}\x{b7}et \x{c5}\x{ab}s try \x{e1}\x{b9}\x{ad}hi\x{c3}\x{b1}gs\"}";
Turning this back into the original hash, however, doesn't seem to work with either method, and in both cases the hash and the string are broken:
print Dumper(decode_json(encode_json(\%a)));
print Dumper(from_json(to_json(\%a)));
prints
$VAR1 = {
"n\x{101}me" => "\x{e1}\x{b8}\x{b7}et \x{c5}\x{ab}s try \x{e1}\x{b9}\x{ad}hi\x{c3}\x{b1}gs"
};
$VAR1 = {
"n\x{101}me" => "\x{e1}\x{b8}\x{b7}et \x{c5}\x{ab}s try \x{e1}\x{b9}\x{ad}hi\x{c3}\x{b1}gs"
};
A hash lookup of $a{'nāme'} now fails.
Question: How do I handle UTF-8 strings and JSON encoding/decoding correctly in Perl?
You need to decode your input:
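A minimal sketch of that decoding step, assuming the shell hands the arguments over as UTF-8 bytes. The helper name decode_args is hypothetical and only exists to make the step easy to see; `use utf8;` affects string literals in the source file only, so the contents of @ARGV have to be decoded explicitly before GetOptions reads them.

```perl
use strict;
use warnings;
use Encode qw( decode );
use Getopt::Long qw( GetOptions );

# Hypothetical helper: turn raw UTF-8 bytes into Perl character strings.
# Assumption: the terminal encodes command-line arguments as UTF-8.
sub decode_args {
    return map { decode('UTF-8', $_) } @_;
}

# Decode before option parsing so every option value is a character string.
@ARGV = decode_args(@ARGV);

my $string;
GetOptions( 'string=s' => \$string ) or die "Usage: $0 --string=TEXT\n";

# After decoding, length() counts characters rather than bytes.
printf "%d characters\n", length $string if defined $string;
```

With the value decoded, the hash key (a character string because of `use utf8;`) and the hash value are both character strings, which is what encode_json expects.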
Putting it all together, we get:
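One way the complete program might look, as a sketch rather than a definitive version. It assumes a UTF-8 terminal and uses the core JSON::PP module so that it runs without extra installs; with the JSON module the same encode_json and decode_json calls should apply. The fallback value for $string is only there so the sketch runs without arguments.

```perl
use strict;
use warnings;
use utf8;                               # string literals in this file are UTF-8
use open ':std', ':encoding(UTF-8)';    # assumption: the terminal expects UTF-8
use Encode qw( decode );
use Getopt::Long qw( GetOptions );
use JSON::PP qw( encode_json decode_json );

# Decode the raw argument bytes into character strings.
$_ = decode('UTF-8', $_) for @ARGV;

my $string = 'ḷet ūs try ṭhiñgs';       # fallback so the sketch runs without arguments
GetOptions( 'string=s' => \$string ) or die "Usage: $0 --string=TEXT\n";

my %a = ( 'nāme' => $string );

my $json = encode_json(\%a);            # characters in, UTF-8 encoded bytes out
my $back = decode_json($json);          # UTF-8 encoded bytes in, characters out

print $back->{'nāme'}, "\n";            # the lookup succeeds after the round trip
```

The design rule is to decode at every input boundary and encode at every output boundary: encode_json and decode_json handle the JSON boundary, and the `use open` line handles STDIN, STDOUT and STDERR.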
Example:
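As a small illustration of why the undecoded value gets mangled (a sketch, not the program's actual output): the two bytes that encode ū are treated as two Latin-1 characters when they are encoded again, which matches the "Å«" seen in the encode_json output of the question.

```perl
use strict;
use warnings;
use Encode qw( decode encode );

my $bytes = "\xC5\xAB";                 # the UTF-8 encoding of U+016B, as the shell delivers it
my $chars = decode('UTF-8', $bytes);    # one character: U+016B

# Encoding the undecoded bytes treats each byte as its own character.
my $double = encode('UTF-8', $bytes);
my $hex    = join ' ', map { sprintf '%02X', ord } split //, $double;

print length($bytes),  "\n";            # 2 (bytes)
print length($chars),  "\n";            # 1 (character)
print "$hex\n";                         # C3 85 C2 AB (double encoded)
```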
Make sure your source file is actually encoded as UTF-8!