Why does Devel::Peek::Dump give me two different results here?
#!/usr/bin/env perl
use warnings;
use 5.014;
use utf8;
binmode STDOUT, ':encoding(utf-8)';
use Devel::Peek;
my %hash1 = ( 'müller' => 1 );
say Dump $_ for keys %hash1;
my %hash2;
$hash2{'müller'} = 1;
say Dump $_ for keys %hash2;
Output:
SV = PV(0x753270) at 0x76d230
REFCNT = 2
FLAGS = (POK,pPOK,UTF8)
PV = 0x759750 "m\303\274ller"\0 [UTF8 "m\x{fc}ller"]
CUR = 7
LEN = 8
SV = PV(0x753270) at 0x7d75a8
REFCNT = 2
FLAGS = (POK,FAKE,READONLY,pPOK)
PV = 0x799110 "m\374ller"
CUR = 6
LEN = 0
Both of those scalars contain exactly the same string. The only difference is in how the string is stored internally.
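As a rough illustration of that point (the variable names here are made up for the example), the same string can be forced into either internal representation with utf8::upgrade and utf8::downgrade, and Perl still treats the two copies as equal:

```perl
use strict;
use warnings;
use utf8;

# Hypothetical variable names, chosen only for this sketch.
my $upgraded   = 'müller';
my $downgraded = 'müller';

utf8::upgrade($upgraded);      # internal storage becomes UTF-8, UTF8 flag on
utf8::downgrade($downgraded);  # internal storage becomes one byte per character, UTF8 flag off

print "upgraded has UTF8 flag:   ", (utf8::is_utf8($upgraded)   ? "yes" : "no"), "\n";
print "downgraded has UTF8 flag: ", (utf8::is_utf8($downgraded) ? "yes" : "no"), "\n";

# String comparison and length work on characters, not on the internal bytes.
print "equal:   ", ($upgraded eq $downgraded ? "yes" : "no"), "\n";
print "lengths: ", length($upgraded), " and ", length($downgraded), "\n";
```

The two flags differ, yet eq reports the strings as equal and both have a length of 6 characters, which matches the CUR = 7 versus CUR = 6 difference in the dumps being purely a storage detail.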
My guess is that the key is normalised to make comparisons easier when trying to locate the key in the hash.
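A small sketch of why this is harmless from the caller's side, assuming nothing beyond core Perl (the names $up and $down are invented for the example): whichever internal form the lookup key is in, it addresses the same hash entry.

```perl
use strict;
use warnings;

# Two copies of the same six-character string in different internal encodings.
my $up   = "m\x{fc}ller";
my $down = "m\x{fc}ller";
utf8::upgrade($up);
utf8::downgrade($down);

my %h;
$h{$up}   = 1;
$h{$down} = 2;   # same key as far as the hash is concerned, so this overwrites

my $count = keys %h;
print "number of keys: $count\n";
print "value via upgraded key: $h{$up}\n";
```

This prints one key and the value 2, so the hash treats both forms as the same key. It does not show which internal form the hash itself keeps; that is what the Dump output above reveals.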
This is not an answer; I believe ikegami's response is correct. I just wanted to add some observations with some code.
I ran the following code on Perl 5.10 through 5.15 and the behavior is consistent.
use utf8;
use Test::More;
{
my %h = ('müller' => 1);
my $k = (keys %h)[0];
ok(utf8::is_utf8($k), 'UTF-8 Latin-1 hash key has SvUTF8 set');
}
{
my %h = ('müller' => 1);
$h{'müller'} = 2;
my $k = (keys %h)[0];
ok( ! utf8::is_utf8($k), 'UTF-8 Latin-1 hash key does not have SvUTF8 set after assignment');
}
{
my %h = ('☺' => 1);
$h{'☺'} = 2;
my $k = (keys %h)[0];
ok(utf8::is_utf8($k), 'UTF-8 (> Latin-1) hash key has SvUTF8 set after assignment');
}
done_testing;
If the second test's result is expected, it would be the first silent downgrade I'm aware of. I guess p5p has the final say on whether this is an optimization bug or expected behavior. (Judging by the flags in the sv_dump output, (POK,FAKE,READONLY,pPOK), it looks like an optimization.)