Hash keys encoding: Why do I get here with Devel::

Posted 2019-06-15 13:47

Question:

Why do I get two different results here with Devel::Peek::Dump?

#!/usr/bin/env perl
use warnings;
use 5.014;
use utf8;
binmode STDOUT, ':encoding(utf-8)';
use Devel::Peek;

my %hash1 = ( 'müller' => 1 );
say Dump $_ for keys %hash1;

my %hash2;
$hash2{'müller'} = 1;
say Dump $_ for keys %hash2;

Output:

SV = PV(0x753270) at 0x76d230
  REFCNT = 2
  FLAGS = (POK,pPOK,UTF8)
  PV = 0x759750 "m\303\274ller"\0 [UTF8 "m\x{fc}ller"]
  CUR = 7
  LEN = 8

SV = PV(0x753270) at 0x7d75a8
  REFCNT = 2
  FLAGS = (POK,FAKE,READONLY,pPOK)
  PV = 0x799110 "m\374ller"
  CUR = 6
  LEN = 0

Answer 1:

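Both of those scalars contain exactly the same string. The only difference is in how the string is stored internally: the first is stored as UTF-8 (UTF8 flag set, 7 bytes), the second as Latin-1 bytes (no UTF8 flag, 6 bytes).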

My guess is that the key is normalised to make comparisons easier when trying to locate the key in the hash.
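To illustrate the point about internal storage, here is a minimal sketch. It assumes nothing beyond core Perl; the variable names are made up for the example. utf8::upgrade and utf8::downgrade force the two internal representations, and the string comparison and the hash lookup do not care which one is used.

```perl
use strict;
use warnings;
use utf8;

# Two copies of the same string, forced into different internal formats.
my $upgraded   = 'müller';
my $downgraded = 'müller';
utf8::upgrade($upgraded);      # stored as UTF-8, UTF8 flag on
utf8::downgrade($downgraded);  # stored as Latin-1 bytes, UTF8 flag off

my $flag_up   = utf8::is_utf8($upgraded)   ? 1 : 0;
my $flag_down = utf8::is_utf8($downgraded) ? 1 : 0;
my $equal     = $upgraded eq $downgraded   ? 1 : 0;

# A key stored from one representation is found via the other.
my %h = ($upgraded => 1);
my $found = exists $h{$downgraded} ? 1 : 0;

print "flag (upgraded):   $flag_up\n";    # 1
print "flag (downgraded): $flag_down\n";  # 0
print "strings equal:     $equal\n";      # 1
print "lookup succeeds:   $found\n";      # 1
print "lengths: ", length($upgraded), " and ", length($downgraded), "\n";  # 6 and 6
```

The flag only describes the storage format; at the Perl level both scalars are the same six-character string.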



Answer 2:

This is not an answer; I believe ikegami's response is correct. I just wanted to add some observations with some code.

I ran the following code under Perl 5.10 through 5.15, and the behavior is consistent.

use utf8;
use Test::More;

{
    my %h = ('müller' => 1);
    my $k = (keys %h)[0];
    ok(utf8::is_utf8($k), 'UTF-8 Latin-1 hash key has SvUTF8 set');
}

{
    my %h = ('müller' => 1);
       $h{'müller'} = 2;
    my $k = (keys %h)[0];
    ok( ! utf8::is_utf8($k), 'UTF-8 Latin-1 hash key does not have SvUTF8 set after assignment');
}

{
    my %h = ('☺' => 1);
       $h{'☺'} = 2;
    my $k = (keys %h)[0];
    ok(utf8::is_utf8($k), 'UTF-8 (> Latin-1) hash key has SvUTF8 set after assignment');
}

done_testing;

If the second test result is expected, it would be the first silent downgrade I'm aware of. I guess p5p has the final say on whether this is an optimization bug or expected behavior. (Judging by the flags in the sv_dump output, (POK,FAKE,READONLY,pPOK), it looks like an optimization.)
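Whichever way the flag ends up, the difference should not be visible once the keys are encoded for output. The following is a rough sketch reusing the two hashes from the question; the variable names are illustrative, and the flag values it prints are left unannotated because they may vary between Perl versions.

```perl
use strict;
use warnings;
use utf8;
use Encode qw(encode);

my %hash1 = ('müller' => 1);
my %hash2;
$hash2{'müller'} = 1;

my ($k1) = keys %hash1;
my ($k2) = keys %hash2;

# Compare at the character level and at the encoded byte level.
my $same_chars = $k1 eq $k2 ? 1 : 0;
my $bytes1     = encode('UTF-8', "$k1");   # encode copies, not the keys themselves
my $bytes2     = encode('UTF-8', "$k2");
my $same_bytes = $bytes1 eq $bytes2 ? 1 : 0;

print "UTF8 flag on key 1: ", (utf8::is_utf8($k1) ? 1 : 0), "\n";
print "UTF8 flag on key 2: ", (utf8::is_utf8($k2) ? 1 : 0), "\n";
print "same characters:    $same_chars\n";
print "same UTF-8 bytes:   $same_bytes\n";
print "encoded length:     ", length($bytes1), "\n";
```

Both keys encode to the same seven UTF-8 bytes, so code that works at the Perl level, rather than inspecting the scalar's internals, sees no difference between them.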