Hash keys encoding: Why do I get two different results here with Devel::Peek::Dump?

Posted 2019-06-15 13:48

Why do I get two different results here with Devel::Peek::Dump?

#!/usr/bin/env perl
use warnings;
use 5.014;
use utf8;
binmode STDOUT, ':encoding(utf-8)';
use Devel::Peek;

my %hash1 = ( 'müller' => 1 );
say Dump $_ for keys %hash1;

my %hash2;
$hash2{'müller'} = 1;
say Dump $_ for keys %hash2;

Output:

SV = PV(0x753270) at 0x76d230
  REFCNT = 2
  FLAGS = (POK,pPOK,UTF8)
  PV = 0x759750 "m\303\274ller"\0 [UTF8 "m\x{fc}ller"]
  CUR = 7
  LEN = 8

SV = PV(0x753270) at 0x7d75a8
  REFCNT = 2
  FLAGS = (POK,FAKE,READONLY,pPOK)
  PV = 0x799110 "m\374ller"
  CUR = 6
  LEN = 0
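A way to compare the two keys without reading Dump output is sketched below. It reuses the question's two hashes and only calls the core functions `utf8::is_utf8`, `length` and `eq`. The UTF8 flag values it prints are whatever your perl produces and may differ between versions; only the string comparison is guaranteed.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use 5.014;
use utf8;
binmode STDOUT, ':encoding(utf-8)';

my %hash1 = ( 'müller' => 1 );    # key from the hash constructor
my %hash2;
$hash2{'müller'} = 1;             # key from a constant subscript

my ($key1) = keys %hash1;
my ($key2) = keys %hash2;

# The UTF8 flag is an internal storage detail, not part of the string's value.
for my $pair ( [ hash1 => $key1 ], [ hash2 => $key2 ] ) {
    my ( $name, $key ) = @$pair;
    printf "%s: key=%s length=%d UTF8 flag=%s\n",
        $name, $key, length $key, utf8::is_utf8($key) ? 'on' : 'off';
}

# Comparing the strings themselves ignores how they are stored.
say $key1 eq $key2 ? 'keys are the same string' : 'keys differ';
```

Both keys report a length of 6 characters, whichever internal form they come back in.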

2 answers
smile是对你的礼貌
Post #2 · 2019-06-15 14:21

This is not an answer; I believe ikegami's response is correct. I just want to add some observations, with some code.

I ran the following code on perl 5.10 through 5.15, and the behavior is consistent.

use strict;
use warnings;
use utf8;
use Test::More;

# Key set only through the hash constructor (list assignment).
{
    my %h = ('müller' => 1);
    my $k = (keys %h)[0];
    ok(utf8::is_utf8($k), 'UTF-8 Latin-1 hash key has SvUTF8 set');
}

# Same key, then assigned again through a constant subscript.
{
    my %h = ('müller' => 1);
    $h{'müller'} = 2;
    my $k = (keys %h)[0];
    ok(!utf8::is_utf8($k), 'UTF-8 Latin-1 hash key does not have SvUTF8 set after assignment');
}

# A key outside Latin-1 (U+263A), assigned the same way.
{
    my %h = ('☺' => 1);
    $h{'☺'} = 2;
    my $k = (keys %h)[0];
    ok(utf8::is_utf8($k), 'UTF-8 (> Latin-1) hash key has SvUTF8 set after assignment');
}

done_testing;

If the second test's result is expected behavior, it would be the first silent downgrade I'm aware of. I guess p5p has the final say on whether this is an optimization bug or expected behavior. (The sv_dump flags, (POK,FAKE,READONLY,pPOK), do look like an optimization.)
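The difference between the second and third tests may come down to what a downgrade can do: a string whose characters all fit in Latin-1 can be stored as single bytes without changing its value, while U+263A cannot. A minimal sketch using the core `utf8::downgrade` function on plain scalars; it does not reproduce what the hash internals actually do.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

my $latin  = "m\x{fc}ller";   # u-umlaut is U+00FC, inside Latin-1
my $smiley = "\x{263a}";      # U+263A WHITE SMILING FACE, outside Latin-1
utf8::upgrade($latin);        # force UTF-8 internal storage for the Latin-1 string

my $latin_copy  = $latin;
my $smiley_copy = $smiley;

# The second argument (FAIL_OK) makes downgrade return false instead of dying.
my $latin_ok  = utf8::downgrade( $latin_copy,  1 );
my $smiley_ok = utf8::downgrade( $smiley_copy, 1 );

printf "Latin-1 key: downgrade %s, UTF8 flag now %s, same string: %s\n",
    $latin_ok ? 'succeeded' : 'failed',
    utf8::is_utf8($latin_copy) ? 'on' : 'off',
    $latin_copy eq $latin ? 'yes' : 'no';
printf "Smiley key:  downgrade %s, UTF8 flag now %s, same string: %s\n",
    $smiley_ok ? 'succeeded' : 'failed',
    utf8::is_utf8($smiley_copy) ? 'on' : 'off',
    $smiley_copy eq $smiley ? 'yes' : 'no';
# Latin-1 key: downgrade succeeded, UTF8 flag now off, same string: yes
# Smiley key:  downgrade failed, UTF8 flag now on, same string: yes
```

In both cases the value is unchanged; only the Latin-1 string can drop the flag.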

一纸荒年 Trace。
Post #3 · 2019-06-15 14:28

Both of those scalars contain exactly the same string. The only difference is in how the string is stored internally.

My guess is that the key is normalised to make comparisons easier when trying to locate the key in the hash.
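Whatever the exact mechanism inside perl, a hash treats both internal forms of a string as the same key. A small sketch, using nothing beyond the core `utf8::upgrade` and `utf8::downgrade` functions, that stores one string under both representations:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# The same six-character string in both internal representations.
my $as_utf8 = "m\x{fc}ller";
utf8::upgrade($as_utf8);      # internally: m, 0xC3 0xBC, l, l, e, r
my $as_latin1 = "m\x{fc}ller";
utf8::downgrade($as_latin1);  # internally: m, 0xFC, l, l, e, r

my %hash;
$hash{$as_utf8}   = 1;
$hash{$as_latin1} = 2;        # same key, so this overwrites the first value

printf "distinct keys: %d\n", scalar keys %hash;
printf "lookup via UTF-8 form:   %d\n", $hash{$as_utf8};
printf "lookup via Latin-1 form: %d\n", $hash{$as_latin1};
# distinct keys: 1
# lookup via UTF-8 form:   2
# lookup via Latin-1 form: 2
```

If the two forms were hashed as different byte sequences, the hash would end up with two keys; it ends up with one.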
