How can I dump a string in perl to see if there ar

Posted 2019-05-26 04:43

Question:

I've occasionally had problems with strings being subtly different. In some cases utf8::all changed the behavior, so I assume the subtle differences are Unicode-related. I'd like to dump strings in such a way that the differences are visible to me. What are my options for doing this?

Answer 1:

I recommend the Dump function in the Devel::Peek module in the Perl core:

$ perl -MDevel::Peek -e 'Dump "abc"'
SV = PV(0x10441500) at 0x10491680
  REFCNT = 1
  FLAGS = (PADTMP,POK,READONLY,pPOK)
  PV = 0x10442224 "abc"\0
  CUR = 3
  LEN = 4

$ perl -MDevel::Peek -e 'Dump "\x{FEFF}abc"'
SV = PV(0x10441050) at 0x10443be0
  REFCNT = 1
  FLAGS = (PADTMP,POK,READONLY,pPOK,UTF8)
  PV = 0x10449bc0 "\357\273\277abc"\0 [UTF8 "\x{feff}abc"]
  CUR = 6
  LEN = 8

(Note that FLAGS contains UTF8 in the second example, because of the wide character, but not in the first.)
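If you only need the headline facts rather than the full dump (Dump writes to STDERR, which can be awkward to capture), a minimal sketch like the following reports the character count, the UTF-8 byte count, and the state of the UTF8 flag. The sample strings are the two from the dumps above; the @samples variable and the output format are just illustrative choices.

```perl
use strict;
use warnings;
use Encode qw(encode);

# The same two strings that were passed to Dump above.
my @samples = ("abc", "\x{FEFF}abc");

for my $string (@samples) {
    printf "chars=%d utf8_bytes=%d utf8_flag=%s\n",
        length $string,                          # characters (code points)
        length encode('UTF-8', $string),         # bytes once encoded as UTF-8
        utf8::is_utf8($string) ? "on" : "off";   # internal UTF8 flag
}
```

For the second string this reports 4 characters but 6 bytes, which matches the CUR = 6 shown by Dump.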



Answer 2:

For most uses, Data::Dumper with Useqq will do.

use utf8;
use Data::Dumper;
local $Data::Dumper::Useqq = 1;
print(Dumper("foo–bar"));
print(Dumper("foo-bar"));

Output:

$VAR1 = "foo\x{2013}bar";
$VAR1 = "foo-bar";

If you want internal details (such as the UTF8 flag), use Devel::Peek.

use utf8;
use Devel::Peek;
Dump("foo–bar");
Dump("foo-bar");

Output:

SV = PV(0x328ccc) at 0x1d6a0c4
  REFCNT = 1
  FLAGS = (PADTMP,POK,READONLY,pPOK,UTF8)
  PV = 0x1d6d52c "foo\342\200\223bar"\0 [UTF8 "foo\x{2013}bar"]
  CUR = 9
  LEN = 12
SV = PV(0x328dcc) at 0x32b594
  REFCNT = 1
  FLAGS = (PADTMP,POK,READONLY,pPOK)
  PV = 0x1d6d50c "foo-bar"\0
  CUR = 7
  LEN = 12


Answer 3:

Have you tried Test::LongString? Even though it's really a test module, it is handy for showing you where the differences in a string occur. It focuses on the parts that are different instead of showing you the whole string, and it makes \x{} escapes for special characters.
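To illustrate the idea (this is not Test::LongString's implementation, only a core-Perl sketch of the same approach), you can locate the first differing character and print an escaped excerpt of each string from that point. The helpers first_difference and escape are hypothetical names made up for this example, as are the sample strings.

```perl
use strict;
use warnings;

# Hypothetical helper: index of the first differing character,
# or -1 if the two strings are identical.
sub first_difference {
    my ($got, $expected) = @_;
    my $shorter = length($got) < length($expected) ? length($got) : length($expected);
    for my $i (0 .. $shorter - 1) {
        return $i if substr($got, $i, 1) ne substr($expected, $i, 1);
    }
    return length($got) == length($expected) ? -1 : $shorter;
}

# Hypothetical helper: show everything outside printable ASCII as \x{...}.
sub escape {
    my ($text) = @_;
    $text =~ s/([^\x20-\x7E])/sprintf "\\x{%x}", ord $1/ge;
    return $text;
}

my $got      = "foo\x{2013}bar";   # contains an EN DASH
my $expected = "foo-bar";          # contains an ASCII hyphen
my $pos      = first_difference($got, $expected);

if ($pos >= 0) {
    printf "first difference at index %d\n", $pos;
    printf "     got: %s\n", escape(substr $got, $pos, 10);
    printf "expected: %s\n", escape(substr $expected, $pos, 10);
}
```

Printing only an excerpt starting at the mismatch keeps the output readable when the strings are long.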

I'd like to see an example where utf8::all changed the behavior, even if just to see an interesting edge case.



Answer 4:

All you need to dump out any string is:

printf "U+%v04X\n", $string;

You could use this to format a string:

($print_string = $string) =~ s/([^\x20-\x7E])/sprintf "\\x{%x}", ord $1/ge;

or even

use charnames ();
($print_string = $string) =~ s/([^\x20-\x7E])/sprintf "\\N{%s}", charnames::viacode(ord $1)/ge;
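As a worked example, here are the three approaches applied to one sample string. The sample string and the variable names are assumptions chosen for illustration. Note the ord call in the \x{} version: $1 holds the character itself, while %x needs its numeric code point.

```perl
use strict;
use warnings;
use charnames ();

# Sample input (an assumption for illustration): "foo", EN DASH, "bar".
my $string = "foo\x{2013}bar";

# 1. Every code point in hex, joined by dots (the %v vector flag).
my $codepoints = sprintf "U+%v04X", $string;

# 2. Characters outside printable ASCII as \x{...} escapes.
(my $hex_escaped = $string) =~ s/([^\x20-\x7E])/sprintf "\\x{%x}", ord $1/ge;

# 3. The same characters as \N{...} named escapes.
(my $named = $string) =~ s/([^\x20-\x7E])/sprintf "\\N{%s}", charnames::viacode(ord $1)/ge;

print "$codepoints\n";    # U+0066.006F.006F.2013.0062.0061.0072
print "$hex_escaped\n";   # foo\x{2013}bar
print "$named\n";         # foo\N{EN DASH}bar
```

The first form shows every character, so it is the one to reach for when two strings look identical even after escaping.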

I have no idea why in the world you would use the misleadingly named utf8::all. It’s not a core module, and you seem to be having some sort of trouble knowing what it is really doing. If you explicitly used the individual core pieces that go into it, maybe you would understand it all better.
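For instance, a rough sketch of spelling out some of those pieces by hand might look like this. It assumes utf8::all covers at least the source encoding, the standard handles, the default open layers, and @ARGV; check the module's own documentation for the exact list, since this is not a complete replacement. The @args and $text variables are only there to make the example runnable.

```perl
use strict;
use warnings;
use utf8;                              # the source file itself is UTF-8
use open qw(:std :encoding(UTF-8));    # STDIN/STDOUT/STDERR plus default open() layers
use Encode qw(decode);

# Decode command-line arguments from UTF-8 bytes into characters.
my @args = map { decode('UTF-8', $_) } @ARGV;

# With `use utf8`, this literal is 7 characters; without it, 9 bytes.
my $text = "foo–bar";
printf "length: %d\n", length $text;
print "$text\n";                       # encoded as UTF-8 on the way out
```

Each line does one visible thing, so when behavior changes you can comment out a single pragma to see which one is responsible.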