How can I dump a string in perl to see if there ar

I've occasionally had problems with strings that look identical but are subtly different. In some cases loading utf8::all changed the behavior, so I assume the differences are Unicode-related. I'd like to dump strings in a way that makes those differences visible. What are my options for doing this?

Tags: perl unicode encoding character-encoding dump

4 Answers

ゆ、 Hurt°

Reply #2 · 2019-05-26 04:43

I recommend the Dump function in the Devel::Peek module in the Perl core:

$ perl -MDevel::Peek -e 'Dump "abc"'
SV = PV(0x10441500) at 0x10491680
  REFCNT = 1
  FLAGS = (PADTMP,POK,READONLY,pPOK)
  PV = 0x10442224 "abc"\0
  CUR = 3
  LEN = 4

$ perl -MDevel::Peek -e 'Dump "\x{FEFF}abc"'
SV = PV(0x10441050) at 0x10443be0
  REFCNT = 1
  FLAGS = (PADTMP,POK,READONLY,pPOK,UTF8)
  PV = 0x10449bc0 "\357\273\277abc"\0 [UTF8 "\x{feff}abc"]
  CUR = 6
  LEN = 8

(You see how FLAGS contains UTF8 in the second example, because of the wide character, but not in the first?)
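
The same information can also be checked from code instead of being read off the Dump output. The following is only a sketch: `describe_string` is a hypothetical helper name, while `utf8::is_utf8` and `utf8::encode` are core functions that are available without loading any module. The byte count it reports is the length of the UTF-8 encoding, which corresponds to CUR only when the UTF8 flag is on (or the string is pure ASCII).

```perl
use strict;
use warnings;

# Hypothetical helper: reports the UTF8 flag, the character count, and the
# length of the string's UTF-8 encoding.
sub describe_string {
    my ($string) = @_;
    my $bytes = $string;      # work on a copy so the caller's string is untouched
    utf8::encode($bytes);     # character string -> UTF-8 encoded bytes
    return {
        utf8_flag  => utf8::is_utf8($string) ? 1 : 0,
        characters => length($string),
        utf8_bytes => length($bytes),
    };
}

for my $s ("abc", "\x{FEFF}abc") {
    my $d = describe_string($s);
    # %v04X prints the code points, so the output stays plain ASCII.
    printf "U+%v04X: UTF8 flag %s, %d characters, %d bytes as UTF-8\n",
        $s, ($d->{utf8_flag} ? "on" : "off"), $d->{characters}, $d->{utf8_bytes};
}
```

For the second string this reports 4 characters but 6 bytes, matching CUR = 6 in the Dump output above.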


我欲成王，谁敢阻挡

Reply #3 · 2019-05-26 04:50

For most uses, Data::Dumper with Useqq will do.

use utf8;
use Data::Dumper;
local $Data::Dumper::Useqq = 1;
print(Dumper("foo–bar"));
print(Dumper("foo-bar"));

Output:

$VAR1 = "foo\x{2013}bar";
$VAR1 = "foo-bar";

If you want internal details (such as the UTF8 flag), use Devel::Peek.

use utf8;
use Devel::Peek;
Dump("foo–bar");
Dump("foo-bar");

Output:

SV = PV(0x328ccc) at 0x1d6a0c4
  REFCNT = 1
  FLAGS = (PADTMP,POK,READONLY,pPOK,UTF8)
  PV = 0x1d6d52c "foo\342\200\223bar"\0 [UTF8 "foo\x{2013}bar"]
  CUR = 9
  LEN = 12
SV = PV(0x328dcc) at 0x32b594
  REFCNT = 1
  FLAGS = (PADTMP,POK,READONLY,pPOK)
  PV = 0x1d6d50c "foo-bar"\0
  CUR = 7
  LEN = 12


爱情/是我丢掉的垃圾

Reply #4 · 2019-05-26 04:55

Have you tried Test::LongString? Even though it's really a test module, it is handy for showing you where the differences in a string occur. It focuses on the parts that are different instead of showing you the whole string, and it makes \x{} escapes for special characters.
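
To illustrate the idea of reporting only the part that differs, here is a rough core-only sketch. It is not Test::LongString's implementation and does not use the module at all; `first_difference` and `escape` are hypothetical helper names made up for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: index of the first differing character,
# or -1 when the two strings are identical.
sub first_difference {
    my ($got, $expected) = @_;
    return -1 if $got eq $expected;
    my $i = 0;
    $i++ while $i < length($got)
            && $i < length($expected)
            && substr($got, $i, 1) eq substr($expected, $i, 1);
    return $i;
}

# Hypothetical helper: escape everything outside printable ASCII
# so that look-alike characters become visible.
sub escape {
    my ($string) = @_;
    (my $out = $string) =~ s/([^\x20-\x7E])/sprintf "\\x{%x}", ord $1/ge;
    return $out;
}

my ($got, $expected) = ("foo\x{2013}bar", "foo-bar");   # en dash vs. hyphen
my $at = first_difference($got, $expected);
if ($at >= 0) {
    printf "strings differ at index %d\n", $at;
    printf "     got: ...%s\n", escape(substr($got, $at, 10));
    printf "expected: ...%s\n", escape(substr($expected, $at, 10));
}
```

For real test suites the module itself is the better choice; this only shows why a report limited to the differing region is easier to read than two full dumps.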

I'd like to see an example where utf8::all changed the behavior, even if just to see an interesting edge case.


(This account has been banned)

Reply #5 · 2019-05-26 04:57

All you need to dump out any string is:

printf "U+%v04X\n", $string;

You could use this to format a string:

($print_string = $string) =~ s/([^\x20-\x7E])/sprintf "\\x{%x}", ord $1/ge;

or even

use charnames ();
($print_string = $string) =~ s/([^\x20-\x7E])/sprintf "\\N{%s}", charnames::viacode(ord $1)/ge;
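
As a sketch of how these three snippets behave, they can be wrapped in small subs and applied to a string containing an en dash. The sub names are hypothetical, and the fallback to `\x{...}` when `charnames::viacode` returns no name is an addition made for this example.

```perl
use strict;
use warnings;
use charnames ();

# Code points in hex, joined by dots (the %v vector flag).
sub as_codepoints {
    my ($string) = @_;
    return sprintf "U+%v04X", $string;
}

# Characters outside printable ASCII become \x{...} escapes.
sub as_hex_escapes {
    my ($string) = @_;
    (my $out = $string) =~ s/([^\x20-\x7E])/sprintf "\\x{%x}", ord $1/ge;
    return $out;
}

# Same idea with Unicode character names; falls back to \x{...}
# when viacode has no name for the code point.
sub as_named_escapes {
    my ($string) = @_;
    (my $out = $string) =~ s{([^\x20-\x7E])}{
        my $cp   = ord $1;
        my $name = charnames::viacode($cp);
        defined $name ? sprintf("\\N{%s}", $name) : sprintf("\\x{%x}", $cp);
    }ge;
    return $out;
}

my $s = "foo\x{2013}bar";    # en dash where a hyphen was expected
print as_codepoints($s),    "\n";   # U+0066.006F.006F.2013.0062.0061.0072
print as_hex_escapes($s),   "\n";   # foo\x{2013}bar
print as_named_escapes($s), "\n";   # foo\N{EN DASH}bar
```

All three outputs are plain ASCII, so they can be printed to any terminal or log without triggering "Wide character" warnings.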

I have no idea why in the world you would use the misleadingly named utf8::all. It’s not a core module, and you seem to be having some trouble knowing what it is really doing. If you explicitly used the individual core pieces that go into it, you might understand it all better.

