Convert UTF-8 character sequence to real UTF-8 bytes

Posted 2019-08-12 07:19

I have a plain-text file (.yml) that contains escaped UTF-8 byte sequences like this:

foo: "Dette er en \xC3\xB8 "

The problem is \xC3\xB8: these are not "real" UTF-8 bytes, because they are stored in the text file as 8 literal characters: \ x C 3 \ x B 8
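To make the difference concrete, here is a small Python sketch (Python is just one choice, since any language is allowed) contrasting the 8-character escape text with the 2 bytes it is supposed to represent:

```python
# What the file contains: the 8 literal characters \ x C 3 \ x B 8
escaped = r"\xC3\xB8"
print(len(escaped))  # 8

# What is wanted: the 2 raw bytes 0xC3 0xB8, which UTF-8 decodes to U+00F8 ("ø")
real = bytes([0xC3, 0xB8])
print(len(real))  # 2
print(f"U+{ord(real.decode('utf-8')):04X}")  # U+00F8
```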

Is there a way to convert these into the real 2-byte UTF-8 sequence (0xC3 0xB8, which encodes ø)?

Any OS / Language / Shell-tool may be used :-)

/ Carsten

Tags: encoding utf-8 iconv

1 answer

Answer (post #2) · 2019-08-12 07:42

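Use this Perl script to convert your file:

#!/usr/bin/perl
use strict;
use warnings;
binmode STDIN;   # read raw bytes
binmode STDOUT;  # write raw bytes, so chr(0xC3) is emitted as a single byte
while (my $line = <STDIN>) {
  # Replace each literal \xHH (upper- or lowercase hex) with the byte it names
  $line =~ s/\\x([0-9A-Fa-f]{2})/chr(hex($1))/eg;
  print $line;
}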
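Since the question allows any language, the same substitution can be sketched in Python. This is an illustrative port, not part of the original answer; the names ESCAPE and unescape_hex_bytes are hypothetical. It works on bytes rather than text, so each \xHH escape becomes exactly one output byte and no encoding guesswork is involved:

```python
import re

# Matches a literal backslash, 'x', and two hex digits in either case, e.g. b"\\xC3"
ESCAPE = re.compile(rb"\\x([0-9A-Fa-f]{2})")


def unescape_hex_bytes(data: bytes) -> bytes:
    """Replace every literal \\xHH escape in data with the single byte 0xHH."""
    return ESCAPE.sub(lambda m: bytes([int(m.group(1), 16)]), data)


line = b'foo: "Dette er en \\xC3\\xB8 "\n'  # the escape text as it appears in the file
fixed = unescape_hex_bytes(line)
print(fixed)  # b'foo: "Dette er en \xc3\xb8 "\n'
```

To process a whole file you would read it in binary mode (for example via sys.stdin.buffer) and write the result the same way.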

Assuming you saved the script in a file named bogusutf, run the conversion with this command:

$ perl bogusutf <inputfile >outputfile
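After converting, it can be worth checking the result. The sketch below is a hypothetical helper (check_converted is not from the original answer) that assumes the whole output file fits in memory. It confirms that the bytes decode as valid UTF-8 and that no literal \xHH escapes were left behind:

```python
import re


def check_converted(data: bytes) -> None:
    """Raise if data is not valid UTF-8 or still contains literal \\xHH escapes."""
    data.decode("utf-8")  # raises UnicodeDecodeError on malformed UTF-8
    leftover = re.search(rb"\\x[0-9A-Fa-f]{2}", data)
    if leftover:
        raise ValueError(f"unconverted escape at byte offset {leftover.start()}")


# Hypothetical usage on the converted file:
# with open("outputfile", "rb") as f:
#     check_converted(f.read())
```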
