I'm crawling a website and collecting information from its JSON. The results are saved in a hash. But some of the pages give me a "malformed UTF-8 character in JSON string" error. I noticed that the last letter in "cafe" produces the error. I think it is because of a mix of character encodings. So now I'm looking for a way to convert every kind of character to UTF-8 (I hope there is a way as perfect as that). I tried utf8::all, but it just doesn't work (maybe I didn't do it right). I'm a noob. Please help, thanks.
UPDATE
Well, after reading the article "Know the difference between character strings and UTF-8 strings" by brian d foy, I solved the problem with this code:
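use utf8;                                     # string literals in this file become character strings
use Encode qw(encode_utf8);
use JSON;
my $json_data = qq( { "cat" : "Büster" } );   # characters, not UTF-8 bytes
$json_data = encode_utf8( $json_data );       # turn the characters into UTF-8 bytes
my $perl_hash = decode_json( $json_data );    # decode_json expects UTF-8 bytes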
Hope this helps someone else.
decode_json expects the JSON to have been encoded using UTF-8.

While your source file is encoded using UTF-8, you have Perl decode it by using use utf8; (as you should). This means your string contains Unicode characters, not the UTF-8 bytes that represent those characters.

As you've shown, you could encode the string before passing it to decode_json. But you could simply tell JSON that the string is already decoded.
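
One way to tell JSON that the string is already decoded is sketched below. It assumes the JSON module from CPAN, as used in the question, and reuses the question's sample data. A JSON object's utf8 option is off by default, and from_json also takes a character string, so neither call needs the encode_utf8 step. Printing the result is only an illustration and not part of the original code.

```perl
use strict;
use warnings;
use utf8;     # string literals in this file are decoded into character strings
use JSON;

# A character string (not UTF-8 bytes), same sample data as in the question.
my $json_data = qq( { "cat" : "Büster" } );

# With the utf8 option disabled, decode() expects an already-decoded
# character string, so no encode_utf8() call is needed first.
my $perl_hash = JSON->new->utf8(0)->decode($json_data);

# from_json() is the functional counterpart; it also takes a character string.
my $same_hash = from_json($json_data);

# Encode characters to UTF-8 only at the output boundary.
binmode STDOUT, ':encoding(UTF-8)';
print $perl_hash->{cat}, "\n";
```

For data fetched from a website, the situation is usually the reverse: the response body is still raw UTF-8 bytes, and that is the case decode_json is meant for.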