Perl Encode.pm cannot decode string with wide char

Posted 2019-03-24 16:14

Question:

I was running a Perl app that uses /opt/local/lib/perl5/5.12.4/darwin-thread-multi-2level/Encode.pm, and it fails with this error:

Cannot decode string with wide characters at /opt/local/lib/perl5/5.12.4/darwin-thread-multi-2level/Encode.pm line 174.

Line 174 of Encode.pm is the marked line in this sub:

sub decode($$;$) {
    my ( $name, $octets, $check ) = @_;
    return undef unless defined $octets;
    $octets .= '' if ref $octets;
    $check ||= 0;
    my $enc = find_encoding($name);
    unless ( defined $enc ) {
        require Carp;
        Carp::croak("Unknown encoding '$name'");
    }
    my $string = $enc->decode( $octets, $check );  # line 174
    $_[1] = $octets if $check and !ref $check and !( $check & LEAVE_SRC() );
    return $string;
}

Any workaround?

Answer 1:

I had a similar problem. $enc->decode( $octets, $check ) expects octets (bytes), not a character string.

Calling Encode::_utf8_off($octets) before the decode made it work for me.



Answer 2:

encode takes a string of Unicode code points and serialises them into a string of bytes.

decode takes a string of bytes and deserialises them into Unicode code points.
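As a rough illustration of the two directions described above, here is a minimal round trip. The sample string and the choice of UTF-8 are assumptions made for the example; they are not taken from the question.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# A character string (assumed sample data): "cafe" with e-acute, a space,
# and U+263A, which is a code point above 255.
my $chars = "caf\x{e9} \x{263A}";

# encode: characters -> bytes.
my $bytes = encode('UTF-8', $chars);

# decode: bytes -> characters.
my $back = decode('UTF-8', $bytes);

printf "characters: %d\n", length $chars;
printf "bytes:      %d\n", length $bytes;
printf "round trip: %s\n", $back eq $chars ? 'ok' : 'mismatch';
```

The character string is shorter than the byte string because the two non-ASCII characters each take more than one byte in UTF-8.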

That message means you passed decode a string containing one or more characters above 255 (which cannot be bytes), so the argument is invalid.

>perl -MEncode -E"for (254..257) { say; decode('iso-8859-1', chr($_)); }"
254
255
256
Wide character in subroutine entry at .../Encode.pm line 176.

You ask for a workaround, but the bug is yours. Perhaps you are accidentally trying to decode something you already decoded?
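If the input might already be decoded and the calling code cannot be fixed right away, one possible stopgap is to check for characters above 255 before decoding. The helper name decode_if_bytes is hypothetical, and the check is only a heuristic: an already-decoded string whose characters all fit in 0..255 would still be decoded a second time. The real fix remains decoding exactly once, at the point where the bytes enter the program.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper (not part of Encode): decode only when the input
# could still be a byte string.
sub decode_if_bytes {
    my ($encoding, $input) = @_;

    # A code point above 255 cannot be a byte, so such a string was
    # already decoded (or was never octets in the first place).
    return $input if $input =~ /[^\x00-\xFF]/;

    return decode($encoding, $input);
}

my $bytes   = "\xE2\x98\xBA";                       # UTF-8 bytes for U+263A
my $decoded = decode_if_bytes('UTF-8', $bytes);     # decoded to one character
my $again   = decode_if_bytes('UTF-8', $decoded);   # returned unchanged

printf "length after first call:  %d\n", length $decoded;
printf "second call changed it:   %s\n", $again eq $decoded ? 'no' : 'yes';
```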



Answer 3:

That error message is saying that you have passed in a string that has already been decoded (and contains characters above code point 255). You can't decode it again.
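A small sketch of how decoding twice leads to this failure, assuming UTF-8 input (the question does not say which encoding the app uses). The second decode is wrapped in eval so the script can report the error instead of dying.

```perl
use strict;
use warnings;
use Encode qw(decode);

my $bytes = "\xE2\x98\xBA";            # UTF-8 bytes for U+263A (assumed sample)
my $chars = decode('UTF-8', $bytes);   # first decode: fine, one character

# Second decode: $chars now holds a code point above 255, so it is no
# longer a valid byte string and decode croaks.
my $err;
eval { decode('UTF-8', $chars); 1 } or $err = $@;

print defined $err ? "second decode failed: $err" : "second decode succeeded\n";
```

The exact wording of the error can differ by encoding and Encode version, as the two messages quoted in this thread show.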