Must UTF-8 binaries include /utf8 in the binary literal?


Question:

In Erlang, when defining a UTF-8 binary string, I need to specify the encoding in the binary literal, like this:

Star = <<"★"/utf8>>.
> <<226,152,133>>
io:format("~ts~n", [Star]).
> ★
> ok

But, if the /utf8 encoding is omitted, the unicode characters are not handled correctly:

Star1 = <<"★">>.
> <<5>>
io:format("~ts~n", [Star1]).
> ^E
> ok

Is there a way to create literal binary strings like this without having to specify /utf8 in every binary I create? My code has quite a few binaries like this, and they have become cluttered. Is there a way to set some sort of default encoding for binaries?

Answer 1:

This is a result of how Erlang treats strings as lists of integers. When you enter <<"★">>, the string "★" is the list [9733], and in a binary construction each character of a string literal becomes an ordinary integer segment, so the expression is equivalent to <<9733>>. The default size for an integer segment is 8 bits, so 9733 is truncated to its lowest byte, 9733 rem 256 = 5, which is why you see <<5>>.
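A quick shell session (my own illustrative sketch, not part of the original answer) makes the truncation visible:

% "★" is the one-element list [9733]
"★" =:= [9733].
> true
% as a plain integer segment, 9733 is truncated to 8 bits
<<"★">> =:= <<9733>>.
> true
9733 rem 256.
> 5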

The /utf8 type specifier tells the compiler that each code point in the segment should be encoded as its UTF-8 byte sequence rather than as a fixed-width integer, which is why <<"★"/utf8>> produces the three bytes <<226,152,133>>.
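For comparison, here is a small sketch (again my own illustration, assuming a shell with Unicode support): the same code point written with the utf8 type, and unicode:characters_to_binary/1, a standard library call that builds the same UTF-8 binary at run time and can reduce how often /utf8 appears in literals:

% the utf8 type expands code point 9733 into its UTF-8 byte sequence
<<9733/utf8>>.
> <<226,152,133>>
% unicode:characters_to_binary/1 converts a charlist to a UTF-8 binary
unicode:characters_to_binary("★") =:= <<"★"/utf8>>.
> true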