What is the difference between #encode and #force_

2020-06-07 04:46发布

问题:

I really do not understand the difference between #encode and #force_encoding in Ruby for the String class. I understand that "kam".force_encoding("UTF-8") will force "kam" to be in UTF-8 encoding, but how is #encode(encoding) different?

http://ruby-doc.org/core-2.0/String.html#method-i-encoding

回答1:

Difference is pretty big. force_encoding sets given string encoding, but does not change the string itself, i.e. does not change it representation in memory:

'łał'.bytes #=> [197, 130, 97, 197, 130]
'łał'.force_encoding('ASCII').bytes #=> [197, 130, 97, 197, 130]
'łał'.force_encoding('ASCII')   #=> "\xC5\x82a\xC5\x82"

Encode is assuming that the current encoding is correct and tries to change the string so it reads same way in second encoding:

'łał'.encode('UTF-16') #=> 'łał'
'łał'.encode('UTF-16').bytes #=> [254, 255, 1, 65, 0, 97, 1, 66] 

In short, force_encoding changes the way string is being read from bytes, and encode changes the way string is written without changing the output (if possible)



回答2:

Read this Changing an encoding

The associated Encoding of a String can be changed in two different ways.

First, it is possible to set the Encoding of a string to a new Encoding without changing the internal byte representation of the string, with String#force_encoding. This is how you can tell Ruby the correct encoding of a string.

Example :

string = "R\xC3\xA9sum\xC3\xA9"
string.encoding #=> #<Encoding:ISO-8859-1>
string.force_encoding(Encoding::UTF_8) #=> "R\u00E9sum\u00E9"

Second, it is possible to transcode a string, i.e. translate its internal byte representation to another encoding. Its associated encoding is also set to the other encoding. See String#encode for the various forms of transcoding, and the Encoding::Converter class for additional control over the transcoding process.

Example :

string = "R\u00E9sum\u00E9"
string.encoding #=> #<Encoding:UTF-8>
string = string.encode!(Encoding::ISO_8859_1)
#=> "R\xE9sum\xE9"
string.encoding
#=> #<Encoding::ISO-8859-1>


标签: ruby encoding