Matz wrote in his book that in order to use UTF-8, you must add a coding comment on the first line of your script. He gives us an example:
    # -*- coding: utf-8 -*-  # Specify Unicode UTF-8 characters
    # This is a string literal containing a multibyte multiplication character
    s = "2×2=4"
    # The string contains 6 bytes which encode 5 characters
    s.length    # => 5: Characters: '2' '×' '2' '=' '4'
    s.bytesize  # => 6: Bytes (hex): 32 c3 97 32 3d 34
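As a quick sanity check (my own sketch, not from the book), printing per-character byte counts makes it obvious which character carries the extra byte, assuming the source file itself is saved as UTF-8:

    s = "2×2=4"
    s.each_char do |c|
      puts "#{c.inspect}: #{c.bytesize} byte(s)"
    end
    # Output:
    # "2": 1 byte(s)
    # "×": 2 byte(s)
    # "2": 1 byte(s)
    # "=": 1 byte(s)
    # "4": 1 byte(s)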
When he invokes bytesize, it returns 6, since the multiplication symbol × is outside the ASCII range and its UTF-8 encoding takes two bytes.
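This lines up with the character's code point: × is U+00D7, and any code point above U+007F needs at least two bytes in UTF-8. A minimal check of my own:

    '×'.ord           # => 215, i.e. 0xd7 (above the ASCII range 0..127)
    '×'.bytesize      # => 2
    '×'.unpack('C*')  # => [195, 151], i.e. 0xc3 0x97 in hex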
I tried the exercise, and even without specifying the coding comment, Ruby recognized the multiplication symbol as two bytes:
    '×'.encoding
    => #<Encoding:UTF-8>
    '×'.bytes.to_a.map { |dec| dec.to_s(16) }
    => ["c3", "97"]
So it appears UTF-8 is the default source encoding. Is this a change introduced in Ruby 2? His examples were written for Ruby 1.9.