I've been working with the Ruby chr and ord methods recently and there are a few things I don't understand.
My current project involves converting individual characters to and from ordinal values. As I understand it, if I have a string with an individual character like "A" and I call ord on it, I get its position in the ASCII table, which is 65. Calling the inverse, 65.chr, gives me the character value "A". This tells me that Ruby has a collection somewhere of ordered character values, and it can use this collection to give me the position of a specific character, or the character at a specific position. I may be wrong on this, so please correct me if I am.
Now I also understand that Ruby's default character encoding is UTF-8, so it can work with thousands of possible characters. Thus if I ask it for something like this:
'好'.ord
I get the position of that character, which is 22909. However, if I call chr on that value:
22909.chr
I get "RangeError: 22909 out of char range". I'm only able to get chr to work on values up to 255, which is extended ASCII. So my questions are:
- Why does Ruby seem to be getting values for chr from the extended ASCII character set but ord from UTF-8?
- Is there any way to tell Ruby to use different encodings when it uses these methods? For instance, tell it to use ASCII-8BIT encoding instead of whatever it's defaulting to?
- If it is possible to change the default encoding, is there any way of getting the total number of characters available in the set being used?
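According to the Integer#chr documentation, you can pass an encoding argument to chr, for example Encoding::UTF_8, to force the result to be encoded as UTF-8 instead of getting a RangeError. To list all available encoding names, you can use Encoding.name_list.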
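The code that originally accompanied this answer is not preserved here, so the following is only a minimal sketch of the two calls it describes. The helper name codepoint_to_char is hypothetical and exists just to wrap the call; Integer#chr with an encoding argument and Encoding.name_list are the standard Ruby methods.

```ruby
# Hypothetical helper (not from the original answer): converts an integer
# codepoint to a one-character string in the given encoding.
def codepoint_to_char(codepoint, encoding = Encoding::UTF_8)
  # Passing the encoding explicitly avoids the RangeError that a bare
  # 22909.chr raises for values above 255.
  codepoint.chr(encoding)
end

code = '好'.ord                 # 22909
char = codepoint_to_char(code)  # round-trips back to the original character
puts char
puts char.encoding              # UTF-8

# Names (including aliases) of every encoding this Ruby build knows about.
puts Encoding.name_list.first(5).inspect
```

The encoding argument can also be given as a name string, such as 22909.chr('UTF-8').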
A hacky way to get the maximum number of characters
After tooling around with this for a while, I realized that I could get the max number of characters for each encoding by running a binary search to find the highest value that doesn't throw a RangeError.
The value input to the method is the name of the encoding being checked.
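The method itself is missing from this copy, so here is a rough sketch of what such a binary search could look like, not the original author's code. The name max_codepoint and the upper_bound default are assumptions made for illustration. The search also assumes that 0 is accepted, that upper_bound is rejected, and that every value up to the maximum is accepted. That last assumption does not strictly hold: UTF-8 rejects the surrogate range 0xD800..0xDFFF, and multibyte encodings such as Shift_JIS have gaps and may need a larger bound, so treat the result as the highest accepted value rather than an exact character count.

```ruby
# Hypothetical helper: finds the highest integer that Integer#chr accepts
# for the named encoding, using a binary search over [0, upper_bound).
# Assumption: 0 is valid and upper_bound is invalid for this encoding.
def max_codepoint(encoding_name, upper_bound = 0x200000)
  encoding = Encoding.find(encoding_name)

  # True when the integer can be converted without a RangeError.
  valid = lambda do |n|
    begin
      n.chr(encoding)
      true
    rescue RangeError
      false
    end
  end

  low = 0             # known to be accepted
  high = upper_bound  # assumed to be rejected
  while high - low > 1
    mid = (low + high) / 2
    if valid.call(mid)
      low = mid
    else
      high = mid
    end
  end
  low
end

puts max_codepoint('US-ASCII')    # 127
puts max_codepoint('ASCII-8BIT')  # 255
puts max_codepoint('UTF-8')       # 1114111 (0x10FFFF)
```

With the default bound, the UTF-8 search never probes the surrogate range, which is why it still lands on 0x10FFFF; a different bound could probe there and stop too early.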