Trying to understand the Ruby .chr and .ord methods

Posted 2019-04-07 10:08

I've been working with the Ruby chr and ord methods recently and there are a few things I don't understand.

My current project involves converting individual characters to and from ordinal values. As I understand it, if I have a single-character string like "A" and call ord on it, I get its position in the ASCII table, which is 65. Calling the inverse, 65.chr, gives me the character "A". This tells me that Ruby has an ordered collection of characters somewhere, which it uses to give me either the position of a specific character or the character at a specific position. I may be wrong about this, so please correct me if I am.
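
A minimal sketch of the round trip described above, for illustration only. The variable names are made up here, and the second pair passes an explicit encoding to chr, which is an assumption about how the larger value would be converted back.

```ruby
# ord returns the integer code point of the first character of a string.
a_code = "A".ord
# chr with no argument turns a small integer back into a one-character string.
a_char = 65.chr

# "\u597D" is the character 好, written as an escape so the source encoding does not matter.
hao_code = "\u597D".ord
# Passing an encoding lets chr handle code points above 255.
hao_char = hao_code.chr(Encoding::UTF_8)

puts a_code, a_char, hao_code, hao_char
```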

I also understand that Ruby's default character encoding is UTF-8, so it can work with thousands of possible characters. Thus if I ask it for something like this:

'好'.ord

I get the position of that character, which is 22909. However, if I call chr on that value:

22909.chr

I get "RangeError: 22909 out of char range". I can only get chr to work on values up to 255, which is the extended ASCII range. So my questions are:

  • Why does Ruby seem to be getting values for chr from the extended ASCII character set but ord from UTF-8?
  • Is there any way to tell Ruby to use different encodings when it uses these methods? For instance, tell it to use ASCII-8BIT encoding instead of whatever it's defaulting to?
  • If it is possible to change the default encoding, is there any way of getting the total number of characters available in the set being used?

Tags: ruby encoding
2 answers
啃猪蹄的小仙女
Reply #2 · 2019-04-07 10:49

According to the documentation for Integer#chr, you can pass an encoding as an argument, which forces the result to be UTF-8:

22909.chr(Encoding::UTF_8)
#=> "好"
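
As a rough sketch of why the argument matters: without one, chr returns a US-ASCII string for 0..127 and an ASCII-8BIT string for 128..255, and for anything larger it falls back to Encoding.default_internal, raising RangeError when that is unset. ord, by contrast, reads the code point in whatever encoding the string already has. The variable names below are illustrative only.

```ruby
code = "\u597D".ord                       # code point of 好 in a UTF-8 string

small = 65.chr                            # no argument, value in 0..127
high  = 200.chr                           # no argument, value in 128..255

utf8    = code.chr(Encoding::UTF_8)       # encoding given as an Encoding object
by_name = code.chr("UTF-8")               # encoding given by name
packed  = [code].pack("U")                # alternative: pack as a UTF-8 character

puts small.encoding, high.encoding, utf8.encoding

# With no argument, large values depend on Encoding.default_internal.
begin
  code.chr
rescue RangeError => e
  puts "chr without an encoding failed: #{e.message}"
end
```

Passing the encoding explicitly keeps the result independent of the process-wide default.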

To list the names of all available encodings:

Encoding.name_list
#=> ["ASCII-8BIT", "UTF-8", "US-ASCII", "UTF-16BE", "UTF-16LE", "UTF-32BE", "UTF-32LE", "UTF-16", "UTF-32", ...]
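
A small sketch of how those names might be used, assuming you want to look an encoding up by name and inspect the defaults Ruby is currently running with. The variable names are illustrative.

```ruby
# Look up Encoding objects by name; find raises ArgumentError for unknown names.
utf8   = Encoding.find("UTF-8")
binary = Encoding.find("ASCII-8BIT")

# Check whether a name is known before using it.
known = Encoding.name_list.include?("UTF-8")

# The defaults in effect for this process (default_internal is often nil).
puts Encoding.default_external.inspect
puts Encoding.default_internal.inspect

# Either the Encoding object or its name can be passed to chr.
same = 22909.chr(utf8) == 22909.chr("UTF-8")
```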

A hacky way to count the characters available in UTF-8 is to try every integer and count the ones that chr accepts:

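# Try every integer below 2,000,000 and count those that chr accepts as UTF-8.
2000000.times.reduce(0) do |x, i|
  begin
    i.chr(Encoding::UTF_8)
    x += 1
  rescue RangeError
    # not a valid UTF-8 code point, so leave the count unchanged
  end

  x
end
#=> 1112064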
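
For UTF-8 specifically, the same figure can be reached by arithmetic instead of brute force: Unicode code points run from 0 to 0x10FFFF, and the surrogate range 0xD800..0xDFFF is not encodable. A minimal sketch, with constant and method names chosen here for illustration:

```ruby
UNICODE_MAX = 0x10FFFF            # highest Unicode code point
SURROGATES  = 0xD800..0xDFFF      # reserved for UTF-16, rejected by chr in UTF-8

# All code points from 0 to UNICODE_MAX, minus the surrogates.
total = (UNICODE_MAX + 1) - SURROGATES.size

# Hypothetical helper: true when chr accepts the integer as a UTF-8 code point.
def utf8_code?(code)
  code.chr(Encoding::UTF_8)
  true
rescue RangeError
  false
end

puts total
puts utf8_code?(SURROGATES.first)   # a surrogate
puts utf8_code?(UNICODE_MAX)        # the last valid code point
puts utf8_code?(UNICODE_MAX + 1)    # one past the end
```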
小情绪 Triste *
Reply #3 · 2019-04-07 10:58

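After tooling around with this for a while, I realized that I could find the highest valid value for each encoding by running a binary search for the largest integer that doesn't raise a RangeError. Note that this gives the highest value rather than an exact count of characters, and it assumes every value below that maximum is also valid.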

def get_highest_value(set)
  min = 0
  max = 10_000_000_000

  # Binary search for the largest integer that chr accepts in the given encoding.
  while min <= max
    guess = (min + max) / 2
    begin
      guess.chr(set)
      min = guess + 1 # guess is valid, so search the upper half
    rescue RangeError
      max = guess - 1 # guess is out of range, so search the lower half
    end
  end

  max
end

The argument to the method is the encoding to check, given either as an Encoding object or as its name.
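
As an alternative sketch, Ruby's built-in Range#bsearch can run the same kind of search. The helper names valid_code? and highest_code_for and the default upper limit are assumptions made for this example. Like any binary search, it assumes validity does not flip back and forth below the maximum, so gaps such as the UTF-8 surrogate range could mislead it in principle.

```ruby
# Hypothetical helper: true when chr accepts the integer for the given encoding.
def valid_code?(code, encoding)
  code.chr(encoding)
  true
rescue RangeError
  false
end

# Hypothetical helper: largest integer up to `limit` that chr accepts.
def highest_code_for(encoding, limit = 10_000_000_000)
  # bsearch in find-minimum mode returns the first value for which the block is true.
  first_invalid = (0..limit).bsearch { |code| !valid_code?(code, encoding) }
  first_invalid ? first_invalid - 1 : limit
end

utf8_max = highest_code_for(Encoding::UTF_8)
puts utf8_max.to_s(16)
```

Any encoding that Integer#chr accepts can be passed in place of Encoding::UTF_8.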
