Trying to understand the Ruby .chr and .ord methods

I've been working with the Ruby chr and ord methods recently and there are a few things I don't understand.

My current project involves converting individual characters to and from their ordinal values. As I understand it, if I have a one-character string such as "A" and call ord on it, I get its position in the ASCII table, which is 65. Calling the inverse, 65.chr, gives me the character "A". This tells me that Ruby has an ordered collection of character values somewhere, which it can use to give me the position of a specific character or the character at a specific position. I may be wrong about this; please correct me if I am.
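To make that concrete, this is the round trip I mean (the variable names are only for illustration):

```ruby
# Round trip between a one-character string and its integer code point.
code = "A".ord   # integer code point of the first character
char = code.chr  # one-character string for that code point

puts code              # 65
puts char              # A
puts char.ord == code  # true
```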

Now, I also understand that Ruby's default character encoding is UTF-8, so it can work with thousands of possible characters. Thus, if I ask it for something like this:

'好'.ord

I get the position of that character which is 22909. However, if I call chr on that value:

22909.chr

I get "RangeError: 22909 out of char range." I'm only able to get chr to work on values up to 255, which is extended ASCII. So my questions are:

1. Why does Ruby seem to be getting values for chr from the extended ASCII character set but ord from UTF-8?
2. Is there any way to tell Ruby to use different encodings when it uses these methods? For instance, tell it to use ASCII-8BIT encoding instead of whatever it's defaulting to?
3. If it is possible to change the default encoding, is there any way of getting the total number of characters available in the set being used?

Tags: ruby encoding

2 Answers

啃猪蹄的小仙女

Reply #2 · 2019-04-07 10:49

According to the Integer#chr documentation, you can pass an encoding as an argument to force the result to be UTF-8:

22909.chr(Encoding::UTF_8)
#=> "好"
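As far as I can tell, this also explains the asymmetry in the question. String#ord reads the code point using the string's own encoding (UTF-8 for a literal in a typical script), whereas Integer#chr without an argument returns a US-ASCII string for 0..127, an ASCII-8BIT string for 128..255, and for anything larger falls back to Encoding.default_internal, which is usually unset, hence the RangeError. A small sketch of that behavior, assuming default_internal has not been configured (the variable names are mine):

```ruby
ascii  = 65.chr                      # 0..127, no encoding argument
binary = 200.chr                     # 128..255, no encoding argument
utf8   = 22909.chr(Encoding::UTF_8)  # explicit encoding

p ascii.encoding  == Encoding::US_ASCII    # true
p binary.encoding == Encoding::ASCII_8BIT  # true
p utf8.encoding   == Encoding::UTF_8       # true
p utf8.ord == 22909                        # true

# Array#pack with the "U" directive is another way to build a UTF-8
# character from a code point.
packed = [22909].pack("U")
p packed == utf8                           # true
```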

To list all available encoding names:

Encoding.name_list
#=> ["ASCII-8BIT", "UTF-8", "US-ASCII", "UTF-16BE", "UTF-16LE", "UTF-32BE", "UTF-32LE", "UTF-16", "UTF-32", ...]

A hacky way to count the characters (valid code points) available in UTF-8:

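# Count the integers in 0...2_000_000 that chr accepts as UTF-8 code points.
2_000_000.times.reduce(0) do |count, i|
  begin
    i.chr(Encoding::UTF_8)
    count + 1
  rescue RangeError
    count # not a valid code point in UTF-8, so leave the count unchanged
  end
end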
#=> 1112064
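That number matches what the Unicode code space predicts, so the loop is not strictly needed for UTF-8: there are 0x110000 code points in total, minus the 2048 surrogates (0xD800..0xDFFF), which cannot be encoded in UTF-8. A quick check of the arithmetic (the variable names are my own):

```ruby
total_code_points = 0x10FFFF + 1          # 0..0x10FFFF inclusive
surrogates        = (0xD800..0xDFFF).size # reserved for UTF-16, invalid in UTF-8
valid             = total_code_points - surrogates

puts valid  # 1112064

# chr rejects a surrogate value when asked for UTF-8.
surrogate_rejected =
  begin
    0xD800.chr(Encoding::UTF_8)
    false
  rescue RangeError
    true
  end
puts surrogate_rejected  # true
```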


小情绪 Triste *

Reply #3 · 2019-04-07 10:58

After tooling around with this for a while, I realized that I could find the highest valid value for each encoding by running a binary search for the largest integer that doesn't raise a RangeError.

def get_highest_value(set)
  min = 0
  max = 10_000_000_000

  # Binary search: narrow [min, max] until the two bounds cross.
  while min <= max
    guess = (min + max) / 2
    begin
      guess.chr(set)
      min = guess + 1 # guess is valid, so look higher
    rescue StandardError
      max = guess - 1 # guess was rejected, so look lower
    end
  end

  max # highest value that chr accepted
end

The argument to the method is the encoding being checked, given either by name (for example "UTF-8") or as a constant such as Encoding::UTF_8.
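For example, here is a condensed, self-contained variant of the same search with a couple of calls (highest_valid and its upper parameter are names I made up for this sketch). Note that it assumes every value below the highest valid one is also valid, which is not quite true for UTF-8 because of the surrogate gap, so the result is the highest code point rather than a count of characters.

```ruby
# Binary search for the largest integer that chr accepts for an encoding.
def highest_valid(encoding, upper = 10_000_000_000)
  min = 0
  max = upper
  while min <= max
    guess = (min + max) / 2
    begin
      guess.chr(encoding)
      min = guess + 1 # accepted, so look higher
    rescue StandardError
      max = guess - 1 # rejected, so look lower
    end
  end
  max
end

puts highest_valid(Encoding::US_ASCII)    # 127
puts highest_valid(Encoding::ASCII_8BIT)  # 255
puts highest_valid(Encoding::UTF_8)       # 1114111, i.e. 0x10FFFF
```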
