Convenient way of obtaining the specific object be

Posted 2019-07-20 19:03

Question:

Here's an interesting one. In a bucket-sharding system I'm writing, I maintain index hashes and storage hashes. The link between them is a generated UUID, because the system is distributed and I want some confidence that new buckets get unique references.

Early on in this exercise I started optimising the code to freeze all keys generated by SecureRandom.uuid (which produces strings), because a String used as a hash key is duped and frozen automatically, if it is not already frozen, so that it can't be changed afterwards.
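The dup-and-freeze behaviour described above can be checked in isolation. This is a minimal sketch; the key names 'bucket-a' and 'bucket-b' are made up for illustration:

```ruby
h = {}

unfrozen = 'bucket-a'.dup           # .dup guarantees an unfrozen String
h[unfrozen] = 1
stored = h.keys.first

puts stored.frozen?                 # true: the key held by the hash is frozen
puts stored.equal?(unfrozen)        # false: it is a different object from the one passed in
puts unfrozen.frozen?               # false: the caller's string is left untouched

frozen = 'bucket-b'.dup.freeze
h[frozen] = 2
puts h.keys.last.equal?(frozen)     # true: an already-frozen String is stored as-is
```

This is why freezing a UUID before its first use as a key keeps a single object in play, while an unfrozen one silently produces a second copy.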

In most cases it's easy to do this aggressively, particularly for new UUIDs (in my project many such values need this treatment). In some cases, though, I have to query a hash with a value passed over the network and then, to ensure consistent use of any strings already present as keys, use a rather obtuse lookup mechanism.

My goal, since I want this to maintain a huge data set across multiple nodes, is to reduce the overhead of key and index storage as much as possible. Because it's a bucketing system, the same UUID can be referenced many times, so it helps to reuse the same object reference.

Here's some code that demonstrates the issue in a simple(ish) form. I'm asking whether there's a more efficient and convenient mechanism for obtaining the pre-existing object reference for a key that has the same string value (the key object itself, not the value associated with it).

# Demonstrate the issue..

require 'securerandom'

index = Hash.new
store = Hash.new

key = 'meh'
value = 1

uuid = SecureRandom.uuid

puts "Ruby dups and freezes strings if used for keys in hashes"
puts "This produces different IDs"
store[uuid] = value
index[key] = uuid
store.each_key { |x| puts "Store reference for value of #{x} #{x.object_id}"}
index.each_value { |x| puts "Index reference for #{x} #{x.object_id}" }
puts

puts "If inconsistencies in ID occur then Ruby attempts to preserve the use of the frozen key so if it happens in one area take care"
puts "This produces different IDs"
uuid = uuid.freeze
store[uuid] = value
index[key] = uuid
store.each_key { |x| puts "Store reference for value of #{x} #{x.object_id}"}
index.each_value { |x| puts "Index reference for #{x} #{x.object_id}" }
puts

puts "If you start with a clean slate and a frozen key you can overcome it if you freeze the string before use"
puts "This is clean so far and produces the same object"
index = Hash.new
store = Hash.new

store[uuid] = value
index[key] = uuid
store.each_key { |x| puts "Store reference for value of #{x} #{x.object_id}"}
index.each_value { |x| puts "Index reference for #{x} #{x.object_id}" }
puts

puts "But if the same value for the key comes in (possibly remote) then it becomes awkward"
puts "This produces different IDs"
uuid = uuid.dup.freeze
store[uuid] = value
index[key] = uuid
store.each_key { |x| puts "Store reference for value of #{x} #{x.object_id}"}
index.each_value { |x| puts "Index reference for #{x} #{x.object_id}" }
puts

puts "So you get into oddities like this to ensure you standarise values put in to keys that already exist"
puts "This cleans up and produces same IDs but is a little awkward"

uuid = uuid.dup.freeze
uuid_list = store.keys
uuid = uuid_list[uuid_list.index(uuid)] if uuid_list.include?(uuid)
store[uuid] = value
index[key] = uuid
store.each_key { |x| puts "Store reference for value of #{x} #{x.object_id}"}
index.each_value { |x| puts "Index reference for #{x} #{x.object_id}" }
puts

Example run...

Ruby dups and freezes strings if used for keys in hashes
This produces different IDs
Store reference for value of bd48a581-95e9-452e-b8a3-602d92d47011 70209306325780
Index reference for bd48a581-95e9-452e-b8a3-602d92d47011 70209306325880

If inconsistencies in ID occur then Ruby attempts to preserve the use of the frozen key so if it happens in one area take care
This produces different IDs
Store reference for value of bd48a581-95e9-452e-b8a3-602d92d47011 70209306325780
Index reference for bd48a581-95e9-452e-b8a3-602d92d47011 70209306325880

If you start with a clean slate and a frozen key you can overcome it if you freeze the string before use
This is clean so far and produces the same object
Store reference for value of bd48a581-95e9-452e-b8a3-602d92d47011 70209306325880
Index reference for bd48a581-95e9-452e-b8a3-602d92d47011 70209306325880

But if the same value for the key comes in (possibly remote) then it becomes awkward
This produces different IDs
Store reference for value of bd48a581-95e9-452e-b8a3-602d92d47011 70209306325880
Index reference for bd48a581-95e9-452e-b8a3-602d92d47011 70209306325000

So you get into oddities like this to ensure you standarise values put in to keys that already exist
This cleans up and produces same IDs but is a little awkward
Store reference for value of bd48a581-95e9-452e-b8a3-602d92d47011 70209306325880
Index reference for bd48a581-95e9-452e-b8a3-602d92d47011 70209306325880

Answer 1:

For a pure Ruby example, it seems this can be avoided entirely thanks to the global nature of symbol object references: converting the strings to symbols is enough to ensure the same reference. It's not what I was hoping for, since I sometimes use Ruby to prototype for C developers, but it works reliably and lets my prototype progress, with plenty of additional comments for the C development stage.

I would still be interested in other examples, but here's a big thumbs-up for Symbols, although I tend to avoid them in many network cases because they are serialized as Strings in JSON (and I like JSON because peers written in different languages can usually support it).

imac:Ruby andrews$ irb
irb(main):001:0> a = :meh
=> :meh
irb(main):002:0> b = 'meh'.to_sym
=> :meh
irb(main):003:0> a.object_id == b.object_id
=> true

There is additional support for this approach in the question "Why use symbols as hash keys in Ruby?"

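Also remember that in Ruby versions before 2.2 symbols, once created, are never garbage collected. From Ruby 2.2 onwards, dynamically created symbols (such as those produced by String#to_sym) can be collected.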
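The JSON caveat mentioned in this answer can be seen in a short sketch. The payload layout (a single "uuid" field) is an assumption made for illustration, not something from the original system:

```ruby
require 'json'

a = :meh
b = 'meh'.to_sym
puts a.equal?(b)                          # true: equal symbols are one object

# A symbol value is written out as a plain JSON string...
payload = JSON.generate({ uuid: :meh })
puts payload                              # {"uuid":"meh"}

# ...so the receiving side gets a String and has to convert it back.
received = JSON.parse(payload)
puts received['uuid'].class               # String
puts received['uuid'].to_sym.equal?(a)    # true again after to_sym
```

So a value received from a peer has to go through to_sym before it can share a reference with the locally held symbol.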



Answer 2:

Maybe you are looking for Enumerable#find

uuid = store.find{|k,_| k == uuid_from_network }.first

Full example:

require 'securerandom'

index = Hash.new
store = Hash.new
key = 'meh'
value = 1
uuid = SecureRandom.uuid
store[uuid] = value
index[key] = uuid

# obtained from elsewhere
uuid = uuid.dup.freeze

uuid = store.find{|k,_| k == uuid }.first
store[uuid] = value
index[key] = uuid
store.each_key { |x| puts "Store reference for value of #{x} #{x.object_id}"}
index.each_value { |x| puts "Index reference for #{x} #{x.object_id}" }

Output:

Store reference for value of d94390c4-7cc7-4e94-92bc-a0dd862ac6a2 70190385847520
Index reference for d94390c4-7cc7-4e94-92bc-a0dd862ac6a2 70190385847520

If you want to go crazy efficient, you can build a lightweight wrapper around the C function st_get_key, which does exactly what you want. I used the implementation of Hash#has_key? as a template. You can mix C code into Ruby code with, for example, RubyInline.

require 'inline'

class Hash
  inline do |builder|
    builder.c <<-EOS
      VALUE fetch_key(VALUE key) {
        st_data_t result;
        if (!RHASH(self)->ntbl)
          return Qnil;
        if (st_get_key(RHASH(self)->ntbl, key, &result)) {
          return result;
        }
        return Qnil;
      }
    EOS
  end
end


Answer 3:

I couldn't find anything native in the Hash source, and Symbols were unsuitable for my purposes, so I adapted the answer from @p11y. Thanks ^^

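class Hash
  # Returns the key object already stored in this hash that equals +key+,
  # or +key+ itself when no such key is stored.
  def consistent_key_obj(key)
    key?(key) ? each_key.find { |k| k.eql?(key) } : key
  end
end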
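As an alternative to scanning a hash's keys on every lookup, a separate pool can hand out one canonical frozen String per distinct value. This is only a sketch: KeyPool and its intern method are hypothetical names, not part of Ruby or of the answers above, and the pool as written never releases entries.

```ruby
require 'securerandom'

# KeyPool (hypothetical) maps each distinct string value to a single
# canonical frozen String object.
class KeyPool
  def initialize
    @pool = {}
  end

  # Returns the canonical frozen object for +str+, registering a frozen
  # copy the first time a given value is seen.
  def intern(str)
    @pool.fetch(str) do
      canonical = str.frozen? ? str : str.dup.freeze
      @pool[canonical] = canonical
    end
  end

  def size
    @pool.size
  end
end

pool  = KeyPool.new
store = {}
index = {}

local  = pool.intern(SecureRandom.uuid)
remote = pool.intern(local.dup)    # simulates the same UUID arriving over the network

store[local] = 1
index['meh'] = remote

puts store.keys.first.equal?(index['meh'])   # true: both hashes share one String
```

Each lookup is a regular hash access rather than a linear scan, at the cost of one extra hash that lives alongside the store and index. The same structure translates fairly directly to a string-interning table in C.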


Tags: ruby, hash, key