Here's an interesting one. In a bucket sharding system I'm writing, I maintain index hashes and storage hashes. The link between them is a UUID, generated because the system is distributed and I want some confidence that new buckets gain unique references.
Early on in this exercise I started optimising the code to freeze all keys generated by SecureRandom.uuid (it produces strings), because when you use a String as a key in a hash and it is not already frozen, it gets duped and frozen automatically to ensure that it can't be changed.
In most cases it's easy to do this aggressively, particularly for new UUIDs (in my project many such values need this treatment), but in some cases I have to approach a hash with a value passed over the network and then, to ensure consistent use of any strings present as keys, use a rather obtuse lookup mechanism to obtain the existing key object.
My goal, since I want this to maintain a huge data set across multiple nodes, is to reduce the overhead of key and index storage as much as possible. Because it's a bucketing system, the same UUID can be referenced many times, so it's helpful to reuse the same object reference.
Here's some code that demonstrates the issue in a simpl(ish) form. I'm asking whether there's a more efficient and convenient mechanism for obtaining any pre-existing object reference for a key that has the same string value (for the key itself, not the value associated with it).
# Demonstrate the issue..
require 'securerandom'
index = Hash.new
store = Hash.new
key = 'meh'
value = 1
uuid = SecureRandom.uuid
puts "Ruby dups and freezes strings if used for keys in hashes"
puts "This produces different IDs"
store[uuid] = value
index[key] = uuid
store.each_key { |x| puts "Store reference for value of #{x} #{x.object_id}"}
index.each_value { |x| puts "Index reference for #{x} #{x.object_id}" }
puts
puts "If inconsistencies in ID occur then Ruby attempts to preserve the use of the frozen key so if it happens in one area take care"
puts "This produces different IDs"
uuid = uuid.freeze
store[uuid] = value
index[key] = uuid
store.each_key { |x| puts "Store reference for value of #{x} #{x.object_id}"}
index.each_value { |x| puts "Index reference for #{x} #{x.object_id}" }
puts
puts "If you start with a clean slate and a frozen key you can overcome it if you freeze the string before use"
puts "This is clean so far and produces the same object"
index = Hash.new
store = Hash.new
store[uuid] = value
index[key] = uuid
store.each_key { |x| puts "Store reference for value of #{x} #{x.object_id}"}
index.each_value { |x| puts "Index reference for #{x} #{x.object_id}" }
puts
puts "But if the same value for the key comes in (possibly remote) then it becomes awkward"
puts "This produces different IDs"
uuid = uuid.dup.freeze
store[uuid] = value
index[key] = uuid
store.each_key { |x| puts "Store reference for value of #{x} #{x.object_id}"}
index.each_value { |x| puts "Index reference for #{x} #{x.object_id}" }
puts
puts "So you get into oddities like this to ensure you standarise values put in to keys that already exist"
puts "This cleans up and produces same IDs but is a little awkward"
uuid = uuid.dup.freeze
uuid_list = store.keys
uuid = uuid_list[uuid_list.index(uuid)] if uuid_list.include?(uuid)
store[uuid] = value
index[key] = uuid
store.each_key { |x| puts "Store reference for value of #{x} #{x.object_id}"}
index.each_value { |x| puts "Index reference for #{x} #{x.object_id}" }
puts
Example run...
Ruby dups and freezes strings if used for keys in hashes
This produces different IDs
Store reference for value of bd48a581-95e9-452e-b8a3-602d92d47011 70209306325780
Index reference for bd48a581-95e9-452e-b8a3-602d92d47011 70209306325880
If inconsistencies in ID occur then Ruby attempts to preserve the use of the frozen key so if it happens in one area take care
This produces different IDs
Store reference for value of bd48a581-95e9-452e-b8a3-602d92d47011 70209306325780
Index reference for bd48a581-95e9-452e-b8a3-602d92d47011 70209306325880
If you start with a clean slate and a frozen key you can overcome it if you freeze the string before use
This is clean so far and produces the same object
Store reference for value of bd48a581-95e9-452e-b8a3-602d92d47011 70209306325880
Index reference for bd48a581-95e9-452e-b8a3-602d92d47011 70209306325880
But if the same value for the key comes in (possibly remote) then it becomes awkward
This produces different IDs
Store reference for value of bd48a581-95e9-452e-b8a3-602d92d47011 70209306325880
Index reference for bd48a581-95e9-452e-b8a3-602d92d47011 70209306325000
So you get into oddities like this to ensure you standarise values put in to keys that already exist
This cleans up and produces same IDs but is a little awkward
Store reference for value of bd48a581-95e9-452e-b8a3-602d92d47011 70209306325880
Index reference for bd48a581-95e9-452e-b8a3-602d92d47011 70209306325880
I couldn't find anything native in the Hash source and Symbols were unsuitable for my purposes so I adapted the answer from @p11y, thanks ^^
Maybe you are looking for Enumerable#find
Full example:
Output:
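A minimal sketch of how Enumerable#find can recover the key object a hash already holds. The helper name canonical_key is hypothetical (it is not part of Ruby or of the question's code), and the sketch assumes keys are frozen before they are first stored.

```ruby
require 'securerandom'

# Hypothetical helper: return the key object already stored in +hash+ that
# equals +candidate+, or the frozen +candidate+ itself when no such key exists.
def canonical_key(hash, candidate)
  hash.each_key.find { |existing| existing == candidate } || candidate.freeze
end

store = {}
index = {}

uuid = SecureRandom.uuid.freeze
store[uuid] = 1

# Same characters, different object, as if the value had arrived over the network.
incoming = uuid.dup

shared = canonical_key(store, incoming)
index['meh'] = shared

puts incoming.equal?(store.keys.first)      # => false
puts index['meh'].equal?(store.keys.first)  # => true
```

Note that find walks the keys one by one, so each lookup costs time proportional to the number of keys in the hash; that may or may not be acceptable for a very large store.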
If you want to go crazy efficient, you can build a lightweight wrapper around the C function st_get_key, which does exactly what you want. I took the implementation of Hash#has_key? as boilerplate. You can mix C code into Ruby code, for example with RubyInline.
It seems that, for a pure Ruby example, this can be avoided entirely due to the global nature of symbol object references. It's enough to convert strings to symbols to ensure the same reference. It's not what I was hoping for, since I sometimes use Ruby to prototype for C developers, but it works reliably and lets my prototype progress, with plenty of additional comments for the C development stage.
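As a small illustration of the symbol approach, here is a sketch reusing the store and index names from the question. The variable names local_key and remote_key are made up for the example.

```ruby
require 'securerandom'

store = {}
index = {}

uuid_string = SecureRandom.uuid

local_key  = uuid_string.to_sym      # key created on this node
remote_key = uuid_string.dup.to_sym  # same text arriving as a separate String

store[local_key] = 1
index['meh'] = remote_key

# Two symbols with the same name are the same object, so no lookup is needed.
puts local_key.equal?(remote_key)           # => true
puts store.keys.first.equal?(index['meh'])  # => true
```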
I would still be interested in other examples, but here's a big thumbs-up for Symbols, although I tend to avoid them in many network cases because they are serialised as Strings in JSON (and I like JSON since peers written in different languages can usually support it).
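The JSON caveat can be seen in a short round trip. This is only a sketch using Ruby's standard json library, with the UUID from the example run above as sample data. A symbol goes out as a JSON string and comes back as a String, so the receiver has to call to_sym again to get the shared reference.

```ruby
require 'json'

original = :"bd48a581-95e9-452e-b8a3-602d92d47011"

wire    = JSON.generate({ 'uuid' => original })
decoded = JSON.parse(wire)

puts decoded['uuid'].class                    # => String
puts decoded['uuid'].to_sym.equal?(original)  # => true
```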
Additional backup for this approach: "Why use symbols as hash keys in Ruby?"
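In addition, remember that in Ruby versions before 2.2 symbols, once created, are never garbage collected. From Ruby 2.2 onwards, symbols created dynamically (for example with to_sym) can be collected.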