Ruby: Want a Set-like object which preserves order

2019-04-06 09:35发布

问题:

... or alternatively an Array which prevents duplicate entries.

Is there some kind of object in Ruby which:

  • responds to [], []= and <<
  • silently drops duplicate entries
  • is Enumerable (or at least supports find_all)
  • preserves the order in which entries were inserted
?

As far as I can tell, an Array supports points 1, 3 and 4; while a Set supports 1, 2 and 3 (but not 4). And a SortedSet won't do, because my entries don't implement <=>.

回答1:

There isn't one as far as I know, and Set by its mathematical nature is meant to be unordered (or at least, implementationally, meant not to guarantee order - in fact its usually implemented as a hash table so it does mess up order).

However, it's not hard to either extend array directly or subclass it to do this. I just tried it out and this works:

class UniqueArray < Array
  def initialize(*args)
    if args.size == 1 and args[0].is_a? Array then
      super(args[0].uniq)
    else
      super(*args)
    end
  end

  def insert(i, v)
    super(i, v) unless include?(v)
  end

  def <<(v)
    super(v) unless include?(v)
  end

  def []=(*args)
    # note: could just call super(*args) then uniq!, but this is faster

    # there are three different versions of this call:
    # 1. start, length, value
    # 2. index, value
    # 3. range, value
    # We just need to get the value
    v = case args.size
      when 3 then args[2]
      when 2 then args[1]
      else nil
    end

    super(*args) if v.nil? or not include?(v)
  end
end

Seems to cover all the bases. I used OReilly's handy Ruby Cookbook as a reference - they have a recipe for "Making sure a sorted array stays sorted" which is similar.



回答2:

As of Ruby 1.9, the built-in Hash object preserves insertion order. For example:

h = {}
h[:z] = 1
h[:b] = 2
h[:a] = 3
h[:x] = 0
p h.keys     #=> [:z, :b, :a, :x]

h.delete :b
p h.keys     #=> [:z, :a, :x]

h[:b] = 1
p h.keys     #=> [:z, :a, :x, :b]

So, you can set any value (like a simple true) for any key and you now have an ordered set. You can test for a key using either h.key?(obj) or, if you always set each key to have a truthy value, just h[obj]. To remove a key, use h.delete(obj). To convert the ordered set to an array, use h.keys.

Because the Ruby 1.9 Set library happens to be built upon a Hash currently, you can currently use Set as an ordered set. (For example, the to_a method's implementation is just @hash.keys.) Note, however, that this behavior is not guaranteed by that library, and might change in the future.

require 'set'
s = Set[ :f, :o, :o, :b, :a, :r ]  #=> #<Set: {:f, :o, :b, :a, :r}>
s << :z                            #=> #<Set: {:f, :o, :b, :a, :r, :z}>
s.delete :o                        #=> #<Set: {:f, :b, :a, :r, :z}>
s << :o                            #=> #<Set: {:f, :b, :a, :r, :z, :o}>
s << :o                            #=> #<Set: {:f, :b, :a, :r, :z, :o}>
s << :f                            #=> #<Set: {:f, :b, :a, :r, :z, :o}>
s.to_a                             #=> [:f, :b, :a, :r, :z, :o]


回答3:

I like this solution although it requires active_support's OrderedHash

require 'active_support/ordered_hash'

class OrderedSet < Set

  def initialize enum = nil, &block
    @hash = ActiveSupport::OrderedHash.new
    super
  end

end

=)



回答4:

You could use a Hash to store the values, and have an incrementing value stored in the value of each Hash pair. Then you can access the set in a sorted manner, albeit slowly, by accessing the objects via their values.

I'll try to add some code in here later to explain further.

I am aware accessing via values is much slower than by keys.

Update 1: In Ruby 1.9, Hash elements are iterated in their insertion order.



回答5:

Not that I know of, but it wouldn't be hard to roll your own. Just subclass Array and use a Set to maintain your uniqueness constraint.

One question about silent dropping. How would this affect #[]=? If I was trying to overwrite an existing entry with something which was already stored elsewhere, should it remove the would-be-removed element anyway? I think either way could provide nasty surprises down the road.



标签: ruby arrays set