Check if two hashes have the same set of keys

Posted 2019-04-28 21:26

What is the most efficient way to check whether two hashes h1 and h2 have the same set of keys, disregarding order? Can it be made faster, or more concise with comparable efficiency, than the answer I posted?
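
For concreteness, here is a tiny illustration of the check in question. The hashes and the same_key_set? helper are made-up names for illustration only (they are not the posted answer), and sorting the keys assumes they are mutually comparable, e.g. all symbols:

h1 = { a: 1, b: 2 }
h2 = { b: 3, a: 4 }   # same keys, different order and values
h3 = { a: 1, c: 2 }   # different keys

def same_key_set?(x, y)
  x.keys.sort == y.keys.sort   # assumes keys are mutually comparable
end

same_key_set?(h1, h2)  # => true
same_key_set?(h1, h3)  # => false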

Tags: ruby hash
7 Answers
淡お忘
Answer #2 · 2019-04-28 21:52

Here is my solution:

class Hash
  # Doesn't check recursively.
  # Returns true/false when given another Hash, nil for anything else.
  def same_keys?(compare)
    return nil unless compare.class == Hash
    return false unless size == compare.size

    keys.all? { |key| compare.key?(key) }
  end
end

a = c = {  a: nil,    b: "whatever1",  c: 1.14,     d: true   }
b     = {  a: "foo",  b: "whatever2",  c: 2.14,   "d": false  }
d     = {  a: "bar",  b: "whatever3",  c: 3.14,               }

puts a.same_keys?(b)                    # => true
puts a.same_keys?(c)                    # => true
puts a.same_keys?(d)                    # => false   
puts a.same_keys?(false).inspect        # => nil
puts a.same_keys?("jack").inspect       # => nil
puts a.same_keys?({}).inspect           # => false
你好瞎i
Answer #3 · 2019-04-28 21:56

Just for the sake of having at least a benchmark on this question...

require 'securerandom'
require 'benchmark'

a = {}
b = {}

# Use uuid to get a unique random key
(0..1_000).each do |i|
  key = SecureRandom.uuid
  a[key] = i
  b[key] = i
end

Benchmark.bmbm do |x|
  x.report("#-") do
    1_000.times do
      (a.keys - b.keys).empty? and (b.keys - a.keys).empty?
    end
  end

  x.report("#&") do
    1_000.times do
      computed = a.keys & b.keys
      computed.size == a.size
    end
  end

  x.report("#all?") do
    1_000.times do
      a.keys.all?{ |key| !!b[key] }
    end
  end

  x.report("#sort") do
    1_000.times do
      a_sorted = a.keys.sort
      b_sorted = b.keys.sort
      a_sorted == b_sorted
    end
  end
end

Results are:

Rehearsal -----------------------------------------
#-      1.000000   0.000000   1.000000 (  1.001348)
#&      0.560000   0.000000   0.560000 (  0.563523)
#all?   0.240000   0.000000   0.240000 (  0.239058)
#sort   0.850000   0.010000   0.860000 (  0.854839)
-------------------------------- total: 2.660000sec

            user     system      total        real
#-      0.980000   0.000000   0.980000 (  0.976698)
#&      0.560000   0.000000   0.560000 (  0.559592)
#all?   0.250000   0.000000   0.250000 (  0.251128)
#sort   0.860000   0.000000   0.860000 (  0.862857)

I have to agree with @akuhn that this would be a better benchmark if we had more information about the dataset you are using. That being said, I believe this question really needed some hard facts.

爱情/是我丢掉的垃圾
Answer #4 · 2019-04-28 21:58

Try:

# Check that both hashes have the same number of entries before anything else
if h1.size == h2.size
  # Breaks out of the iteration and returns false as soon as a key is missing,
  # otherwise returns true
  h1.keys.all? { |key| !!h2[key] }
end

Enumerable#all?

Worst case, you'd only be iterating through the keys once.
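
One caveat: !!h2[key] tests the value's truthiness rather than the key's presence, so a key whose value is nil or false would be reported as missing. If your hashes can contain such values, a sketch using Hash#key? sidesteps that (same shape as the snippet above, just a different predicate):

# Checks key presence directly, so nil/false values don't cause false negatives.
h1.size == h2.size && h1.keys.all? { |key| h2.key?(key) }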

甜甜的少女心
Answer #5 · 2019-04-28 22:01

Combining freemasonjson's and sawa's ideas:

h1.size == h2.size and (h1.keys - h2.keys).empty?
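
A quick sanity check of that one-liner, with made-up hashes:

h1 = { a: 1, b: 2, c: 3 }
h2 = { c: 30, b: 20, a: 10 }   # same keys, different order and values
h3 = { a: 1, b: 2 }            # missing :c

h1.size == h2.size and (h1.keys - h2.keys).empty?   # => true
h1.size == h3.size and (h1.keys - h3.keys).empty?   # => false
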
趁早两清
Answer #6 · 2019-04-28 22:01

It depends on your data.

There is no general case, really. Generally, retrieving the entire key set at once is faster than checking the inclusion of each key separately. However, if in your dataset the key sets differ more often than not, then a solution that is slower overall but fails fast might win. For example:

h1.size == h2.size and h1.keys.all?{|k|h2.include?(k)}

Another factor to consider is the size of your hashes. If they are big, a solution with a higher setup cost, like calling Set.new, might pay off; if they are small, it won't:

h1.size == h2.size and Set.new(h1.keys) == Set.new(h2.keys)

And if you happen to compare the same immutable hashes over and over again, it would definitely pay off to cache the results.
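
A minimal sketch of that caching idea, assuming the hashes really are never mutated; the constant and method names here are made up:

# Memoize the comparison per pair of hashes, keyed by object id, so repeated
# checks of the same (immutable) pair cost a single lookup.
KEY_SET_CACHE = {}

def same_key_set_cached?(h1, h2)
  KEY_SET_CACHE.fetch([h1.object_id, h2.object_id]) do |key|
    KEY_SET_CACHE[key] = h1.size == h2.size && (h1.keys - h2.keys).empty?
  end
end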

Ultimately, only a benchmark will tell, but to write one we'd need to know more about your use case. Testing a solution against synthetic data (for example, randomly generated keys) will not be representative.

再贱就再见
Answer #7 · 2019-04-28 22:08

Alright, let's break all rules of savoir vivre and portability. MRI's C API comes into play.

/* Name this file superhash.c. An appropriate Makefile is attached below. */
#include <ruby/ruby.h>

static int key_is_in_other(VALUE key, VALUE val, VALUE data) {
  struct st_table *other = ((struct st_table**) data)[0];
  if (st_lookup(other, key, 0)) {
    return ST_CONTINUE;
  } else {
    int *failed = ((int**) data)[1];
    *failed = 1;
    return ST_STOP;
  }
}

static VALUE hash_size(VALUE hash) {
  if (!RHASH(hash)->ntbl)
    return INT2FIX(0);
  return INT2FIX(RHASH(hash)->ntbl->num_entries);
}

static VALUE same_keys(VALUE self, VALUE other) {
  if (CLASS_OF(other) != rb_cHash)
    rb_raise(rb_eArgError, "argument needs to be a hash");
  if (hash_size(self) != hash_size(other))
    return Qfalse;
  if (!RHASH(self)->ntbl && !RHASH(other)->ntbl)
    return Qtrue;
  int failed = 0;
  void *data[2] = { RHASH(other)->ntbl, &failed };
  rb_hash_foreach(self, key_is_in_other, (VALUE) data);
  return failed ? Qfalse : Qtrue;
}

void Init_superhash(void) {
  rb_define_method(rb_cHash, "same_keys?", same_keys, 1);
}

Here's a Makefile.

CFLAGS=-std=c99 -O2 -Wall -fPIC $(shell pkg-config ruby-1.9 --cflags)
LDFLAGS=-Wl,-O1,--as-needed $(shell pkg-config ruby-1.9 --libs)
superhash.so: superhash.o
    $(LINK.c) -shared $^ -o $@
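
Assuming the extension compiles against Ruby 1.9 as above, building and loading it could look like this; the paths and sample hashes are illustrative:

# Build with `make superhash.so`, then load the extension from the current directory.
$LOAD_PATH.unshift(".")
require "superhash"

{ a: 1, b: 2 }.same_keys?({ b: 3, a: 4 })  # => true
{ a: 1, b: 2 }.same_keys?({ a: 1 })        # => false
{ a: 1 }.same_keys?("not a hash")          # raises ArgumentError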

An artificial, synthetic, and simplistic benchmark shows the following.

require 'superhash'
require 'benchmark'
n = 100_000
h1 = h2 = {a:5, b:8, c:1, d:9}
Benchmark.bm do |b|
  # freemasonjson's state of the art.
  b.report { n.times { h1.size == h2.size and h1.keys.all? { |key| !!h2[key] }}}
  # This solution
  b.report { n.times { h1.same_keys? h2} }
end
#       user     system      total        real
#   0.310000   0.000000   0.310000 (  0.312249)
#   0.050000   0.000000   0.050000 (  0.051807)