Convert an array into an index hash in Ruby

2019-03-08 05:07发布

I have an array, and I want to make a hash so I can quickly ask "is X in the array?".

In perl, there is an easy (and fast) way to do this:

my @array = qw( 1 2 3 );
my %hash;
@hash{@array} = undef;

This generates a hash that looks like:

{
    1 => undef,
    2 => undef,
    3 => undef,
}

The best I've come up with in Ruby is:

array = [1, 2, 3]
hash = Hash[array.map {|x| [x, nil]}]

which gives:

{1=>nil, 2=>nil, 3=>nil}

Is there a better Ruby way?

EDIT 1

No, Array.include? is not a good idea. Its slow. It does a query in O(n) instead of O(1). My example array had three elements for brevity; assume the actual one has a million elements. Let's do a little benchmarking:

#!/usr/bin/ruby -w
require 'benchmark'

array = (1..1_000_000).to_a
hash = Hash[array.map {|x| [x, nil]}]

Benchmark.bm(15) do |x|
    x.report("Array.include?") { 1000.times { array.include?(500_000) } }
    x.report("Hash.include?") { 1000.times { hash.include?(500_000) } }
end

Produces:

                     user     system      total        real
Array.include?  46.190000   0.160000  46.350000 ( 46.593477)
Hash.include?    0.000000   0.000000   0.000000 (  0.000523)

标签: ruby arrays hash
14条回答
Melony?
2楼-- · 2019-03-08 05:52

If you want to quickly ask "is X in the array?" you should use Array#include?.

Edit (in response to addition in OP):

If you want speedy look up times, use a Set. Having a Hash that points to all nils is silly. Conversion is an easy process too with Array#to_set.

require 'benchmark'
require 'set'

array = (1..1_000_000).to_a
set = array.to_set

Benchmark.bm(15) do |x|
    x.report("Array.include?") { 1000.times { array.include?(500_000) } }
    x.report("Set.include?") { 1000.times { set.include?(500_000) } }
end

Results on my machine:

                     user     system      total        real
Array.include?  36.200000   0.140000  36.340000 ( 36.740605)
Set.include?     0.000000   0.000000   0.000000 (  0.000515)

You should consider just using a set to begin with, instead of an array so that a conversion is never necessary.

查看更多
Viruses.
3楼-- · 2019-03-08 05:54

You can do this very handy trick:

Hash[*[1, 2, 3, 4].map {|k| [k, nil]}.flatten]
=> {1=>nil, 2=>nil, 3=>nil, 4=>nil}
查看更多
Explosion°爆炸
4楼-- · 2019-03-08 05:58

Doing some benchmarking on the suggestions so far gives that chrismear and Gaius's assignment-based hash creation is slightly faster than my map method (and assigning nil is slightly faster than assigning true). mtyaka and rampion's Set suggestion is about 35% slower to create.

As far as lookups, hash.include?(x) is a very tiny amount faster than hash[x]; both are twice as a fast as set.include?(x).

                user     system      total        real
chrismear   6.050000   0.850000   6.900000 (  6.959355)
derobert    6.010000   1.060000   7.070000 (  7.113237)
Gaius       6.210000   0.810000   7.020000 (  7.049815)
mtyaka      8.750000   1.190000   9.940000 (  9.967548)
rampion     8.700000   1.210000   9.910000 (  9.962281)

                user     system      total        real
times      10.880000   0.000000  10.880000 ( 10.921315)
set        93.030000  17.490000 110.520000 (110.817044)
hash-i     45.820000   8.040000  53.860000 ( 53.981141)
hash-e     47.070000   8.280000  55.350000 ( 55.487760)

Benchmarking code is:

#!/usr/bin/ruby -w
require 'benchmark'
require 'set'

array = (1..5_000_000).to_a

Benchmark.bmbm(10) do |bm|
    bm.report('chrismear') { hash = {}; array.each{|x| hash[x] = nil} }
    bm.report('derobert')  { hash = Hash[array.map {|x| [x, nil]}] }
    bm.report('Gaius')     { hash = {}; array.each{|x| hash[x] = true} }
    bm.report('mtyaka')    { set = array.to_set }
    bm.report('rampion')   { set = Set.new(array) }
end

hash = Hash[array.map {|x| [x, true]}]
set = array.to_set
array = nil
GC.start

GC.disable
Benchmark.bmbm(10) do |bm|
    bm.report('times')  { 100_000_000.times { } }
    bm.report('set')    { 100_000_000.times { set.include?(500_000) } }
    bm.report('hash-i') { 100_000_000.times { hash.include?(500_000) } }
    bm.report('hash-e') { 100_000_000.times { hash[500_000] } }
end
GC.enable
查看更多
Summer. ? 凉城
5楼-- · 2019-03-08 05:58

This preserves 0's if your hash was [0,0,0,1,0]

  hash = {}
  arr.each_with_index{|el, idx| hash.merge!({(idx + 1 )=> el }) }

Returns :

  # {1=>0, 2=>0, 3=>0, 4=>1, 5=>0}
查看更多
对你真心纯属浪费
6楼-- · 2019-03-08 06:00

I think chrismear's point on using assignment over creation is great. To make the whole thing a little more Ruby-esque, though, I might suggest assigning something other than nil to each element:

hash = {}
array.each { |x| hash[x] = 1 } # or true or something else "truthy"
...
if hash[376]                   # instead of if hash.has_key?(376)
  ...
end

The problem with assigning to nil is that you have to use has_key? instead of [], since [] give you nil (your marker value) if the Hash doesn't have the specified key. You could get around this by using a different default value, but why go through the extra work?

# much less elegant than above:
hash = Hash.new(42)
array.each { |x| hash[x] = nil }
...
unless hash[376]
  ...
end
查看更多
Viruses.
7楼-- · 2019-03-08 06:03

I'm fairly certain that there isn't a one-shot clever way to construct this hash. My inclination would be to just be explicit and state what I'm doing:

hash = {}
array.each{|x| hash[x] = nil}

It doesn't look particularly elegant, but it's clear, and does the job.

FWIW, your original suggestion (under Ruby 1.8.6 at least) doesn't seem to work. I get an "ArgumentError: odd number of arguments for Hash" error. Hash.[] expects a literal, even-lengthed list of values:

Hash[a, 1, b, 2] # => {a => 1, b => 2}

so I tried changing your code to:

hash = Hash[*array.map {|x| [x, nil]}.flatten]

but the performance is dire:

#!/usr/bin/ruby -w
require 'benchmark'

array = (1..100_000).to_a

Benchmark.bm(15) do |x|
  x.report("assignment loop") {hash = {}; array.each{|e| hash[e] = nil}}
  x.report("hash constructor") {hash = Hash[*array.map {|e| [e, nil]}.flatten]}
end

gives

                     user     system      total        real
assignment loop  0.440000   0.200000   0.640000 (  0.657287)
hash constructor  4.440000   0.250000   4.690000 (  4.758663)

Unless I'm missing something here, a simple assignment loop seems the clearest and most efficient way to construct this hash.

查看更多
登录 后发表回答