Check if a value exists in an array in Ruby

2019-01-03 07:30发布

站内文章 / Ruby

23 0

三岁会撩人

女 | 书童

私信

可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效，请关闭广告屏蔽插件后再试):

问题:

I have a value 'Dog' and an array ['Cat', 'Dog', 'Bird'].

How do I check if it exists in the array without looping through it? Is there a simple way of checking if the value exists, nothing more?

回答1:

You're looking for include?:

>> ['Cat', 'Dog', 'Bird'].include? 'Dog'
=> true

回答2:

There is an in? method in ActiveSupport (part of Rails) since v3.1, as pointed out by @campaterson. So within Rails, or if you require 'active_support', you can write:

'Unicorn'.in?(['Cat', 'Dog', 'Bird']) # => false

OTOH, there is no in operator or #in? method in Ruby itself, even though it has been proposed before, in particular by Yusuke Endoh a top notch member of ruby-core.

As pointed out by others, the reverse method include? exists, for all Enumerables including Array, Hash, Set, Range:

['Cat', 'Dog', 'Bird'].include?('Unicorn') # => false

Note that if you have many values in your array, they will all be checked one after the other (i.e. O(n)), while that lookup for a hash will be constant time (i.e O(1)). So if you array is constant, for example, it is a good idea to use a Set instead. E.g:

require 'set'
ALLOWED_METHODS = Set[:to_s, :to_i, :upcase, :downcase
                       # etc
                     ]

def foo(what)
  raise "Not allowed" unless ALLOWED_METHODS.include?(what.to_sym)
  bar.send(what)
end

A quick test reveals that calling include? on a 10 element Set is about 3.5x faster than calling it on the equivalent Array (if the element is not found).

A final closing note: be wary when using include? on a Range, there are subtleties, so refer to the doc and compare with cover?...

回答3:

Try

['Cat', 'Dog', 'Bird'].include?('Dog')

回答4:

Use Enumerable#include:

a = %w/Cat Dog Bird/

a.include? 'Dog'

Or, if a number of tests are done,¹ you can get rid of the loop (that even include? has) and go from O(n) to O(1) with:

h = Hash[[a, a].transpose]
h['Dog']

^{1. I hope this is obvious but to head off objections: yes, for just a few lookups, the Hash[] and transpose ops dominate the profile and are each O(n) themselves.}

回答5:

If you want to check by a block, you could try any? or all?.

%w{ant bear cat}.any? {|word| word.length >= 3}   #=> true  
%w{ant bear cat}.any? {|word| word.length >= 4}   #=> true  
[ nil, true, 99 ].any?                            #=> true

Details are here: http://ruby-doc.org/core-1.9.3/Enumerable.html
My inspiration come from here: https://stackoverflow.com/a/10342734/576497

回答6:

Several answers suggest Array#include?, but there is one important caveat: Looking at the source, even Array#include? does perform looping:

rb_ary_includes(VALUE ary, VALUE item)
{
    long i;

    for (i=0; i<RARRAY_LEN(ary); i++) {
        if (rb_equal(RARRAY_AREF(ary, i), item)) {
            return Qtrue;
        }
    }
    return Qfalse;
}

The way to test the word presence without looping is by constructing a trie for your array. There are many trie implementations out there (google "ruby trie"). I will use rambling-trie in this example:

a = %w/cat dog bird/

require 'rambling-trie' # if necessary, gem install rambling-trie
trie = Rambling::Trie.create { |trie| a.each do |e| trie << e end }

And now we are ready to test the presence of various words in your array without looping over it, in O(log n) time, with same syntactic simplicity as Array#include?, using sublinear Trie#include?:

trie.include? 'bird' #=> true
trie.include? 'duck' #=> false

回答7:

Ruby has 11 methods to find elements in an array.

The preferred one is include?

Or for repeated access, creating a set and then calling include? or member?

Here are all of them,

array.include?(element) # preferred method
array.member?(element)
array.to_set.include?(element)
array.to_set.member?(element)
array.index(element) > 0
array.find_index(element) > 0
array.index { |each| each == element } > 0
array.find_index { |each| each == element } > 0
array.any? { |each| each == element }
array.find { |each| each == element } != nil
array.detect { |each| each == element } != nil

All of them return a trueish value if the element is present.

include? is the preferred method. It uses a C-language for loop internally that breaks when an element matches the internal rb_equal_opt/rb_equal functions. It cannot get much more efficient unless you create a set for repeated membership checks.

VALUE
rb_ary_includes(VALUE ary, VALUE item)
{
  long i;
  VALUE e;

  for (i=0; i<RARRAY_LEN(ary); i++) {
    e = RARRAY_AREF(ary, i);
    switch (rb_equal_opt(e, item)) {
      case Qundef:
        if (rb_equal(e, item)) return Qtrue;
        break;
      case Qtrue:
        return Qtrue;
    }
  }
  return Qfalse;
}

member? is not redefined in Array class and uses an unoptimized implementation from the Enumerable module that literally enumerate through all elements.

static VALUE
member_i(RB_BLOCK_CALL_FUNC_ARGLIST(iter, args))
{
  struct MEMO *memo = MEMO_CAST(args);

  if (rb_equal(rb_enum_values_pack(argc, argv), memo->v1)) {
    MEMO_V2_SET(memo, Qtrue);
    rb_iter_break();
  }
  return Qnil;
}

static VALUE
enum_member(VALUE obj, VALUE val)
{
  struct MEMO *memo = MEMO_NEW(val, Qfalse, 0);

  rb_block_call(obj, id_each, 0, 0, member_i, (VALUE)memo);
  return memo->v2;
}

Translated to Ruby code this does about the following

def member?(value)
  memo = [value, false, 0]
  each_with_object(memo) do |each, memo|
    if each == memo[0]
      memo[1] = true 
      break
    end
  memo[1]
end

Both include? and member? have O(n) time complexity since the both search the array for the first occurrence of the expected value.

We can use a set to get O(1) access time at the cost of having to create a hash representation of the array first. If you repeatedly check membership on the same array this initial investment can pay off quickly. Set is not implemented in C but as plain Ruby class, still the O(1) access time of the underlying @hash makes this worthwhile.

Here is the implementation of the Set class,

module Enumerable
  def to_set(klass = Set, *args, &block)
    klass.new(self, *args, &block)
  end
end

class Set
  def initialize(enum = nil, &block) # :yields: o
    @hash ||= Hash.new
    enum.nil? and return
    if block
      do_with_enum(enum) { |o| add(block[o]) }
    else
      merge(enum)
    end
  end

  def merge(enum)
    if enum.instance_of?(self.class)
      @hash.update(enum.instance_variable_get(:@hash))
    else
      do_with_enum(enum) { |o| add(o) }
    end
    self
  end

  def add(o)
    @hash[o] = true
    self
  end

  def include?(o)
    @hash.include?(o)
  end
  alias member? include?

  ...
end

As you can see the Set class just creates an internal @hash instance, maps all objects to true and then checks membership using Hash#include? which is implemented with O(1) access time in the Hash class.

I won't discuss the other 7 methods as they are all less efficient.

There are actually even more methods with O(n) complexity beyond the 11 listed above, but I decided to not list them since scan the entire array rather than breaking at the first match.

Don't use these,

# bad examples
array.grep(element).any? 
array.select { |each| each == element }.size > 0
...

回答8:

If you don't want to loop, there's no way to do it with Arrays. You should use a Set instead.

require 'set'
s = Set.new
100.times{|i| s << "foo#{i}"}
s.include?("foo99")
 => true
[1,2,3,4,5,6,7,8].to_set.include?(4) 
  => true

Sets work internally like hashes, so Ruby doesn't need to loop through the collection to find items, since as the name implies, it generates hashes of the keys and creates a memory map so that each hash point to a certain point in memory. The previous example done with a Hash:

fake_array = {}
100.times{|i| fake_array["foo#{i}"] = 1}
fake_array.has_key?("foo99")
  => true

The downside is that Sets and hash keys can only include unique items and if you add a lot of items, Ruby will have to rehash the whole thing after certain number of items to build a new map that suits a larger keyspace. For more about this, I recommend you watch MountainWest RubyConf 2014 - Big O in a Homemade Hash by Nathan Long

Here's a benchmark:

require 'benchmark'
require 'set'

array = []
set   = Set.new

10_000.times do |i|
  array << "foo#{i}"
  set   << "foo#{i}"
end

Benchmark.bm do |x|
  x.report("array") { 10_000.times { array.include?("foo9999") } }
  x.report("set  ") { 10_000.times { set.include?("foo9999")   } }
end

And the results:

      user     system      total        real
array  7.020000   0.000000   7.020000 (  7.031525)
set    0.010000   0.000000   0.010000 (  0.004816)

回答9:

This is another way to do this: use the Array#index method.

It returns the index of the first occurrence of the element in the array.

example:

a = ['cat','dog','horse']
if a.index('dog')
    puts "dog exists in the array"
end

index() can also take a block

for example

a = ['cat','dog','horse']
puts a.index {|x| x.match /o/}

here, return the index of the first word in the array that containing letter 'o'.

回答10:

There are multiple ways to accomplish this. A few of them are as follows:

a = [1,2,3,4,5]

2.in? a  #=> true

8.in? a #=> false

a.member? 1 #=> true

a.member? 8 #=> false

回答11:

Fun fact,

You can use * to check array membership in a case expressions.

case element
when *array 
  ...
else
  ...
end

Notice the little * in the when clause, this checks for membership in the array.

All the usual magic behavior of the splat operator applies, so for example if array is not actually an array but a single element it will match that element.

回答12:

This will tell you not only that it exists but also how many times it appears:

 a = ['Cat', 'Dog', 'Bird']
 a.count("Dog")
 #=> 1

回答13:

For what it's worth, The Ruby docs are an amazing resource for these kinds of questions.

I would also take note of the length of the array you're searching through. The include? method will run a linear search with O(n) complexity which can get pretty ugly depending on the size of the array.

If you're working with a large (sorted) array, I would consider writing a binary search algorithm which shouldn't be too difficult and has a worst case of O(log n).

Or if you're using Ruby 2.0, you can take advantage of bsearch.

回答14:

If you have on mind more values... you can try:

Example: if Cat and Dog exist in the array:

(['Cat','Dog','Bird'] & ['Cat','Dog'] ).size == 2   #or replace 2 with ['Cat','Dog].size

Instead of:

['Cat','Dog','Bird'].member?('Cat') and ['Cat','Dog','Bird'].include?('Dog')

Note: member? and include? are the same.

This can do the work in one line!

回答15:

There's the other way around, too!

Suppose the array is [ :edit, :update, :create, :show ] - well perhaps the entire seven deadly/restful sins :)

And further toy with the idea of pulling a valid action from some string - say

my brother would like me to update his profile

Solution

[ :edit, :update, :create, :show ].select{|v| v if "my brother would like me to update his profile".downcase =~ /[,|.| |]#{v.to_s}[,|.| |]/}

回答16:

If we want to not use include? this also works:

['cat','dog','horse'].select{ |x| x == 'dog' }.any?

回答17:

['Cat', 'Dog', 'Bird'].detect { |x| x == 'Dog'}
=> "Dog"
!['Cat', 'Dog', 'Bird'].detect { |x| x == 'Dog'}.nil?
=> true

回答18:

If you need to check multiples times for any key, convert arr to hash, and now check in O(1)

arr = ['Cat', 'Dog', 'Bird']
hash = arr.map {|x| [x,true]}.to_h
 => {"Cat"=>true, "Dog"=>true, "Bird"=>true}
hash["Dog"]
 => true
hash["Insect"]
 => false

Performance of Hash#has_key? versus Array#include?

Parameter              Hash#has_key?                 Array#include 

Time Complexity         O(1) operation                O(n) operation 

Access Type             Accesses Hash[key] if it      Iterates through each element
                        returns any value then        of the array till it
                        true is returned to the       finds the value in Array
                        Hash#has_key? call
                        call

For single time check using include? is fine

回答19:

How about this way?

['Cat', 'Dog', 'Bird'].index('Dog')

回答20:

if you don't want to use include? you can first wrap the element in an array and then check whether the wrapped element is equal to the intersection of the array and the wrapped element. This will return a boolean value based on equality.

def in_array?(array, item)
    item = [item] unless item.is_a?(Array)
    item == array & item
end

回答21:

Here is one more way to do this:

arr = ['Cat', 'Dog', 'Bird']
e = 'Dog'

present = arr.size != (arr - [e]).size

回答22:

array = [ 'Cat', 'Dog', 'Bird' ]
array.include?("Dog")

标签：