可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效,请关闭广告屏蔽插件后再试):
问题:
arr
is array of strings, e.g.: ["hello", "world", "stack", "overflow", "hello", "again"]
.
What would be easy and elegant way to check if arr
has duplicates, and if yes, return one of them (no matter which).
Examples:
["A", "B", "C", "B", "A"] # => "A" or "B"
["A", "B", "C"] # => nil
回答1:
a = ["A", "B", "C", "B", "A"]
a.detect{ |e| a.count(e) > 1 }
回答2:
You can do this in a few ways, with the first option being the fastest:
ary = ["A", "B", "C", "B", "A"]
ary.group_by{ |e| e }.select { |k, v| v.size > 1 }.map(&:first)
ary.sort.chunk{ |e| e }.select { |e, chunk| chunk.size > 1 }.map(&:first)
And a O(N^2) option (i.e. less efficient):
ary.select{ |e| ary.count(e) > 1 }.uniq
回答3:
Simply find the first instance where the index of the object (counting from the left) does not equal the index of the object (counting from the right).
arr.detect {|e| arr.rindex(e) != arr.index(e) }
If there are no duplicates, the return value will be nil.
I believe this is the fastest solution posted in the thread so far, as well, since it doesn't rely on the creation of additional objects, and #index
and #rindex
are implemented in C. The big-O runtime is N^2 and thus slower than Sergio's, but the wall time could be much faster due to the the fact that the "slow" parts run in C.
回答4:
detect
only finds one duplicate. find_all
will find them all:
a = ["A", "B", "C", "B", "A"]
a.find_all { |e| a.count(e) > 1 }
回答5:
Here are two more ways of finding a duplicate.
Use a set
require 'set'
def find_a_dup_using_set(arr)
s = Set.new
arr.find { |e| !s.add?(e) }
end
find_a_dup_using_set arr
#=> "hello"
Use select
in place of find
to return an array of all duplicates.
Use Array#difference
class Array
def difference(other)
h = other.each_with_object(Hash.new(0)) { |e,h| h[e] += 1 }
reject { |e| h[e] > 0 && h[e] -= 1 }
end
end
def find_a_dup_using_difference(arr)
arr.difference(arr.uniq).first
end
find_a_dup_using_difference arr
#=> "hello"
Drop .first
to return an array of all duplicates.
Both methods return nil
if there are no duplicates.
I proposed that Array#difference
be added to the Ruby core. More information is in my answer here.
Benchmark
Let's compare suggested methods. First, we need an array for testing:
CAPS = ('AAA'..'ZZZ').to_a.first(10_000)
def test_array(nelements, ndups)
arr = CAPS[0, nelements-ndups]
arr = arr.concat(arr[0,ndups]).shuffle
end
and a method to run the benchmarks for different test arrays:
require 'fruity'
def benchmark(nelements, ndups)
arr = test_array nelements, ndups
puts "\n#{ndups} duplicates\n"
compare(
Naveed: -> {arr.detect{|e| arr.count(e) > 1}},
Sergio: -> {(arr.inject(Hash.new(0)) {|h,e| h[e] += 1; h}.find {|k,v| v > 1} ||
[nil]).first },
Ryan: -> {(arr.group_by{|e| e}.find {|k,v| v.size > 1} ||
[nil]).first},
Chris: -> {arr.detect {|e| arr.rindex(e) != arr.index(e)} },
Cary_set: -> {find_a_dup_using_set(arr)},
Cary_diff: -> {find_a_dup_using_set(arr)}
)
end
I did not include @JjP's answer because only one duplicate is to be returned, and when his/her answer is modified to do that it is the same as @Naveed's earlier answer. Nor did I include @Marin's answer, which, while posted before @Naveed's answer, returned all duplicates rather than just one (a minor point but there's no point evaluating both, as they are identical when return just one duplicate).
I also modified other answers that returned all duplicates to return just the first one found, but that should have essentially no effect on performance, as they computed all duplicates before selecting one.
First suppose the array contains 100 elements:
benchmark(100, 0)
0 duplicates
Running each test 64 times. Test will take about 2 seconds.
Cary_set is similar to Cary_diff
Cary_diff is similar to Ryan
Ryan is similar to Sergio
Sergio is faster than Chris by 4x ± 1.0
Chris is faster than Naveed by 2x ± 1.0
benchmark(100, 1)
1 duplicates
Running each test 128 times. Test will take about 2 seconds.
Cary_set is similar to Cary_diff
Cary_diff is faster than Ryan by 2x ± 1.0
Ryan is similar to Sergio
Sergio is faster than Chris by 2x ± 1.0
Chris is faster than Naveed by 2x ± 1.0
benchmark(100, 10)
10 duplicates
Running each test 1024 times. Test will take about 3 seconds.
Chris is faster than Naveed by 2x ± 1.0
Naveed is faster than Cary_diff by 2x ± 1.0 (results differ: AAC vs AAF)
Cary_diff is similar to Cary_set
Cary_set is faster than Sergio by 3x ± 1.0 (results differ: AAF vs AAC)
Sergio is similar to Ryan
Now consider an array with 10,000 elements:
benchmark(10000, 0)
0 duplicates
Running each test once. Test will take about 4 minutes.
Ryan is similar to Sergio
Sergio is similar to Cary_set
Cary_set is similar to Cary_diff
Cary_diff is faster than Chris by 400x ± 100.0
Chris is faster than Naveed by 3x ± 0.1
benchmark(10000, 1)
1 duplicates
Running each test once. Test will take about 1 second.
Cary_set is similar to Cary_diff
Cary_diff is similar to Sergio
Sergio is similar to Ryan
Ryan is faster than Chris by 2x ± 1.0
Chris is faster than Naveed by 2x ± 1.0
benchmark(10000, 10)
10 duplicates
Running each test once. Test will take about 11 seconds.
Cary_set is similar to Cary_diff
Cary_diff is faster than Sergio by 3x ± 1.0 (results differ: AAE vs AAA)
Sergio is similar to Ryan
Ryan is faster than Chris by 20x ± 10.0
Chris is faster than Naveed by 3x ± 1.0
benchmark(10000, 100)
100 duplicates
Cary_set is similar to Cary_diff
Cary_diff is faster than Sergio by 11x ± 10.0 (results differ: ADG vs ACL)
Sergio is similar to Ryan
Ryan is similar to Chris
Chris is faster than Naveed by 3x ± 1.0
Note that find_a_dup_using_difference(arr)
would be much more efficient if Array#difference
were implemented in C, which would be the case if it were added to the Ruby core.
回答6:
Ruby Array objects have a great method, select
.
select {|item| block } → new_ary
select → an_enumerator
The first form is what interests you here. It allows you to select objects which pass a test.
Ruby Array objects have another method, count
.
count → int
count(obj) → int
count { |item| block } → int
In this case, you are interested in duplicates (objects which appear more than once in the array). The appropriate test is a.count(obj) > 1
.
If a = ["A", "B", "C", "B", "A"]
, then
a.select{|item| a.count(item) > 1}.uniq
=> ["A", "B"]
You state that you only want one object. So pick one.
回答7:
Alas most of the answers are O(n^2)
.
Here is an O(n)
solution,
a = %w{the quick brown fox jumps over the lazy dog}
h = Hash.new(0)
a.find { |each| (h[each] += 1) == 2 } # => 'the"
What is the complexity of this?
- Runs in
O(n)
and breaks on first match
- Uses
O(n)
memory, but only the minimal amount
Now, depending on how frequent duplicates are in your array these runtimes might actually become even better. For example if the array of size O(n)
has been sampled from a population of k << n
different elements only the complexity for both runtime and space becomes O(k)
, however it is more likely that the original poster is validating input and wants to make sure there are no duplicates. In that case both runtime and memory complexity O(n)
since we expect the elements to have no repetitions for the majority of inputs.
回答8:
Something like this will work
arr = ["A", "B", "C", "B", "A"]
arr.inject(Hash.new(0)) { |h,e| h[e] += 1; h }.
select { |k,v| v > 1 }.
collect { |x| x.first }
That is, put all values to a hash where key is the element of array and value is number of occurences. Then select all elements which occur more than once. Easy.
回答9:
I know this thread is about Ruby specifically, but I landed here looking for how to do this within the context of Ruby on Rails with ActiveRecord and thought I would share my solution too.
class ActiveRecordClass < ActiveRecord::Base
#has two columns, a primary key (id) and an email_address (string)
end
ActiveRecordClass.group(:email_address).having("count(*) > 1").count.keys
The above returns an array of all email addresses that are duplicated in this example's database table (which in Rails would be "active_record_classes").
回答10:
find_all() returns an array
containing all elements of enum
for which block
is not false
.
To get duplicate
elements
>> arr = ["A", "B", "C", "B", "A"]
>> arr.find_all { |x| arr.count(x) > 1 }
=> ["A", "B", "B", "A"]
Or duplicate uniq
elements
>> arr.find_all { |x| arr.count(x) > 1 }.uniq
=> ["A", "B"]
回答11:
a = ["A", "B", "C", "B", "A"]
a.each_with_object(Hash.new(0)) {|i,hash| hash[i] += 1}.select{|_, count| count > 1}.keys
This is a O(n)
procedure.
Alternatively you can do either of the following lines. Also O(n) but only one iteration
a.each_with_object(Hash.new(0).merge dup: []){|x,h| h[:dup] << x if (h[x] += 1) == 2}[:dup]
a.inject(Hash.new(0).merge dup: []){|h,x| h[:dup] << x if (h[x] += 1) == 2;h}[:dup]
回答12:
Here is my take on it on a big set of data - such as a legacy dBase table to find duplicate parts
# Assuming ps is an array of 20000 part numbers & we want to find duplicates
# actually had to it recently.
# having a result hash with part number and number of times part is
# duplicated is much more convenient in the real world application
# Takes about 6 seconds to run on my data set
# - not too bad for an export script handling 20000 parts
h = {};
# or for readability
h = {} # result hash
ps.select{ |e|
ct = ps.count(e)
h[e] = ct if ct > 1
}; nil # so that the huge result of select doesn't print in the console
回答13:
r = [1, 2, 3, 5, 1, 2, 3, 1, 2, 1]
r.group_by(&:itself).map { |k, v| v.size > 1 ? [k] + [v.size] : nil }.compact.sort_by(&:last).map(&:first)
回答14:
each_with_object
is your friend!
input = [:bla,:blubb,:bleh,:bla,:bleh,:bla,:blubb,:brrr]
# to get the counts of the elements in the array:
> input.each_with_object({}){|x,h| h[x] ||= 0; h[x] += 1}
=> {:bla=>3, :blubb=>2, :bleh=>2, :brrr=>1}
# to get only the counts of the non-unique elements in the array:
> input.each_with_object({}){|x,h| h[x] ||= 0; h[x] += 1}.reject{|k,v| v < 2}
=> {:bla=>3, :blubb=>2, :bleh=>2}
回答15:
If you are comparing two different arrays (instead of one against itself) a very fast way is to use the intersect operator &
provided by Ruby's Array class.
# Given
a = ['a', 'b', 'c', 'd']
b = ['e', 'f', 'c', 'd']
# Then this...
a & b # => ['c', 'd']
回答16:
a = ["A", "B", "C", "B", "A"]
b = a.select {|e| a.count(e) > 1}.uniq
c = a - b
d = b + c
Results
d
=> ["A", "B", "C"]
回答17:
def firstRepeatedWord(string)
h_data = Hash.new(0)
string.split(" ").each{|x| h_data[x] +=1}
h_data.key(h_data.values.max)
end
回答18:
[1,2,3].uniq!.nil? => true
[1,2,3,3].uniq!.nil? => false
Notice the above is destructive