Lets say I have a1
and a2
:
a1 = [1,2,3]
a2 = [4,2,5]
To see if a1
shares any elements with a2
, I can loop over each and compare each element:
def intersect?(x,y)
a1.each do |x|
a2.each do |y|
if x == y return true
end
end
false
end
But even easier, (a1.to_set & a2.to_set).present?
gives me the same answer.
I'm assuming that the set operation is quicker and more efficient? If this is true, is it still true taking into account to overhead (if any) of the .to_set
operation on each array?
tia
It should be faster for large arrays. Your method runs in O(m*n) time because it has to loop over both arrays. For tables of 3 elements each this is basically negligible, but for larger tables it can be very expensive.
The second method will use hash lookups which are much faster, but first the arrays have to be put in sets.
What you should do is try both methods using arrays of sizes you expect to see in your application and see which is faster. If they're about the same size you can just pick whichever one you think is clearer.
Surprisingly the
&
method of Array is faster than that of Set for quite large collections:Result:
steenslag's answer had an interesting observation that
array & array
was faster thatset & set
. It looks like most of that penalty appears to be the expense of obtaining keys from the underlying hash of the first set to enumerate on. A hybrid approach using the array for the left side of the operation and set for the right hand is faster yet. If you only want to know if there's any intersection, the same approach with#any?
is even quicker:I just want to elaborate upon the excellent answers by steenslag and dbenhur. Specifically, I wanted to know if
SortedSet
would perform any better. It actually surprised me initially that the RubySet
type was not implemented as a sorted set, since I come from C++; the STL by default uses an ordered set, and you generally have to specifyunordered_set
if you don't want ordering.I also wanted to know if the size of the set made a difference, as suggested in some other answers.
Here are the results for
f=20; n=20000
:And here are the results for
f=10000; n=10
:So, for large sets, looks like
Set
does better thanSortedSet
; and for smaller sets,SortedSet
does better thanSet
. When using the&
notation,Array
is faster than either. Looks likeSortedSet2present
performs significantly more efficiently with large sets, whereasSet2present
performs more efficiently with small sets.Whereas
Set
is implemented usingHash
,SortedSet
is anRBTree
(implemented in C). In both cases,&
is implemented in Ruby rather than C.The truth is with arrays this small, they either will be essentially the same or the lists will be faster than the sets.
A decent set implementation will do set operations faster than you can do list operations BUT there's some overhead. If you want to know what your implementation will do, use big sets/lists and test.