Why is Any slower than Contains?

2020-07-09 03:54发布

I designed the following test:

var arrayLength=5000;
object[] objArray=new object[arrayLength];

for(var x=0;x<arrayLength;x++)
{
    objArray[x]=new object();
}
objArray[4000]=null;
const int TestSize=int.MaxValue;

System.Diagnostics.Stopwatch v= new Stopwatch();
v.Start();
for(var x=0;x<10000;x++)
{
    objArray.Contains(null);
}
v.Stop();
objArray.Contains(null).Dump();
v.Elapsed.ToString().Dump("Contains");

//Any ==
v.Reset();
v.Start();
for(var x=0;x<10000;x++)
{
    objArray.Any(o=>o==null);
}
v.Stop();
objArray.Any(x=>x==null).Dump();
v.Elapsed.ToString().Dump("Any");

//Any Equals
v.Reset();
v.Start();
for(var x=0;x<10000;x++)
{
    objArray.Any(obj=>object.Equals( obj,null));
}
v.Stop();
objArray.Any(obj=>object.Equals( obj,null)).Dump();
v.Elapsed.ToString().Dump("Any");

The results when null is not present:

  • Contains False 00:00:00.0606484
  • Any == False 00:00:00.7532898
  • Any object.Equals False 00:00:00.8431783

When null is present at element 4000:

  • Contains True 00:00:00.0494515
  • Any == True 00:00:00.5929247
  • Any object.Equals True 00:00:00.6700742

When null is present at element 10:

  • Contains True 00:00:00.0038035
  • Any == True 00:00:00.0025687
  • Any True 00:00:00.0033769

So when the object is near the front, Any is slightly faster; when it's at the back, it's much much slower. Why?

4条回答
家丑人穷心不美
2楼-- · 2020-07-09 04:26

I'm guessing it has to do with the fact that Any is an extension method, part of the LINQ library and involves using delegates (via the Func<> syntax). Any time you have to call out to a seperate method (especially as a delegate) it will slow down.

查看更多
何必那么认真
3楼-- · 2020-07-09 04:27

Any is slower because Contains is tailored to the specific container you're using (Array/List/etc), so it doesn't have the overhead of firing up an IEnumerable, calling MoveNext() all the time, etc.

However, using Any will make refactoring easier since if you change that collection, you can stil use it, so really I'd only change it to Contains if you know via a profiler that this is an important piece of code. And if it is, you should probably end up using a smarter data structure anyways like a HashSet, since both Any and Contains are O(n).

查看更多
迷人小祖宗
4楼-- · 2020-07-09 04:28

As other people have already noted, both the Contains and Any methods are extension methods of Enumerable. The big difference in performance has a couple of reasons:

First of all, you supply a delegate to the Any, which has to be called for every object, while the Contains method doesn't have to. Delegate calls are about as fast as a call to an interface method. For this reason, Any is slower.

Next, something that other people seem to have missed, the Contains extension method has a performance optimization for collections implementing ICollection. Because object[] implements ICollection the extension method call results in a method call on the array itself. Internally, this array.Contains method uses a simple for loop to iterate over the array to compare the value. This means array bounds checking is done just once iterating the array.

Because the Any method must call your delegate, a performance optimization as with the Contains method is not possible. This means that the Any method iterates over the collection using the IEnumerable interface, which leads to a interface call + an array bounds check + a delegate call on each and every element. Compare that to the array.Contains, where there are no interface calls, no delegate calls and a single bounds check.

[Update]: One last note. The reason that Any is faster with small collections (and in your case with a null value at the start of the collection) has to do with the cast to ICollection that the Enumerable.Contains does When you do the cast to ICollection yourself, you'll see that the call to Contains is faster than Any:

for(var x=0;x<10000;x++)
{
    ICollection<object> col = objArray;
    col.Contains(null);
}
查看更多
何必那么认真
5楼-- · 2020-07-09 04:30

Any will have to call a delegate for every element it checks (an extra callvirt instruction which is unlikely to get inlined by the JIT). Contains only performs that check. That's why Any is slower. I suspect the fact that Any looks faster than contains when the element is seen very early is that the benchmark can't reflect it easily since they are very close. The setup time for the method call is the majority of the work done in that case (rather than the actual searching operation).

The anonymous method:
--- C:\Users\Mehrdad\AppData\Local\Temporary Projects\ConsoleApplication1\Program.cs 
            Console.WriteLine(s.Any(a => a == 1));
00000000  xor         eax,eax 
00000002  cmp         ecx,1 
00000005  sete        al 
00000008  ret 

Relevant part of Enumerable.Any code:
...
00000051  mov         edx,eax 
00000053  mov         rcx,qword ptr [rbx+8] 
00000057  call        qword ptr [rbx+18h]   // calls the anonymous method above
0000005a  movzx       ecx,al 
0000005d  test        ecx,ecx 
...
查看更多
登录 后发表回答