I've recently started using LINQ quite a bit, and I haven't really seen any mention of run-time complexity for any of the LINQ methods. Obviously, there are many factors at play here, so let's restrict the discussion to the plain IEnumerable LINQ-to-Objects provider. Further, let's assume that any Func passed in as a selector / mutator / etc. is a cheap O(1) operation.
It seems obvious that all the single-pass operations (Select, Where, Count, Take/Skip, Any/All, etc.) will be O(n), since they only need to walk the sequence once; although even this is subject to laziness.
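To make the laziness caveat concrete, here is a minimal sketch. The Naturals helper and the Pulled counter are hypothetical names invented for this illustration; they only record how many elements the source actually hands out.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LazinessDemo
{
    // Number of elements the source has actually produced so far.
    public static int Pulled;

    // Hypothetical helper: an endless lazy source that counts every element it yields.
    public static IEnumerable<int> Naturals()
    {
        for (int i = 0; ; i++)
        {
            Pulled++;
            yield return i;
        }
    }

    public static void Main()
    {
        Pulled = 0;

        // Building the query does no work at all.
        var query = Naturals().Where(x => x % 2 == 0).Select(x => x * x);
        Console.WriteLine(Pulled);                        // 0

        // Take(3) stops the whole pipeline after the third match.
        var firstThree = query.Take(3).ToList();
        Console.WriteLine(string.Join(",", firstThree));  // 0,4,16
        Console.WriteLine(Pulled);                        // 5
    }
}
```

So "O(n)" here really means "O(number of elements the consumer ends up pulling)", which can be far smaller than the source.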
Things are murkier for the more complex operations; the set-like operators (Union, Distinct, Except, etc.) work using GetHashCode by default (afaik), so it seems reasonable to assume they're using a hash-table internally, making these operations O(n) as well, in general. What about the versions that use an IEqualityComparer?
OrderBy would need a sort, so most likely we're looking at O(n log n). What if it's already sorted? How about if I say OrderBy().ThenBy() and provide the same key to both?
I could see GroupBy (and Join) using either sorting, or hashing. Which is it?
Contains would be O(n) on a List, but O(1) on a HashSet - does LINQ check the underlying container to see if it can speed things up?
And the real question - so far, I've been taking it on faith that the operations are performant. However, can I bank on that? STL containers, for example, clearly specify the complexity of every operation. Are there any similar guarantees on LINQ performance in the .NET library specification?
More questions (in response to comments):
Hadn't really thought about overhead, but I didn't expect there to be very much for simple Linq-to-Objects. The CodingHorror post is talking about Linq-to-SQL, where I can understand parsing the query and making SQL would add cost - is there a similar cost for the Objects provider too? If so, is it different if you're using the declarative or functional syntax?
There are very, very few guarantees, but there are a few optimizations:
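- Extension methods that use indexed access, such as ElementAt, Skip, Last or LastOrDefault, will check whether the underlying type implements IList<T>, so that you get O(1) access instead of O(N).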
- The Count method checks for an ICollection implementation, so that this operation is O(1) instead of O(N).
- Distinct, GroupBy, Join, and I believe also the set-aggregation methods (Union, Intersect and Except) use hashing, so they should be close to O(N) instead of O(N²).
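- Contains checks for an ICollection implementation, so it may be O(1) if the underlying collection is also O(1), such as a HashSet<T>, but this depends on the actual data structure and is not guaranteed. Hash sets override the Contains method, which is why they are O(1).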
- The OrderBy methods use a stable quicksort, so they are O(N log N) in the average case.
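As a small illustration of what "stable" means for OrderBy, here is a sketch with made-up sample words; the class and method names are hypothetical. Elements with equal keys keep their original relative order, and ThenBy only breaks ties left by the first key.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class StableSortDemo
{
    // Sort by length only; ties keep their input order because OrderBy is stable.
    public static string[] ByLength(IEnumerable<string> words)
    {
        return words.OrderBy(w => w.Length).ToArray();
    }

    // Sort by length, then break ties with an ordinal string comparison.
    public static string[] ByLengthThenOrdinal(IEnumerable<string> words)
    {
        return words.OrderBy(w => w.Length)
                    .ThenBy(w => w, StringComparer.Ordinal)
                    .ToArray();
    }

    public static void Main()
    {
        var words = new[] { "pear", "fig", "kiwi", "plum", "yam" };

        Console.WriteLine(string.Join(",", ByLength(words)));
        // fig,yam,pear,kiwi,plum

        Console.WriteLine(string.Join(",", ByLengthThenOrdinal(words)));
        // fig,yam,kiwi,pear,plum
    }
}
```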
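I think that covers most, if not all, of the built-in extension methods. There are really very few performance guarantees; LINQ itself will try to take advantage of efficient data structures, but it isn't a free pass to write potentially inefficient code.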
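All you can really bank on is that the Enumerable methods are well written for the general case and won't use naive algorithms. There is probably third-party material (blogs, etc.) describing the algorithms actually in use, but none of it is official or guaranteed in the sense that the STL algorithms are.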
To illustrate, here is the reflected source code (courtesy of ILSpy) for Enumerable.Count from System.Core:
// System.Linq.Enumerable
public static int Count<TSource>(this IEnumerable<TSource> source)
{
    checked
    {
        if (source == null)
        {
            throw Error.ArgumentNull("source");
        }
        ICollection<TSource> collection = source as ICollection<TSource>;
        if (collection != null)
        {
            return collection.Count;
        }
        ICollection collection2 = source as ICollection;
        if (collection2 != null)
        {
            return collection2.Count;
        }
        int num = 0;
        using (IEnumerator<TSource> enumerator = source.GetEnumerator())
        {
            while (enumerator.MoveNext())
            {
                num++;
            }
        }
        return num;
    }
}
As you can see, it goes to some effort to avoid the naive solution of simply enumerating every element.
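One way to observe this fast path from the outside is sketched below. SpyCollection<T> is a hypothetical wrapper written for this example (it is not part of the BCL); it just records how often it gets enumerated.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

// Hypothetical read-only ICollection<T> that counts calls to GetEnumerator.
public sealed class SpyCollection<T> : ICollection<T>
{
    private readonly List<T> items;

    public int Enumerations { get; private set; }

    public SpyCollection(IEnumerable<T> source) { items = new List<T>(source); }

    public int Count { get { return items.Count; } }
    public bool IsReadOnly { get { return true; } }
    public bool Contains(T item) { return items.Contains(item); }
    public void CopyTo(T[] array, int arrayIndex) { items.CopyTo(array, arrayIndex); }
    public void Add(T item) { throw new NotSupportedException(); }
    public void Clear() { throw new NotSupportedException(); }
    public bool Remove(T item) { throw new NotSupportedException(); }

    public IEnumerator<T> GetEnumerator()
    {
        Enumerations++;
        return items.GetEnumerator();
    }

    IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }
}

public static class CountFastPathDemo
{
    public static void Main()
    {
        var spy = new SpyCollection<int>(new[] { 1, 2, 3, 4 });

        // Enumerable.Count sees ICollection<T> and reads the Count property.
        Console.WriteLine(Enumerable.Count(spy));         // 4
        Console.WriteLine(spy.Enumerations);              // 0

        // A Where iterator hides the collection, so Count has to walk it.
        Console.WriteLine(spy.Where(x => true).Count());  // 4
        Console.WriteLine(spy.Enumerations);              // 1
    }
}
```

The second call shows how easily the fast path is lost: any intermediate operator that returns a plain iterator puts you back on the O(N) route.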
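I've long known that .Count() returns .Count if the enumeration is an IList (strictly speaking, any ICollection<T>, which IList<T> extends).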
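But I was always a bit wary about the run-time complexity of the set operations: .Intersect(), .Except() and .Union().
Here is the decompiled BCL (.NET 4.0 / 4.5) implementation of .Intersect() (comments mine):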
private static IEnumerable<TSource> IntersectIterator<TSource>(IEnumerable<TSource> first, IEnumerable<TSource> second, IEqualityComparer<TSource> comparer)
{
    Set<TSource> set = new Set<TSource>(comparer);
    foreach (TSource source in second) // O(M)
        set.Add(source); // O(1)
    foreach (TSource source in first) // O(N)
    {
        if (set.Remove(source)) // O(1)
            yield return source;
    }
}
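Conclusions:
- Performance is O(M + N).
- The implementation doesn't take advantage of the case where the collections already are sets. (That may not be straightforward to do, because the IEqualityComparer<T> in use would also need to match.)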
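If you already hold a HashSet<T> built with the comparer you want, a rough sketch of skipping the O(M) set construction is to filter with the set's own Contains. The names below are made up for illustration, and note the behavioral difference: unlike Intersect, this keeps duplicates from the first sequence.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class IntersectDemo
{
    // What the framework does: build a set from second, then stream first.
    public static int[] ViaIntersect(IEnumerable<int> first, IEnumerable<int> second)
    {
        return first.Intersect(second).ToArray();
    }

    // Sketch: reuse an existing HashSet instead of building a new one.
    // Duplicates in first are NOT removed here.
    public static int[] ViaExistingSet(IEnumerable<int> first, HashSet<int> lookup)
    {
        return first.Where(x => lookup.Contains(x)).ToArray();
    }

    public static void Main()
    {
        int[] first = { 1, 2, 2, 3, 4 };
        int[] second = { 4, 2, 2, 5 };

        Console.WriteLine(string.Join(",", ViaIntersect(first, second)));
        // 2,4

        var lookup = new HashSet<int>(second);
        Console.WriteLine(string.Join(",", ViaExistingSet(first, lookup)));
        // 2,2,4
    }
}
```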
For completeness, here are the implementations of .Union() and .Except().
Spoiler alert: they, too, have O(N + M) complexity.
private static IEnumerable<TSource> UnionIterator<TSource>(IEnumerable<TSource> first, IEnumerable<TSource> second, IEqualityComparer<TSource> comparer)
{
    Set<TSource> set = new Set<TSource>(comparer);
    foreach (TSource source in first)
    {
        if (set.Add(source))
            yield return source;
    }
    foreach (TSource source in second)
    {
        if (set.Add(source))
            yield return source;
    }
}

private static IEnumerable<TSource> ExceptIterator<TSource>(IEnumerable<TSource> first, IEnumerable<TSource> second, IEqualityComparer<TSource> comparer)
{
    Set<TSource> set = new Set<TSource>(comparer);
    foreach (TSource source in second)
        set.Add(source);
    foreach (TSource source in first)
    {
        if (set.Add(source))
            yield return source;
    }
}
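A quick usage sketch (sample data and names are made up) shows what the set.Add checks in the code above imply for callers: both operators de-duplicate their output and preserve first-seen order.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class UnionExceptDemo
{
    public static int[] Union(int[] first, int[] second)
    {
        return first.Union(second).ToArray();
    }

    public static int[] Except(int[] first, int[] second)
    {
        return first.Except(second).ToArray();
    }

    public static void Main()
    {
        int[] first = { 1, 2, 2, 3 };
        int[] second = { 3, 4, 4 };

        // Each element is yielded only the first time set.Add succeeds.
        Console.WriteLine(string.Join(",", Union(first, second)));   // 1,2,3,4

        // Elements of second are pre-added, so they can never be yielded.
        Console.WriteLine(string.Join(",", Except(first, second)));  // 1,2
        Console.WriteLine(string.Join(",", Except(second, first)));  // 4
    }
}
```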
I just broke out Reflector, and they do check the underlying type when Contains is called.
public static bool Contains<TSource>(this IEnumerable<TSource> source, TSource value)
{
    ICollection<TSource> is2 = source as ICollection<TSource>;
    if (is2 != null)
    {
        return is2.Contains(value);
    }
    return source.Contains<TSource>(value, null);
}
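One observable consequence of that delegation, sketched here with a made-up example: when the source is an ICollection<T>, the collection's own Contains (and therefore its own comparer) decides the answer, whereas a plain iterator falls back to a linear scan with the default comparer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ContainsFastPathDemo
{
    // A case-insensitive set, so its own Contains ignores case.
    private static HashSet<string> MakeSet()
    {
        return new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "apple" };
    }

    // Source is an ICollection<string>: the call is forwarded to HashSet.Contains.
    public static bool ViaCollection()
    {
        return Enumerable.Contains(MakeSet(), "APPLE");
    }

    // Select hides the set behind an iterator: linear scan, default comparer.
    public static bool ViaWalk()
    {
        return MakeSet().Select(s => s).Contains("APPLE");
    }

    public static void Main()
    {
        Console.WriteLine(ViaCollection());  // True
        Console.WriteLine(ViaWalk());        // False
    }
}
```

So the fast path affects more than speed: it can also change which equality rules apply, which is one more reason not to treat it as a guarantee.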
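The correct answer is "it depends". It depends on what type the underlying IEnumerable is. I know that for some collections (such as those that implement ICollection or IList) special code paths are used, but the actual implementation is not guaranteed to do anything special. For example, I know that ElementAt() has a special case for indexable collections, and likewise Count(). But in general you should assume the worst case, O(n) performance.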
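In general I don't think you will find the kind of performance guarantees you want, but if you do run into a particular performance problem with a LINQ operator, you can always reimplement it for your particular collection. There are also many blogs and extensibility projects that extend LINQ to Objects to add these kinds of performance guarantees. Check out Indexed LINQ, which extends and adds to the operator set for more performance benefits.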
Source: What guarantees are there on the run-time complexity (Big-O) of LINQ methods?