Came across this code.
var dic = new Dictionary<int, string>();
for(int i=0; i<20000; i++)
{
dic.Add(i, i.ToString());
}
var list = dic.Where(f => f.Value.StartsWith("1")).Select(f => f.Key);//.ToList(); //uncomment for fast results
Console.WriteLine(list.GetType());
var list2 = dic.Where(f => list.Contains(f.Key)).ToList();
Console.WriteLine(list2.Count())
So when .ToList() is commented it's slow, when not - it's fast. Reproducable here How could this be explained? Should I always make everything ToList() to ensure speed (i.e. in which circumstances IEnumerable would be more preferable)? Note I'm talking only about linq to objects, I know linq to sql laziness and stuff.
LINQ uses deferred execution.
Unless you call
.ToList()
, the results of a query are never stored anywhere; instead, it re-iterates the query every time you iterate the results.Normally, this is much faster; there is usually no reason to store all of the results in memory first.
However, your code repeatedly iterates the query; once for each call to the
Where()
callback.You should replace that line with a
Join()
call and noToList()
, which will be faster than either approach.It is caused by deferred execution. IEnumerable does not has to be a static collection. Generaly it is some data source (
dic
in your case) + all methods and expressions (Where, Contains, etc.) that lead to the final set..ToList() executes all these methods and expressions and generates the final result.
So in case you use ToList() it generates a standard .NET List (array of integers) and does all operations on that list.
If you don´t call ToList() (or any other To-method) the IEnumerable can be enumerated several times.
Because when you don't have the
.ToList()
call, thelist2
instantiation will iterate through the wholelist
enumerable for every item in the dictionary. So you go from O(n) to O(n^2) if you use deferred execution.This is because of deferred execution: when you comment out
ToList
, the enumeration is produced by evaluating the sequence of filters for each item in the dictionary. When you do aToList
, however, the sequence is "materialized" in memory, so all the evaluations are performed exactly once.The logic behind the second
Where
withoutToList
looks like this:The logic with
ToList
looks like this:As you can see, the
ToList
transforms anO(n^2)
program with two nested loops of sizen
intoO(2*n)
with two sequential loops of sizen
each.