Let's say we have a sorted collection such as SortedSet or SortedList with many (10M+) elements. Lots of querying is happening, so performance matters. From runtime comparisons, I'm under the impression that LINQ to Objects doesn't take advantage of the sorting, and therefore misses out on potential performance gains.
First example - counting the elements in a range:
var mySortedSet1 = new SortedSet<int>();
// populate ...
int rangeCount = (from n in mySortedSet1
where ((n >= 1000000000) && (n <= 2000000000))
select n).Count();
I'm not exactly sure what LINQ to Objects does here internally; worst case it checks every single element, which would be O(n). This can be done a lot faster by taking advantage of the sorting: binary searches for the lower and upper bounds run in O(log n).
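To illustrate the idea, here is a minimal sketch of the O(log n) range count using two binary searches over a sorted List<int> (the class and method names are my own, and it assumes hi < int.MaxValue). For SortedSet<T> specifically, the built-in GetViewBetween(lo, hi) can serve a similar purpose, though the complexity of Count on the returned view has varied across .NET versions, so measure before relying on it.

```csharp
using System;
using System.Collections.Generic;

static class SortedRange
{
    // Counts the elements in [lo, hi] of an ascending-sorted list in
    // O(log n) by locating both bounds with binary search instead of
    // scanning every element.
    public static int RangeCount(List<int> sorted, int lo, int hi)
    {
        int lower = LowerBound(sorted, lo);      // first index with value >= lo
        int upper = LowerBound(sorted, hi + 1);  // first index with value >  hi
        return upper - lower;
    }

    // First index whose element is >= value, or sorted.Count if none.
    static int LowerBound(List<int> sorted, int value)
    {
        int index = sorted.BinarySearch(value);
        if (index < 0) return ~index;            // insertion point when not found
        while (index > 0 && sorted[index - 1] == value)
            index--;                             // step back to leftmost duplicate
        return index;
    }
}
```

Compared with the LINQ query above, this never touches the elements inside the range at all, which is what makes the difference at 10M+ elements.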
Second example - SelectMany over list of sets:
var myListOfSortedSets = new List<SortedSet<int>>();
// populate...
var q = myListOfSortedSets.SelectMany(s => s).OrderBy(n => n);
foreach (var n in q)
{
Console.WriteLine(n);
}
If LINQ to Objects were to take advantage of the sorting, it could effectively zipper-merge all the sorted sets into one large sorted sequence in linear time (O(n log k) for k sets). The .OrderBy on the result could then be skipped, as the output is already sorted.
Instead, SelectMany concatenates all the sorted sets into one large (now unsorted) sequence, which then requires another O(n log n) sort. This is easy to verify by removing the .OrderBy and observing the order in which the elements are written to the console.
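For comparison, here is a rough sketch of the zipper-merge I have in mind (my own code, not anything LINQ provides). It keeps one enumerator per set in a min-heap keyed by the current element, so it streams the merged output lazily in O(n log k) for k sets. It assumes .NET 6+ for PriorityQueue<TElement, TPriority>.

```csharp
using System;
using System.Collections.Generic;

static class SortedMerge
{
    // Lazily merges several already-sorted sequences into one sorted
    // sequence. Each MoveNext costs O(log k) heap work, so the whole
    // merge is O(n log k) instead of the O(n log n) of a full re-sort.
    public static IEnumerable<int> MergeSorted(IEnumerable<IEnumerable<int>> sortedSources)
    {
        var heap = new PriorityQueue<IEnumerator<int>, int>();
        foreach (var source in sortedSources)
        {
            var e = source.GetEnumerator();
            if (e.MoveNext())
                heap.Enqueue(e, e.Current);   // seed the heap with each head element
        }
        while (heap.Count > 0)
        {
            var e = heap.Dequeue();           // enumerator holding the smallest head
            yield return e.Current;
            if (e.MoveNext())
                heap.Enqueue(e, e.Current);   // re-insert with its next element
        }
    }
}
```

With something like this, the second example would become MergeSorted(myListOfSortedSets) with no OrderBy at all, and the output would stream without ever materializing the concatenated list.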
My question is: is there already an alternative, more efficient implementation of LINQ to SortedSet/SortedList out there?
i4o looks very interesting, but it seems to require secondary index collections to improve query performance on the original collection. I just want queries on my sorted collections to run faster by taking advantage of the sorting.