I have the following part of a query (It's the end of a larger query - queryBuilder is an IQueryable)
var results = queryBuilder
.OrderBy(x => x.Vehicle.RangeId)
.ThenBy(x => x.Rate.Rental)
.GroupBy(x => x.Vehicle.RangeId)
.Select(x => x.FirstOrDefault())
.OrderBy(x => x.Rate.Rental);
It's working with an object with a Vehicle object and Rental object. There are around 12K vehicles and split into about 40 ranges (RangeId - Indexed int dictates that).
The query above works fine however is does not do what I expect. I expect for it to order the vehicles by RangeId and then by rental (Lowest rental first).
Then group them by the RangeId and select the first one from the group, which 'should' be the cheapest rental as it's ordered to be like before the groupby is called.
However. It's not. It just randomly gets one in no order. Sometimes its the second cheapest. Sometimes 5th and so on. For some reason the GroupBy is not honouring the original order!
I can get this working, by doing the following. However the performance is absolutely dire and takes ages to complete.
var results = queryBuilder
.OrderBy(x => x.Vehicle.RangeId)
.ThenBy(x => x.Rate.Rental)
.GroupBy(x => x.Vehicle.RangeId)
.Select(x => x.OrderBy(o => o.Rate.Rental).FirstOrDefault())
.OrderBy(x => x.Rate.Rental);
Any help or pointers would be appreciated. I can't firgure out why the original query is not keeping everything in order.
I expect for it to order the vehicles by RangeId and then by rental
In an LINQ to Entities query, any ordering before a GroupBy
is simply ignored. You won't even see it in the executed SQL. That is because Entity Framework takes the grouping expression to order by (in your case x => x.Vehicle.RangeId
). Why is that?
LINQ's GroupBy
is seemingly similar to SQL's GROUP BY
, but actually it's quite different.
GROUP BY
in SQL is "destructive", by which I mean that any information other than the columns in the GROUP BY
is lost (apart from aggregate expressions). If you do ...
SELECT Brand, COUNT(*)
FROM Cars
GROUP BY Brand
... you only see Brand
and their counts. You don't see the cars in the groups.
That's exactly what LINQ's GroupBy
does: it produces groups of complete objects. All information in the original data is still there. You'll see cars grouped by their brands.
That means that ORMs that translate GroupBy
as GROUP BY
give themselves a hard time building the result set. LINQ to SQL does that. It executes a GROUP BY
query first and then it needs separate queries (one per group actually) to make up for the "lost" data.
EF implements GroupBy
differently. It gets all data in one query and then it builds the groups in memory. You won't see GROUP BY
in the generated SQL. You see an ORDER BY
instead. I think EF prefers a sorted SQL query result for more efficient processing in memory. (And I can imagine combines better with other LINQ statements in the pipeline).
So that's why any ordering before GroupBy
is ignored. And why you can only apply ordering after the grouping.
the performance is absolutely dire
It's hard to tell from here why that is. Maybe you can do the ordering in memory:
var results = queryBuilder
.GroupBy(x => x.Vehicle.RangeId)
.Select(x => x.OrderBy(o => o.Rate.Rental).FirstOrDefault())
.Select(o => new { o.Rate.Rental, o }
.AsEnumerable()
.OrderBy(x => x.Rental);
But it may also be an indexing issue. If there's no proper index on Rate.Rental
, ordering by that column is expensive.