I have a LINQ query which is attempting to get all of the distinct months of all of the dates in a table.
I had this working using the Distinct() extension method. I then made it more readable by using an extension method to extract the month. And then it stopped returning Distinct results.
Can anyone help me work out what happened here?
As an aside, if someone can tell me the best way to get the distinct months, that would be nice too. But it's more important that I understand why this is failing.
Here's the code.
static class DcUtils
{
public static DateTime GetMonth(this Timesheet_Entry entry)
{
DateTime dt = new DateTime(
entry.Entry_Start_DateTime.Year,
entry.Entry_Start_DateTime.Month,
1
);
return dt;
}
}
public class Demo
{
public DemonstrateBug()
{
TimesheetDataClassesDataContext dc = new TimesheetDataClassesDataContext();
/////////////////////////////////
//// Here are the queries and their behaviours
var q1 = (
from ts
in dc.Timesheet_Entries
select new DateTime(ts.Entry_Start_DateTime.Year, ts.Entry_Start_DateTime.Month, 1)
).Distinct();
// This returns 3 (which is what I want)
int lengthQuery1 = q1.Count();
// And now for the bug!
var q2 = (
from ts
in dc.Timesheet_Entries
select ts.GetMonth()
).Distinct();
// This returns 236 (WTF?)
int lengthQuery2 = q2.Count();
}
}
It occurred to me to run this through the SQL Server Profiler.
This query:
generates the following SQL. As you can see, it converted the
System.DateTime
calls into Transact SQL.But if I put the month extraction logic in the extension method:
It generats the following SQL.
So it's moved the DISTINCT function to the server, but kept the date extraction code until after the DISTINCT operation, which is not what I want, and is not what happens in the first example.
I don't know if I should call this a bug or a leaky abstraction.
LINQ to SQL is smart enough to convert the
new DateTime()
expression from your initial lambda expression into a SQL statements that can be executed at the server. If you replace this expression with an (extension) method, LINQ to SQL will only see a call to an opaque method it knows nothing about, hence it cannot generate any SQL for the method call and the part of the SQL query messing with the dates disappears.But this shouldn't break anything - what cannot be transformed into SQL must be executed at the client. So what happens? The date you want to perform the distinct operation on cannot be calculated at the server because of the opaque method call, hence the distinct operation cannot be performed at the server, too. But the query you recorded from the broken version contains a
DISTINCT
statement.I don't use the LINQ query syntax, but I assume you have written something you don't actually mean or the compiler or LINQ to SQL inferred something you didn't mean.
versus
So I guess you got the second one for what ever reason - the distinct operation seems to get propagated over the select. Maybe it's the combination of
Distinct()
with the query syntax and the contained opaque method call. Try both versions without the query syntax and see what you get back and what queries are send to the server. You can also try to insertToList()
calls to force the transition from LINQ to SQL to LINQ to Objects - this might help to cast some light onto the situation, too.