Today I discovered that Entity Framework was adding an unnecessary subquery to the SQL it generates. I started digging through my code to narrow down where it might come from. A (long) while later I pinpointed what's causing it. But now I'm more confused than when I started, as I have no clue why it causes it.
Basically, what I discovered is that in certain scenarios, simply converting a constant into a variable can alter the SQL that Entity Framework generates. I've shrunk everything to the bare minimum and packed it into a little console app:
    using System;
    using System.Data.Entity;
    using System.Linq;

    class Program
    {
        private static readonly BlogContext _db = new BlogContext();

        static void Main(string[] args)
        {
            const string email = "foo@bar.com";

            // Comments written by the given email address.
            var comments = from c in _db.Comments
                           where c.Email == email
                           select c;

            // Join posts to those comments, ordered by post title.
            var result = (from p in _db.Posts
                          join c in comments on p.PostId equals c.PostId
                          orderby p.Title
                          select new { p.Title, c.Content });

            // Printing the query (instead of enumerating it) shows the generated SQL.
            Console.WriteLine(result);
        }
    }

    public class BlogContext : DbContext
    {
        public DbSet<Post> Posts { get; set; }
        public DbSet<Comment> Comments { get; set; }
    }

    public class Post
    {
        public int PostId { get; set; }
        public string Title { get; set; }
    }

    public class Comment
    {
        public int CommentId { get; set; }
        public int PostId { get; set; }
        public string Email { get; set; }
        public string Content { get; set; }
    }
This shows the following output, which is perfect:
    SELECT
        [Extent1].[PostId] AS [PostId],
        [Extent1].[Title] AS [Title],
        [Extent2].[Content] AS [Content]
    FROM [dbo].[Posts] AS [Extent1]
    INNER JOIN [dbo].[Comments] AS [Extent2] ON [Extent1].[PostId] = [Extent2].[PostId]
    WHERE N'foo@bar.com' = [Extent2].[Email]
    ORDER BY [Extent1].[Title] ASC
Now if I make email a variable:
    /*const*/ string email = "foo@bar.com";
The output changes radically:
    SELECT
        [Project1].[PostId] AS [PostId],
        [Project1].[Title] AS [Title],
        [Project1].[Content] AS [Content]
    FROM ( SELECT
            [Extent1].[PostId] AS [PostId],
            [Extent1].[Title] AS [Title],
            [Extent2].[Content] AS [Content]
        FROM [dbo].[Posts] AS [Extent1]
        INNER JOIN [dbo].[Comments] AS [Extent2] ON [Extent1].[PostId] = [Extent2].[PostId]
        WHERE [Extent2].[Email] = @p__linq__0
    ) AS [Project1]
    ORDER BY [Project1].[Title] ASC
As a side note, LINQ to SQL does not seem to do this. I know it's probably okay to ignore this, as both commands return the same data. But I'm extremely curious as to why this happens. Up until today I always had the (perhaps false?) impression that it is always safe to turn a constant into a variable, provided that the value remains the same (which in this case it does). So I have to ask...
Why does a seemingly insignificant change cause such a big difference in the generated SQL?
Update:
Just to be clear, my question isn't about the value of email being a hard-coded value in the first query and a variable in the second (which makes all the sense in the world). My question is about why the variable version results in the extra subquery.
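To illustrate the part that does make sense to me, here is a small sketch, separate from the app above, of how the C# compiler treats the two cases when it builds an expression tree. The `ExpressionShapeDemo` class and its `Describe` helper are names I made up for this illustration, and it only uses `System.Linq.Expressions`, not Entity Framework. A `const` is inlined as a constant node, while a local variable is captured in a closure and shows up as a member access, which is presumably why the provider emits a literal in one case and a parameter in the other. It says nothing about where the extra subquery comes from.

```csharp
using System;
using System.Linq.Expressions;

public static class ExpressionShapeDemo
{
    // Hypothetical helper (not part of any library): returns the node type
    // of the right-hand operand of a binary comparison such as `x == y`.
    public static string Describe(Expression<Func<string, bool>> predicate)
    {
        var comparison = (BinaryExpression)predicate.Body;
        return comparison.Right.NodeType.ToString();
    }

    public static void Main()
    {
        const string constEmail = "foo@bar.com";
        string variableEmail = "foo@bar.com";

        // The compiler inlines the const value directly into the tree.
        Console.WriteLine(Describe(e => e == constEmail));    // prints "Constant"

        // The local is hoisted into a closure class, so the tree contains
        // a field access on that closure instead of the value itself.
        Console.WriteLine(Describe(e => e == variableEmail)); // prints "MemberAccess"
    }
}
```

So the query provider really does receive two differently shaped trees, even though the value is identical at run time.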
Thanks!