UPDATE 3: According to this announcement, this has been addressed by the EF team in EF6 alpha 2.
UPDATE 2: I've created a suggestion to fix this problem. To vote for it, go here.
Consider a SQL database with one very simple table.
CREATE TABLE Main (Id INT PRIMARY KEY)
I populate the table with 10,000 records.
WITH Numbers AS
(
SELECT 1 AS Id
UNION ALL
SELECT Id + 1 AS Id FROM Numbers WHERE Id <= 10000
)
INSERT Main (Id)
SELECT Id FROM Numbers
OPTION (MAXRECURSION 0)
I build an EF model for the table and run the following query in LINQPad (I am using "C# Statements" mode so LINQPad doesn't create a dump automatically).
var rows =
Main
.ToArray();
Execution time is ~0.07 seconds. Now I add the Contains operator and re-run the query.
var ids = Main.Select(a => a.Id).ToArray();
var rows =
Main
.Where (a => ids.Contains(a.Id))
.ToArray();
Execution time for this case is 20.14 seconds (288 times slower)!
At first I suspected that the T-SQL emitted for the query was taking longer to execute, so I tried cutting and pasting it from LINQPad's SQL pane into SQL Server Management Studio.
SET NOCOUNT ON
SET STATISTICS TIME ON
SELECT
[Extent1].[Id] AS [Id]
FROM [dbo].[Primary] AS [Extent1]
WHERE [Extent1].[Id] IN (1,2,3,4,5,6,7,8,...
And the result was
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 88 ms.
Next I suspected LINQPad was causing the problem, but performance is the same whether I run it in LINQPad or in a console application.
So, it appears that the problem is somewhere within Entity Framework.
Am I doing something wrong here? This is a time-critical part of my code, so is there something I can do to speed up performance?
I am using Entity Framework 4.1 and Sql Server 2008 R2.
UPDATE 1:
In the discussion below there were some questions about whether the delay occurred while EF was building the initial query or while it was parsing the data it received back. To test this I ran the following code,
var ids = Main.Select(a => a.Id).ToArray();
var rows =
(ObjectQuery<MainRow>)
Main
.Where (a => ids.Contains(a.Id));
var sql = rows.ToTraceString();
which forces EF to generate the query without executing it against the database. The result was that this code required ~20 secords to run, so it appears that almost all of the time is taken in building the initial query.
CompiledQuery to the rescue then? Not so fast ... CompiledQuery requires the parameters passed into the query to be fundamental types (int, string, float, and so on). It won't accept arrays or IEnumerable, so I can't use it for a list of Ids.
It was fixed on Entity Framework 6 Alpha 2: http://entityframework.codeplex.com/SourceControl/changeset/a7b70f69e551
http://blogs.msdn.com/b/adonet/archive/2012/12/10/ef6-alpha-2-available-on-nuget.aspx
We were able to solve the EF Contains problem by adding an intermediate table and joining on that table from LINQ query that needed to use Contains clause. We were able to get amazing results with this approach. We have a large EF model and as "Contains" is not allowed when pre-compiling EF queries we were getting very poor performance for queries that use "Contains" clause.
An overview:
Create a table in SQL Server - for example
HelperForContainsOfIntType
withHelperID
ofGuid
data-type andReferenceID
ofint
data-type columns. Create different tables with ReferenceID of differing data-types as needed.Create an Entity / EntitySet for
HelperForContainsOfIntType
and other such tables in EF model. Create different Entity / EntitySet for different data-types as needed.Create a helper method in .NET code which takes the input of an
IEnumerable<int>
and returns anGuid
. This method generates a newGuid
and inserts the values fromIEnumerable<int>
intoHelperForContainsOfIntType
along with the generatedGuid
. Next, the method returns this newly generatedGuid
to the caller. For fast inserting intoHelperForContainsOfIntType
table, create a stored-procedure which takes input of an list of values and does the insertion. See Table-Valued Parameters in SQL Server 2008 (ADO.NET). Create different helpers for different data-types or create a generic helper method to handle different data-types.Create a EF compiled query which is similar to something like below:
Call the helper method with values to be used in the
Contains
clause and get theGuid
to use in the query. For example: