Why does the Contains() operator degrade Entity Fr

2019-01-03 21:39发布

UPDATE 3: According to this announcement, this has been addressed by the EF team in EF6 alpha 2.

UPDATE 2: I've created a suggestion to fix this problem. To vote for it, go here.

Consider a SQL database with one very simple table.

CREATE TABLE Main (Id INT PRIMARY KEY)

I populate the table with 10,000 records.

WITH Numbers AS
(
  SELECT 1 AS Id
  UNION ALL
  SELECT Id + 1 AS Id FROM Numbers WHERE Id <= 10000
)
INSERT Main (Id)
SELECT Id FROM Numbers
OPTION (MAXRECURSION 0)

I build an EF model for the table and run the following query in LINQPad (I am using "C# Statements" mode so LINQPad doesn't create a dump automatically).

var rows = 
  Main
  .ToArray();

Execution time is ~0.07 seconds. Now I add the Contains operator and re-run the query.

var ids = Main.Select(a => a.Id).ToArray();
var rows = 
  Main
  .Where (a => ids.Contains(a.Id))
  .ToArray();

Execution time for this case is 20.14 seconds (288 times slower)!

At first I suspected that the T-SQL emitted for the query was taking longer to execute, so I tried cutting and pasting it from LINQPad's SQL pane into SQL Server Management Studio.

SET NOCOUNT ON
SET STATISTICS TIME ON
SELECT 
[Extent1].[Id] AS [Id]
FROM [dbo].[Primary] AS [Extent1]
WHERE [Extent1].[Id] IN (1,2,3,4,5,6,7,8,...

And the result was

SQL Server Execution Times:
  CPU time = 0 ms,  elapsed time = 88 ms.

Next I suspected LINQPad was causing the problem, but performance is the same whether I run it in LINQPad or in a console application.

So, it appears that the problem is somewhere within Entity Framework.

Am I doing something wrong here? This is a time-critical part of my code, so is there something I can do to speed up performance?

I am using Entity Framework 4.1 and Sql Server 2008 R2.

UPDATE 1:

In the discussion below there were some questions about whether the delay occurred while EF was building the initial query or while it was parsing the data it received back. To test this I ran the following code,

var ids = Main.Select(a => a.Id).ToArray();
var rows = 
  (ObjectQuery<MainRow>)
  Main
  .Where (a => ids.Contains(a.Id));
var sql = rows.ToTraceString();

which forces EF to generate the query without executing it against the database. The result was that this code required ~20 secords to run, so it appears that almost all of the time is taken in building the initial query.

CompiledQuery to the rescue then? Not so fast ... CompiledQuery requires the parameters passed into the query to be fundamental types (int, string, float, and so on). It won't accept arrays or IEnumerable, so I can't use it for a list of Ids.

8条回答
地球回转人心会变
3楼-- · 2019-01-03 22:19

We were able to solve the EF Contains problem by adding an intermediate table and joining on that table from LINQ query that needed to use Contains clause. We were able to get amazing results with this approach. We have a large EF model and as "Contains" is not allowed when pre-compiling EF queries we were getting very poor performance for queries that use "Contains" clause.

An overview:

  • Create a table in SQL Server - for example HelperForContainsOfIntType with HelperID of Guid data-type and ReferenceID of int data-type columns. Create different tables with ReferenceID of differing data-types as needed.

  • Create an Entity / EntitySet for HelperForContainsOfIntType and other such tables in EF model. Create different Entity / EntitySet for different data-types as needed.

  • Create a helper method in .NET code which takes the input of an IEnumerable<int> and returns an Guid. This method generates a new Guid and inserts the values from IEnumerable<int> into HelperForContainsOfIntType along with the generated Guid. Next, the method returns this newly generated Guid to the caller. For fast inserting into HelperForContainsOfIntType table, create a stored-procedure which takes input of an list of values and does the insertion. See Table-Valued Parameters in SQL Server 2008 (ADO.NET). Create different helpers for different data-types or create a generic helper method to handle different data-types.

  • Create a EF compiled query which is similar to something like below:

    static Func<MyEntities, Guid, IEnumerable<Customer>> _selectCustomers =
        CompiledQuery.Compile(
            (MyEntities db, Guid containsHelperID) =>
                from cust in db.Customers
                join x in db.HelperForContainsOfIntType on cust.CustomerID equals x.ReferenceID where x.HelperID == containsHelperID
                select cust 
        );
    
  • Call the helper method with values to be used in the Contains clause and get the Guid to use in the query. For example:

    var containsHelperID = dbHelper.InsertIntoHelperForContainsOfIntType(new int[] { 1, 2, 3 });
    var result = _selectCustomers(_dbContext, containsHelperID).ToList();
    
查看更多
登录 后发表回答