Performance: .Join vs .Contains - Linq to Entities

2019-04-26 14:06发布

I am using Linq to entities to query the database to get the list of int for further processing. I have two ways to get the list as below:

First is:

List<int> lstBizIds = new List<int>() { 1, 2, 3, 4, 5 };
List<int> lstProjectIds = context.Projects.Where(x => lstBizIds.Contains(x.businessId)).Select(x => x.projectId).ToList();

Second is:

List<int> lstBizIds = new List<int>() { 1, 2, 3, 4, 5 };
List<int> lstProjectIds = context.Projects.Join(lstBizIds, p => p.businessId, u => u, (p, u) => p.projectId).ToList();

Now my question is which one of the methods above is better performance wise? Also does it affect the performance if the first list i.e. lstBizIds grows in size? Suggest me other ways of implementation as well if that are performance reducing.

4条回答
相关推荐>>
2楼-- · 2019-04-26 14:31

Performing a join is quite efficient because Where condition actually performs a cartesian product of all the tables, then filters the rows that satisfy the condition. This means the Where condition is evaluated for each combination of rows (n1 * n2 * n3 * n4)

The Join operator takes the rows from the first tables, then takes only the rows with a matching key from the second table, then only the rows with a matching key from the third table, and so on. Secondly, contains would work in an iterative manner making it slower than join

查看更多
仙女界的扛把子
3楼-- · 2019-04-26 14:33

You should go with Contains, because EF can produce a more efficient query.

This would be the SQL join:

SELECT Id
FROM Projects
INNER JOIN (VALUES (1), (2), (3), (4), (5)) AS Data(Item) ON Projects.UserId = Data.Item

This would be the SQL Contains:

SELECT Id
FROM Projects
WHERE UserId IN (1, 2, 3, 4, 5, 6)

IN is more efficient than JOIN because the DBMS can stop looking after the first match of the IN; the JOIN always finishes, even after the the first match.

You might also want to check which queries are actually sent to the DB. You always have to compare the SQL, not the LINQ code (obviously).

查看更多
叛逆
4楼-- · 2019-04-26 14:44

I just spent quite some time trying to find, what caused a stack overflow error in a program with a few simple LINQ queries accessing a mid size database.

for ICollection with ~10k elements on one side, and sql table on the other, a single change from "join" to "Contains" fixed the stack overflow error.

it seems that notwithstanding comparative performance, Contains is a safer choice.

查看更多
小情绪 Triste *
5楼-- · 2019-04-26 14:54


I choose the first,because it does not increase the computer's memory.
if you use both array to compare the conditions, choose from the second.

查看更多
登录 后发表回答