I am using Linq to entities to query the database to get the list of int for further processing. I have two ways to get the list as below:
First is:
List<int> lstBizIds = new List<int>() { 1, 2, 3, 4, 5 };
List<int> lstProjectIds = context.Projects.Where(x => lstBizIds.Contains(x.businessId)).Select(x => x.projectId).ToList();
Second is:
List<int> lstBizIds = new List<int>() { 1, 2, 3, 4, 5 };
List<int> lstProjectIds = context.Projects.Join(lstBizIds, p => p.businessId, u => u, (p, u) => p.projectId).ToList();
Now my question is which one of the methods above is better performance wise? Also does it affect the performance if the first list i.e. lstBizIds grows in size? Suggest me other ways of implementation as well if that are performance reducing.
Performing a join is quite efficient because Where condition actually performs a cartesian product of all the tables, then filters the rows that satisfy the condition. This means the Where condition is evaluated for each combination of rows (n1 * n2 * n3 * n4)
The Join operator takes the rows from the first tables, then takes only the rows with a matching key from the second table, then only the rows with a matching key from the third table, and so on. Secondly, contains would work in an iterative manner making it slower than join
You should go with
Contains
, because EF can produce a more efficient query.This would be the SQL join:
This would be the SQL Contains:
IN
is more efficient thanJOIN
because the DBMS can stop looking after the first match of theIN
; theJOIN
always finishes, even after the the first match.You might also want to check which queries are actually sent to the DB. You always have to compare the SQL, not the LINQ code (obviously).
I just spent quite some time trying to find, what caused a stack overflow error in a program with a few simple LINQ queries accessing a mid size database.
for ICollection with ~10k elements on one side, and sql table on the other, a single change from "join" to "Contains" fixed the stack overflow error.
it seems that notwithstanding comparative performance, Contains is a safer choice.
I choose the first,because it does not increase the computer's memory.
if you use both array to compare the conditions, choose from the second.