In my database I have the following tables:
Many-Many relationships exist between Person-InterestTag and Post-InterestTag
I need to perform a linq query in EF 4.1 to pull back any post that contains at least one interest tag that matches at least one interest tag related to the given user.
Example
A person has the following interests:
I need to return any Post that is related to either cars, sports, or fitness.
What is the most efficient way to write this query in terms of performance?
Edit
Running into an error based on the answer given below...
This compiles fine but throws an error at runtime:
var matchingPosts = posts.Where(post => post.Topics.Any(postTopic => person.Interests.Contains(postTopic)));
The error is:
Unable to create a constant value of type 'System.Collections.Generic.ICollection`1'. Only primitive types ('such as Int32, String, and Guid') are supported in this context.
Any ideas how to fix this?
EDIT 2
So my classes are structured as such:
public class Person
{
public int PersonID {get; set;}
public string FirstName {get; set;}
public string LastName {get; set;}
//other properties of types string, int, DateTime, etc.
public ICollection<InterestTag> InterestTags {get; set;}
}
public class Post
{
public int PostID {get; set;}
public string Title{get; set;}
public string Content {get; set;}
//other properties of types string, int, DateTime, etc.
public ICollection<InterestTag> InterestTags {get; set;}
}
public class InterestTag
{
public int InterestTagID { get; set; }
public string InterestDescription { get; set; }
public bool Active { get; set; }
public ICollection<Person> Persons { get; set; }
public ICollection<Post> Posts { get; set; }
}
In my Context class I am overriding OnModelCreating to define my DB table names
modelBuilder.Entity<Person>().HasMany(u => u.InterestTags).WithMany(t => t.Persons)
.Map(m =>
{
m.MapLeftKey("PersonID");
m.MapRightKey("InterestTagID");
m.ToTable("PersonInterestTags");
});
modelBuilder.Entity<Post>().HasMany(u => u.InterestTags).WithMany(t => t.Posts)
.Map(m =>
{
m.MapLeftKey("PostID");
m.MapRightKey("InterestTagID");
m.ToTable("PostInterestTags");
});
In my query method I am bring back an IQueryable of Post and applying some filters, including the clause in which I am trying to accomplish in this question.
var person = personRepository.Get(x => x.PersonID = 5);
var posts = postRepository.GetQueryable();
//I have tried this and get the error above
posts= posts.Where(x => x.InterestTags.Any(tag => person.InterestTags.Contains(tag)));
If you start with just a given personId
(or userId
) you can do this query in one roundtrip like so:
var posts = context.Posts
.Intersect(context.People
.Where(p => p.Id == givenPersonId)
.SelectMany(p => p.InterestTags.SelectMany(t => t.Posts)))
.ToList();
This translates into an INTERSECT
statement in SQL.
You can also do this in two roundtrips:
var interestTagsOfPerson = context.People.Where(p => p.Id == givenPersonId)
.Select(p => p.InterestTags.Select(t => t.Id))
.SingleOrDefault();
// Result is an IEnumerable<int> which contains the Id of the tags of this person
var posts = context.Posts
.Where(p => p.InterestTags.Any(t => interestTagsOfPerson.Contains(t.Id)))
.ToList();
// Contains translates into an IN clause in SQL
Using a list of primitive types in the second query (interestTagsOfPerson
is a collection of int
) also fixes the error you mentioned in your Edit in the question. For Contains
you cannot use object references in LINQ to Entities because EF doesn't know how to translate that into SQL.
I have no clue which of the two approaches is faster (SQL experts might have a better idea) but would probably start to test the first option. (I've tested a little bit and it seemed to return the correct results, but it's the first time that I have used Intersect
.)
Edit
To give an idea of the generated SQL (captured from SQL Profiler):
The first query (with Intersect
) creates this SQL query:
SELECT
[Intersect1].[Id] AS [C1],
[Intersect1].[Name] AS [C2],
FROM (SELECT
[Extent1].[Id] AS [Id],
[Extent1].[Name] AS [Name],
FROM [dbo].[Posts] AS [Extent1]
INTERSECT
SELECT
[Join1].[Id] AS [Id],
[Join1].[Name] AS [Name],
FROM [dbo].[PersonInterestTags] AS [Extent2]
INNER JOIN (SELECT [Extent3].[TagId] AS [TagId],
[Extent4].[Id] AS [Id],
[Extent4].[Name] AS [Name]
FROM [dbo].[PostInterestTags] AS [Extent3]
INNER JOIN [dbo].[Posts] AS [Extent4]
ON [Extent3].[PostId] = [Extent4].[Id] ) AS [Join1]
ON [Extent2].[TagId] = [Join1].[TagId]
WHERE 1 = [Extent2].[PersonId]) AS [Intersect1]
The second option:
Query1 (for the list of the person's tag Ids):
SELECT
[Project1].[Id] AS [Id],
[Project1].[C1] AS [C1],
[Project1].[TagId] AS [TagId]
FROM ( SELECT
[Limit1].[Id] AS [Id],
[Extent2].[TagId] AS [TagId],
CASE WHEN ([Extent2].[PersonId] IS NULL)
THEN CAST(NULL AS int)
ELSE 1
END AS [C1]
FROM (SELECT TOP (2) [Extent1].[Id] AS [Id]
FROM [dbo].[People] AS [Extent1]
WHERE 1 = [Extent1].[Id] ) AS [Limit1]
LEFT OUTER JOIN [dbo].[PersonInterestTags] AS [Extent2]
ON [Limit1].[Id] = [Extent2].[PersonId]
) AS [Project1]
ORDER BY [Project1].[Id] ASC, [Project1].[C1] ASC
Query 2 for the final posts:
SELECT
[Extent1].[Id] AS [Id],
[Extent1].[Name] AS [Name],
FROM [dbo].[Posts] AS [Extent1]
WHERE EXISTS (SELECT
1 AS [C1]
FROM [dbo].[PostInterestTags] AS [Extent2]
WHERE ([Extent1].[Id] = [Extent2].[PostId]) AND ([Extent2].[TagId] IN (1,2,3))
)
In this example the query 1 returned (1,2,3), hence the (1,2,3) in the IN
clause in query 2.
Edit: I had to edit my post because I forgot about the many-many-relationship between post and topic. Now it should work.
I cannot tell you if this is the most efficient way, but it would be a way using LINQ queries so it should be highly efficient:
var matchingPosts = posts.Where(post => post.Topics.Any(postTopic => person.Interests.Contains(postTopic)));
If you want to use parallel execution you could modify it like that:
var matchingPosts = posts.AsParallel().Where(post => post.Topics.Any(postTopic => person.Interests.Contains(postTopic)));
As you are using EF you would need this query:
var matchingPosts = from post in posts
where post.Topics.Any(topic => person.Interests.Contains(topic))
select post;
how about this:
context.Persons
.Where(p => p.Name == "x")
.SelectMany(p => p.Interests.SelectMany(i => i.Posts))
.Distinct()
.Take(10)
.ToList();
Take() is there for performance reasons and pagination. You should never select all records because first of all no user will read a list of thousands records and second the result set could grow in the future and the query will not scale.