How can one select only the items he want in the IN list? for example
select * from pagetags where TagID in (1,2,4)
Now I want all the pages which has all the above 3 IDs assigned to them (1,2,4), not just any of them but all of them?
Is there a way? any other operator? I have already tried = Any
and = All
but no luck.
The term for this type of problem is relational division. One way below.
SELECT PageID
FROM pagetags
WHERE TagID IN ( 1, 2, 4 )
GROUP BY PageID
HAVING Count(DISTINCT TagID) = 3
you could try something like this:
SELECT id, Tag FROM (
SELECT id, Tag, COUNT(*) OVER(partition by id) as cnt
FROM pagetags
WHERE Tag in(1,2,4)
GROUP BY id, tag
) a WHERE a.cnt = 3
The selected answer from Martin Smith
SELECT PageID
FROM pagetags
WHERE TagID IN ( 1, 2, 4 )
GROUP BY PageID
HAVING Count(DISTINCT TagID) = 3
is correct but if speed is a problem then try these.
I have a large table doing the same thing and got 10x better performance with the following.
0.2 seconds versus 2.0 seconds for query returning 272 from a table with 3 million rows.
Also tested on a bigger table with 5 tags and same 10x but now 0.5 versus 5.0.
Index is PageID, TagID with millions of PageID and hundreds of TagID.
Common scenario where many objects are tagged multivalue property.
SELECT distinct(p1.PageID)
FROM pagetags p1
JOIN pagetags p2
ON p2.PageID = p1.PageID
AND p2.TagID = 2
JOIN pagetags p3
ON p3.PageID = p1.PageID
AND p3.TagID = 4
WHERE p1.PageID = 1
ORDER BY p1.PageID
or
SELECT distinct(PageID)
FROM pagetags
WHERE TagID = 1
INTERSECT
SELECT distinct(PageID)
FROM pagetags
WHERE TagID = 2
INTERSECT
SELECT distinct(PageID)
FROM pagetags
WHERE TagID = 4
ORDER BY PageID
Prefer the last as with over 5 joins the query optimizer often will make some bad decisions.
And with this have not used up the Group By if you need it for another aggregation.
SELECT distinct(PageID)
FROM pagetags
WHERE TagID IN (1,2,4)
and PageID in
(select distinct(PageID) from pagetags group by PageID having count(TagID)=3)
group by PageID