I have a Hive table with the columns user_id and item_id (the ID of an item purchased by the user). I want to get a list of all the users who purchased item 1 but not items 2 and 3.
To do this, I wrote this simple query:
SELECT user_id, collect_set(item_id) itemslist FROM mytable
WHERE item_id in (1, 2)
GROUP BY user_id
HAVING -- what should I put here???
As you can see, I don't know how to check whether the array itemslist contains 1 but not 2.
How can I do this? If there is a more efficient way, could you please show me both (or all) of the methods?