Let's assume we have users:
class User < ActiveRecord::Base
has_many :connections
has_many :groups, through: :connections
end
And groups:
class Group < ActiveRecord::Base
has_many :connections
has_many :users, through: :connections
end
Basically, standard many-to-many connection:
class Connection
belongs_to :user
belongs_to :group
end
What I intend to do is:
- Select only users who don't belong to given set of Groups (groups with ids
[4,5,6]
)
- Select only users who belong to one set of Groups (
[1,2,3]
) and don't belong to another ([4,5,6]
)
- Select only users who don't belong to a Group
Also, I don't want to:
- Fetch a lot of data from database to manipulate it with Ruby code. I know that will be inefficient in terms of CPU and memory (Ruby is much slower than any commonly used DB engine, and typically I want to rely on DB engine to do the heavy lifting)
- I tried queries like
User.joins(:group).where(group_id: [1,2,3]).where.not(group_id: [4,5,6])
and they return wrong results (some users from the result set belong to groups 4,5,6 as well as 1,2,3)
- I don't want to do
join
merely for the sake of only checking for existence, because I know that that is a pretty complex (i.e. CPU/memory-intensive) operation for DB
Such questions are pretty common amongst beginner to middle-level Rails developers. You know ActiveRecord
interface and basic SQL
operations, but you are stumbled on such kind of tasks as outlined in the question. (Couple of examples of such questions: 1, 2).
The answer is simple: use SQL EXISTS
condition. Quick reference from the given URL:
Syntax
The syntax for the SQL EXISTS condition is:
WHERE EXISTS ( subquery );
Parameters or Arguments
subquery
The subquery is a SELECT
statement. If the subquery returns at least one record in its result set, the EXISTS
clause will evaluate to true and the EXISTS
condition will be met. If the subquery does not return any records, the EXISTS
clause will evaluate to false and the EXISTS
condition will not be met.
It is also mentioned that EXISTS
might be slower than JOIN
, but that is usually not true. From the Exists v. Join question on SO:
EXISTS
is only used to test if a subquery returns results, and short circuits as soon as it does. JOIN
is used to extend a result set by combining it with additional fields from another table to which there is a relation. [...] If you have proper indexes, most of the time the EXISTS
will perform identically to the JOIN
. The exception is on very complicated subqueries, where it is normally quicker to use EXISTS
.
So, the database doesn't need to look through all the connections (it stops 'joining' records with 'exists' as soon as it founds the right one), and doesn't need to return all the fields from the table joined (just check that the corresponding row, well, does exist).
Answering the specific questions:
Select only such users, who don't belong to given set of Groups (groups with ids [4,5,6]
)
not_four_to_six = User.where("NOT EXISTS (
SELECT 1 FROM connections
WHERE connections.user_id = users.id
AND connections.group_id IN (?)
)", [4,5,6])
Select only such users, who belong to one set of Groups ([1,2,3]
) and don't belong to another ([4,5,6]
)
one_two_three = not_four_to_six.where("EXISTS (
SELECT 1 FROM connections
WHERE connections.user_id = users.id
AND connections.group_id IN (?)
)", [1,2,3])
Select only such users, who doesn't belong to a Group
User.where("NOT EXISTS (
SELECT 1 FROM connections
WHERE connections.user_id = users.id
)")