Rails: How to Select Records Which Don't Have

2019-04-15 16:36发布

问题:

Let's assume we have users:

class User < ActiveRecord::Base
  has_many :connections
  has_many :groups, through: :connections
end

And groups:

class Group < ActiveRecord::Base
  has_many :connections
  has_many :users, through: :connections
end

Basically, standard many-to-many connection:

class Connection
  belongs_to :user
  belongs_to :group
end

What I intend to do is:

  • Select only users who don't belong to given set of Groups (groups with ids [4,5,6])
  • Select only users who belong to one set of Groups ([1,2,3]) and don't belong to another ([4,5,6])
  • Select only users who don't belong to a Group

Also, I don't want to:

  • Fetch a lot of data from database to manipulate it with Ruby code. I know that will be inefficient in terms of CPU and memory (Ruby is much slower than any commonly used DB engine, and typically I want to rely on DB engine to do the heavy lifting)
  • I tried queries like User.joins(:group).where(group_id: [1,2,3]).where.not(group_id: [4,5,6]) and they return wrong results (some users from the result set belong to groups 4,5,6 as well as 1,2,3)
  • I don't want to do join merely for the sake of only checking for existence, because I know that that is a pretty complex (i.e. CPU/memory-intensive) operation for DB

回答1:

Such questions are pretty common amongst beginner to middle-level Rails developers. You know ActiveRecord interface and basic SQL operations, but you are stumbled on such kind of tasks as outlined in the question. (Couple of examples of such questions: 1, 2).

The answer is simple: use SQL EXISTS condition. Quick reference from the given URL:

Syntax

The syntax for the SQL EXISTS condition is:

WHERE EXISTS ( subquery );

Parameters or Arguments

subquery

The subquery is a SELECT statement. If the subquery returns at least one record in its result set, the EXISTS clause will evaluate to true and the EXISTS condition will be met. If the subquery does not return any records, the EXISTS clause will evaluate to false and the EXISTS condition will not be met.

It is also mentioned that EXISTS might be slower than JOIN, but that is usually not true. From the Exists v. Join question on SO:

EXISTS is only used to test if a subquery returns results, and short circuits as soon as it does. JOIN is used to extend a result set by combining it with additional fields from another table to which there is a relation. [...] If you have proper indexes, most of the time the EXISTS will perform identically to the JOIN. The exception is on very complicated subqueries, where it is normally quicker to use EXISTS.

So, the database doesn't need to look through all the connections (it stops 'joining' records with 'exists' as soon as it founds the right one), and doesn't need to return all the fields from the table joined (just check that the corresponding row, well, does exist).

Answering the specific questions:

Select only such users, who don't belong to given set of Groups (groups with ids [4,5,6])

not_four_to_six = User.where("NOT EXISTS (
   SELECT 1 FROM connections
   WHERE connections.user_id = users.id
   AND connections.group_id IN (?)
  )", [4,5,6])

Select only such users, who belong to one set of Groups ([1,2,3]) and don't belong to another ([4,5,6])

one_two_three = not_four_to_six.where("EXISTS (
   SELECT 1 FROM connections
   WHERE connections.user_id = users.id
   AND connections.group_id IN (?)
  )", [1,2,3])

Select only such users, who doesn't belong to a Group

User.where("NOT EXISTS (
   SELECT 1 FROM connections
   WHERE connections.user_id = users.id
  )")