Why is ActiveRecord destroy_all taking so long?

2019-08-03 18:12发布

问题:

I have a simple rails app with articles and comments running on MySQL 5.5, Ruby 1.9.3 and rails 3.2.12:

class Article < ActiveRecord::Base                                                                                   
  attr_accessible :body, :title
  has_many :comments
end   

class Comment < ActiveRecord::Base
  attr_accessible :content
  belongs_to :article
end

I have generated lots of comments for an article, and am now trying to delete them all in the rails console:

$ rails c 
Loading development environment (Rails 3.2.12)
[1] pry(main)> a = Article.find(1)
   (2.0ms)  SET SQL_AUTO_IS_NULL=0
  Article Load (8.0ms)  SELECT `articles`.* FROM `articles` WHERE `articles`.`id` = 1 LIMIT 1
=> #<Article id: 1, title: "Test", body: "---\n- Est vel provident. Laboriosam dolor asperiore...", created_at: "2013-05-17 09:54:54", updated_at: "2013-05-21 14:52:18">
[2] pry(main)> require 'benchmark'
[3] pry(main)> puts Benchmark.measure { a.comments.destroy_all }
  Comment Load (896.0ms)  SELECT `comments`.* FROM `comments` WHERE `comments`.`article_id` = 1
  EXPLAIN (2.0ms)  EXPLAIN SELECT `comments`.* FROM `comments` WHERE `comments`.`article_id` = 1
EXPLAIN for: SELECT `comments`.* FROM `comments`  WHERE `comments`.`article_id` = 1
+----+-------------+----------+------+---------------+------------+---------+-------+-------+-------------+
| id | select_type | table    | type | possible_keys | key        | key_len | ref   | rows  | Extra       |
+----+-------------+----------+------+---------------+------------+---------+-------+-------+-------------+
|  1 | SIMPLE      | comments | ref  | article_id    | article_id | 5       | const | 48186 | Using where |
+----+-------------+----------+------+---------------+------------+---------+-------+-------+-------------+
1 row in set (0.00 sec)

  SQL (1.0ms)  DELETE FROM `comments` WHERE `comments`.`id` = 2
  SQL (2.0ms)  DELETE FROM `comments` WHERE `comments`.`id` = 3
  SQL (1.0ms)  DELETE FROM `comments` WHERE `comments`.`id` = 4
  SQL (1.0ms)  DELETE FROM `comments` WHERE `comments`.`id` = 5
  SQL (1.0ms)  DELETE FROM `comments` WHERE `comments`.`id` = 6
  SQL (5.0ms)  DELETE FROM `comments` WHERE `comments`.`id` = 7
  SQL (2.0ms)  DELETE FROM `comments` WHERE `comments`.`id` = 8
  SQL (2.0ms)  DELETE FROM `comments` WHERE `comments`.`id` = 9

. . .
  SQL (0.0ms)  DELETE FROM `comments` WHERE `comments`.`id` = 37360
  SQL (0.0ms)  DELETE FROM `comments` WHERE `comments`.`id` = 37361

The last query is deleting the last comment, and the process then hangs there for a very long time before finally returning and committing:

   (1.9ms)  COMMIT
690.380000   1.390000 691.770000 (693.885877)

SHOW PROCESSLIST confirms that there is no lock:

mysql> show processlist;
+----+----------+-----------+------------------+---------+------+-------+------------------+
| Id | User     | Host      | db               | Command | Time | State | Info             |
+----+----------+-----------+------------------+---------+------+-------+------------------+
|  6 | bloguser | localhost | blog_development | Query   |    0 | NULL  | show processlist |
|  7 | bloguser | localhost | blog_development | Sleep   |  459 |       | NULL             |
+----+----------+-----------+------------------+---------+------+-------+------------------+
2 rows in set (0.00 sec)

delete_all with dependent: :destroy or dependent: :delete_all show a very similar behaviour.

The popular belief seems to be that the issue with destroy_all is that it instantiates all the objects and deletes them one by one, but it doesn't look like it's the problem here. What is taking so long to process after all the DELETEs have been executed, and before COMMIT is finally called?

回答1:

Digging into this deeper, it seems it is the deletion from the comments array that takes very long. Records that are deleted are then removed from the array here.

Simulating this with a large array, we get the same slow behaviour:

1.9.3-p194 :001 > require 'benchmark'; require 'ostruct'
 => true 
1.9.3-p194 :002 > i = 0; a = []
 => [] 
1.9.3-p194 :003 > 35_000.times { i+=1; a << OpenStruct.new(value: i) }
 => 35000 
1.9.3-p194 :004 > puts Benchmark.measure { a.each { |i| a.delete(i) } }
623.560000   0.820000 624.380000 (625.244664)

ActiveRecord could probably be optimized to do a Array#clear in the case of a destroy_all...



回答2:

Beyond the fact that destroy_all instantiates all the rows first, this sounds like activerecord's after commit callback.

When you update/delete rows in a transaction, activerecord keeps track of all the rows you've modified, so that it can call any after commit hooks defined (even if there are none). In the past, I've found that this bookkeeping can be very slow when large numbers of records are involved (several thousand). This hit occurs just as rails is committing the transaction.

The slow bit if my memory is correct culprit is that rails call uniq on the array of changed objects. The details of how == and hash are implemented seem to make that slow in some cases

In the past, I've hobbled this via

class Foo  < ActiveRecord::Base
  #hobble commit hooks
  def add_to_transaction
  end
end

which will of course break commit callbacks (that you may not be using anyway)



回答3:

Be aware that #destroy_all instantiates EVERY instance of an object, then runs through and deletes it. This can take quite a while, and it's why you're getting all those different DELETE statements instead of a single one. What you probably want is delete_all:

Comment.delete_all("article_id = 1")

I know you mentioned the instantiation issue already, but give the two different methods a try side-by-side - I think you'll see a difference.

The important part above, though is the fact that you aren't doing it through the associations, note that the code I provided does NOT do:

Article.find(1).comments.delete_all

It invokes directly from the comments. This MAKES SURE you are not instantiating the objects. Calling delete_all through the association proxy can cause things to be instantiated. If they are instantiated, you usually get callbacks when you delete/destroy them - not to mention ruby having to shuffle around the objects in the collection in memory.

The cause of the time is ruby dealing with an array with 35k complex, associated objects in it. At the same time, notice the 35k delete statements. 35,000 Delete statements, wrapped in a transaction or not, are still going to take a long time.