MySQL: Why is DELETE more CPU intensive than INSERT?

Posted 2019-06-17 05:26

Question:

I'm currently taking the course "Performance Evaluation" at university, and we're now doing an assignment where we are testing the CPU usage on a PHP and MySQL-database server. We use httperf to create custom traffic, and vmstat to track the server load. We are running 3000 connections to the PHP-server, for both INSERT and DELETE (run separately).

The numbers show that the DELETE operation is a lot more CPU intensive than INSERT, and I'm wondering why.

I initially thought INSERT required more CPU usage, as indexes would need to be recreated, data needed to be written to disk, etc. But obviously I'm wrong, and I'm wondering if anyone can tell me the technical reason for this.

Answer 1:

At least with InnoDB (and I hope they have you on this), you have more operations even with no foreign keys. An insert is roughly this:

  1. Insert row
  2. Mark in binary log buffer
  3. Mark committed

Deletions do the following:

  1. Mark row removed (taking the same hit as an insertion -- page is rewritten)
  2. Mark in binary log buffer
  3. Mark committed
  4. Actually go remove the row (taking the same hit as an insertion -- page is rewritten)
  5. Purge thread tracks deletions in binary log buffer too.

So a delete does roughly twice the work of an insert. A delete requires those two writes because the row must be marked as removed for all versions going forward, but it can only be physically removed once no transactions remain that can still see it. Because InnoDB only writes full blocks to disk, the modification penalty for a block is constant.
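
The two-write pattern described here can be sketched with a toy model. This is only an illustration, not InnoDB's actual code: the `ToyTable` class, its `page_writes` counter, and the assumption that each row modification costs exactly one page write are all hypothetical simplifications.

```python
class ToyTable:
    """Toy model of delete-marking with deferred purge (not InnoDB's real code)."""

    def __init__(self):
        self.rows = {}          # key -> [value, delete_marked]
        self.purge_queue = []   # keys waiting for physical removal
        self.page_writes = 0    # hypothetical counter: one per page modification

    def insert(self, key, value):
        self.rows[key] = [value, False]
        self.page_writes += 1   # the page holding the new row is rewritten

    def delete(self, key):
        self.rows[key][1] = True        # first write: flag the row as deleted
        self.purge_queue.append(key)
        self.page_writes += 1

    def purge(self):
        # Assumption: no open transaction can still see the marked rows.
        for key in self.purge_queue:
            del self.rows[key]          # second write: physical removal
            self.page_writes += 1
        self.purge_queue.clear()


table = ToyTable()
for i in range(1000):
    table.insert(i, "payload")
insert_writes = table.page_writes

for i in range(1000):
    table.delete(i)
table.purge()
delete_writes = table.page_writes - insert_writes

print(insert_writes, delete_writes)  # 1000 2000
```

Under these assumptions, deleting the same 1000 rows touches pages twice as often as inserting them, which matches the "twice the work" estimate.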



Answer 2:

DELETE also requires data to be written to disk, plus recalculation of indexes, and in addition, a set of logical comparisons to find the record(s) you are trying to delete in the first place.



Answer 3:

Delete requires more logic than you think; how much so depends on the structure of the schema.

In almost all cases, when deleting a record, the server must check for any dependencies upon that record as a foreign key reference. That, in a nutshell, is a query of the system tables looking for table definitions with a foreign key ref to this table, then a select of each of those tables for records referencing the record to be deleted. Right there you've increased the computational time by a couple orders of magnitude, regardless of whether the server does cascading deletes or just throws back an error.
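
A minimal sketch of that dependency check, assuming an in-memory schema: the function `delete_with_fk_check`, the `tables` dictionary, and the `foreign_keys` list are hypothetical names made up for illustration. The linear scan of each child table is also a simplification; a real server would normally use an index on the referencing column rather than examine every row.

```python
def delete_with_fk_check(tables, foreign_keys, table, row_id):
    """Delete row_id from table unless a child row still references it.

    tables:       {table_name: {row_id: row_dict}}
    foreign_keys: list of (child_table, child_column, parent_table)
    Returns the number of child rows examined before the delete.
    """
    examined = 0
    for child_table, child_column, parent_table in foreign_keys:
        if parent_table != table:
            continue  # this constraint points at a different table
        for child_row in tables[child_table].values():
            examined += 1
            if child_row[child_column] == row_id:
                raise ValueError(
                    f"{table}.{row_id} is still referenced by {child_table}"
                )
    del tables[table][row_id]
    return examined


# Hypothetical schema: books.author_id references authors.
tables = {
    "authors": {1: {"name": "a"}, 2: {"name": "b"}},
    "books": {10: {"author_id": 1}, 11: {"author_id": 1}},
}
foreign_keys = [("books", "author_id", "authors")]

# Author 2 has no books, so the delete succeeds after checking both book rows.
examined = delete_with_fk_check(tables, foreign_keys, "authors", 2)
```

An insert into `authors` needs none of this work, because a new parent row cannot yet be referenced by any child row.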

Self-balancing internal data structures would also have to be reorganized, and indexes would have to be updated to remove any now-empty branches of the index trees, but these would have counterparts in the Insert operations.