MySQL Update query - Will the 'where' cond

2020-06-16 04:36发布

问题:

I am trying to build a first-come first-get model sale page. We have n number of items of the same type. We want to assign these n items to the first n users who made the request. Corresponding to each item, there is a database row. When the user presses buy button, the system tries to find an entry which is not yet sold (reservationCompleted = FALSE) and updates the user id and sets reservationCompleted to true.

Since the database engine I am using is InnoDB, I understand that there is an internal locking mechanism that won't allow two processes to do an update at once on the same row.

My question is,

if the statement I am using is the following, will this result in different users getting assigned to the same row if two requests arrive at the same time?

$query = "UPDATE available_items
    SET assignedPhone=".$user->phone.",
        reservationCompleted = TRUE,
        assignmentCreatedTimestamp =".time()."
    WHERE id=".$itemListing['id']."
    AND reservationCompleted=FALSE";
$stmt = $pdo->prepare($query);
$stmt->execute();

Consider the following case.

Two different processes gets the same row (say id=5) and tries to update the DB entry. But one of them gets the lock. It updates the item and releases the lock and the next process gets the lock. So,will it again validate the where condition before performing the update?

回答1:

The where condition will be respected during a race situation, but you must be careful how you check to see who won the race.

Consider the following demonstration of how this works and why you have to be careful.

First, setup some minimal tables.

CREATE TABLE table1 (
`id` TINYINT UNSIGNED NOT NULL PRIMARY KEY,
`locked` TINYINT UNSIGNED NOT NULL,
`updated_by_connection_id` TINYINT UNSIGNED DEFAULT NULL
) ENGINE = InnoDB;

CREATE TABLE table2 (
`id` TINYINT UNSIGNED NOT NULL PRIMARY KEY
) ENGINE = InnoDB;

INSERT INTO table1
(`id`,`locked`)
VALUES
(1,0);

id plays the role of id in your table, updated_by_connection_id acts like assignedPhone, and locked like reservationCompleted.

Now lets start the race test. You should have 2 commandline/terminal windows open, connected to mysql and using the database where you have created these tables.

Connection 1

start transaction;

Connection 2

start transaction;

Connection 1

UPDATE table1
SET locked = 1,
updated_by_connection_id = 1
WHERE id = 1
AND locked = 0;

Query OK, 1 row affected (0.00 sec) Rows matched: 1 Changed: 1 Warnings: 0

Connection 2

UPDATE table1
SET locked = 1,
updated_by_connection_id = 2
WHERE id = 1
AND locked = 0;

Connection 2 is now waiting

Connection 1

SELECT * FROM table1 WHERE id = 1;
+----+--------+--------------------------+
| id | locked | updated_by_connection_id |
+----+--------+--------------------------+
|  1 |      1 |                        1 |
+----+--------+--------------------------+
commit;

At this point, connection 2 is released to continue and outputs the following:

Connection 2

Query OK, 0 rows affected (23.25 sec) Rows matched: 0 Changed: 0 Warnings: 0

SELECT * FROM table1 WHERE id = 1;
+----+--------+--------------------------+
| id | locked | updated_by_connection_id |
+----+--------+--------------------------+
|  1 |      1 |                        1 |
+----+--------+--------------------------+
commit;

Everything looks fine. We see that yes, the WHERE clause was respected in a race situation.

The reason I said you had to be careful though, is because in a real application things are not always this simple. You MAY have other actions going on within the transaction, and that can actually change the results.

Let's reset the database with the following:

delete from table1;
INSERT INTO table1
(`id`,`locked`)
VALUES
(1,0);

And now, consider this situation, where a SELECT is performed before the UPDATE.

Connection 1

start transaction;

SELECT * FROM table2;

Empty set (0.00 sec)

Connection 2

start transaction;

SELECT * FROM table2;

Empty set (0.00 sec)

Connection 1

UPDATE table1
SET locked = 1,
updated_by_connection_id = 1
WHERE id = 1
AND locked = 0;

Query OK, 1 row affected (0.00 sec) Rows matched: 1 Changed: 1 Warnings: 0

Connection 2

UPDATE table1
SET locked = 1,
updated_by_connection_id = 2
WHERE id = 1
AND locked = 0;

Connection 2 is now waiting

Connection 1

SELECT * FROM table1 WHERE id = 1;
+----+--------+--------------------------+
| id | locked | updated_by_connection_id |
+----+--------+--------------------------+
|  1 |      1 |                        1 |
+----+--------+--------------------------+
1 row in set (0.00 sec)
SELECT * FROM table1 WHERE id = 1 FOR UPDATE;
+----+--------+--------------------------+
| id | locked | updated_by_connection_id |
+----+--------+--------------------------+
|  1 |      1 |                        1 |
+----+--------+--------------------------+
1 row in set (0.00 sec)
commit;

At this point, connection 2 is released to continue and outputs the following:

Query OK, 0 rows affected (20.47 sec) Rows matched: 0 Changed: 0 Warnings: 0

Ok, let's see who won:

Connection 2

SELECT * FROM table1 WHERE id = 1;
+----+--------+--------------------------+
| id | locked | updated_by_connection_id |
+----+--------+--------------------------+
|  1 |      0 |                     NULL |
+----+--------+--------------------------+

Wait, what? Why is locked 0 and updated_by_connection_id NULL??

This is the being careful I mentioned. The culprit is actually due to the fact that we did a select at the beginning. To get the correct result, we could run the following:

SELECT * FROM table1 WHERE id = 1 FOR UPDATE;
+----+--------+--------------------------+
| id | locked | updated_by_connection_id |
+----+--------+--------------------------+
|  1 |      1 |                        1 |
+----+--------+--------------------------+
commit;

By using SELECT ... FOR UPDATE we can get the right result. This can be very confusing (as it was for me, originally), as a SELECT and a SELECT ... FOR UPDATE are giving two different results.

The reason this happens is because of the default isolation level READ-REPEATABLE. When the first SELECT is made, right after the start transaction;, a snapshot is created. All future non-updating reads will be done from that snapshot.

Therefore, if you just naively SELECT after you do the update, it will pull the information from that original snapshot, which is before the row has been updated. By doing a SELECT ... FOR UPDATE you force it to get the correct information.

However, again, in a real application this could be an issue. Say, for example, your request is wrapped in a transaction, and after performing the update you want to output some information. Collecting and output that information may be handled by separate, reusable code, that you DON'T want to litter with FOR UPDATE clauses "just in case." That would lead to lots of frustration due to unnecessary locking.

Instead, you'll want to take a different track. You have many options here.

One, is to make sure you commit the transaction after the UPDATE has completed. In most case, this is probably the best, simplest choice.

Another option is to not try using SELECT to determine the result. Instead, you may be able to read the rows affected, and use that (1 row updated vs 0 rows update) to determine if the UPDATE was a success.

Another option, and one that I use frequently, as I like to keep a single request (like an HTTP request) fully wrapped in a single transaction, is to make sure that the first statement executed in a transaction is either the UPDATE or a SELECT ... FOR UPDATE. That will cause the snapshot to NOT be taken until the connection is allowed to proceed.

Let's reset our test database again and see how this works.

delete from table1;
INSERT INTO table1
(`id`,`locked`)
VALUES
(1,0);

Connection 1

start transaction;

SELECT * FROM table1 WHERE id = 1 FOR UPDATE;
+----+--------+--------------------------+
| id | locked | updated_by_connection_id |
+----+--------+--------------------------+
|  1 |      0 |                     NULL |
+----+--------+--------------------------+

Connection 2

start transaction;

SELECT * FROM table1 WHERE id = 1 FOR UPDATE;

Connection 2 is now waiting.

Connection 1

UPDATE table1
SET locked = 1,
updated_by_connection_id = 1
WHERE id = 1
AND locked = 0;

Query OK, 1 row affected (0.01 sec) Rows matched: 1 Changed: 1 Warnings: 0

SELECT * FROM table1 WHERE id = 1;
+----+--------+--------------------------+
| id | locked | updated_by_connection_id |
+----+--------+--------------------------+
|  1 |      1 |                        1 |
+----+--------+--------------------------+
SELECT * FROM table1 WHERE id = 1 FOR UPDATE;
+----+--------+--------------------------+
| id | locked | updated_by_connection_id |
+----+--------+--------------------------+
|  1 |      1 |                        1 |
+----+--------+--------------------------+
commit;

Connection 2 is now released.

Connection 2

+----+--------+--------------------------+
| id | locked | updated_by_connection_id |
+----+--------+--------------------------+
|  1 |      1 |                        1 |
+----+--------+--------------------------+

Here you could actually have your server side code check the results of this SELECT and know it is accurate, and not even continue with the next steps. But, for completeness, I'll finish as before.

UPDATE table1
SET locked = 1,
updated_by_connection_id = 2
WHERE id = 1
AND locked = 0;

Query OK, 0 rows affected (0.00 sec) Rows matched: 0 Changed: 0 Warnings: 0

SELECT * FROM table1 WHERE id = 1;
+----+--------+--------------------------+
| id | locked | updated_by_connection_id |
+----+--------+--------------------------+
|  1 |      1 |                        1 |
+----+--------+--------------------------+
SELECT * FROM table1 WHERE id = 1 FOR UPDATE;
+----+--------+--------------------------+
| id | locked | updated_by_connection_id |
+----+--------+--------------------------+
|  1 |      1 |                        1 |
+----+--------+--------------------------+
commit;

Now you can see that in Connection 2 the SELECT and SELECT ... FOR UPDATE give the same result. This is because the snapshot that the SELECT reads from was not created until after Connection 1 had been committed.

So, back to your original question: Yes, the WHERE clause is checked by the UPDATE statement, in all cases. However, you have to be careful with any SELECTs you may be doing, to avoid incorrectly determining the result of that UPDATE.

(Yes another option is to change the transaction isolation level. However, I don't really have experience with that and any gotchya's that might exist, so I'm not going to go into it.)



回答2:

The answer is: it will validate the WHERE condition before updating the data.


Well, I have to say that this is a very interesting question. I've never thought about such a question before, and it enforces me to have a better understanding of how it works inside MySQL. Thank you!

How I get the answer:

I did my test for this situation at first. I know it should work like this even before I did my test, but I just didn't understand why.

Why:

Finally, I found something useful in the Index Condition Pushdown section.

This is how it works inside MySQL:

MySQL Server
   ↑  ↑
   ↓  ↓
Storage Engine(InnoDB here)
  1. MySQL Server parses SQL statements coming from external applications.
  2. MySQL Server tells InnoDB to retrieve rows for it.
  3. InnoDB locates rows(Index, Locking), then return them to MySQL Server.
  4. MySQL Server evaluates WHERE conditions for rows.
  5. Some other things ...

As you can see, locking occurs inside InnoDB, and MySQL Server evaluates the WHERE condition after obtaining the rows. For your situation, the row(id = 5) is locked by the first UPDATE, and the second UPDATE gets stuck when fetching the same row. And the evaluation for the second UPDATE's WHERE condition occurs after obtaining the lock for the row.

What's more, if you have created an index on id, Index Condition Pushdown will take place in your query.



回答3:

No since reservationCompleted is set to true true.Don t forget to COMMIT each successfull transaction. the next process will of course get the lock but will not satisfiy the WHERE condition and release the LOCK.If you want you want the next process to look for another available item You can wrap your Update statement around with a Sub Routine procedure to check whether reservationCompleted is FALSE.