SQL Simultaneous transactions ignore each other

2019-08-19 08:17发布

问题:

Good day!

I've run into a head burner. My client requires me to repurpose a python program to work with MySQL instead of Microsoft's SQL Server. I'm having trouble finding an equivalent solution in SQL.

I can't seem to create a proper update lock on a row. When two identical transactions execute simultaneously, they both read the row despite opening a transaction at the serialize isolation level, and with SELECT ... FOR UPDATE.

Maybe my code will explain it better:

execute("SET TRANSACTION ISOLATION LEVEL SERIALIZABLE")
execute("START TRANSACTION")
execute("SELECT * FROM job WHERE status = %s LIMIT 1 FOR UPDATE", jobStatus.imported)
job_data = cursor.fetchone()
if not job_data:
    connection.rollback()
else:
    execute("UPDATE job SET status = %s WHERE jobID = %s", jobStatus.ingesting, job_data['jobID']) # Update the job data

    if job_data['jobUUID'] == None:
        job_data['jobUUID'] = new_unused_uuid().bytes
        execute("UPDATE job SET jobUUID = %s WHERE jobID = %s LIMIT 1", job_data['jobUUID'], job_data['jobID'])
    if job_data['dateAdded'] == None:
        job_data['dateAdded'] = datetime.datetime.now()
        execute("UPDATE job SET dateAdded = %s WHERE jobID = %s LIMIT 1", job_data['dateAdded'], job_data['jobID'])

    execute("INSERT INTO ingestJob (fk_jobUUID, fk_nodeUUID, status) VALUES (%s, %s, %s) ON DUPLICATE KEY UPDATE fk_nodeUUID = %s, status = %s", job_data['jobUUID'], unique_id.bytes, smallJobStatus.running, unique_id.bytes, smallJobStatus.running)

    connection.commit()

The process is as follows:

  1. Select one possible job, with FOR UPDATE
  2. Rollback (release the lock) if no jobs, or...
  3. ...update the row so it can't be reselected, make some unrelated changes
  4. Commit

They both do their own thing, ignoring each other's lock and transaction

What scares me is that it's random. It happens roughly every other run. When trying the same queries in an isolated environment, with sufficient delay, I get the exact result I want.
As soon as SELECT ... FOR UPDATE is called by Alice, Barry can't read the row, and hangs until Alice commits or rolls back. My phenomenon requires exact simultaneous execution between the two instances of the same program.

I tried printing the fetched row on line 4, they return the exact same row... I'm using MariaDB 10.1.30 with the InnoDB engine on Ubuntu Server, with Python and the MySQLdb (mysqlclient) module for communication. Is it Maria? I thought it might be the superior alternative compared to MySQL.

One raises an exception, because it's fighting over resources with the other (too slow noob!)

The transaction and lock are being made

To show that a FOR UPDATE lock and a proper transaction are made, I did the following test. I ran this little poke-script at the same time, while adding time.sleep(10) just before the commit on the main script, to keep the lock active for at least 10 seconds.

while True:
    cursor.execute("SELECT * from job FOR UPDATE")
    print('Selected')
    time.sleep(1)
    connection.rollback()
    print('Released')
    time.sleep(1)

As soon as the main scripts acquire the lock, the small poke script would hang, not being able to select the row. After ten seconds, the poke script acquires the lock, but both nodes executed, again!!!. As you can see, the one on the top complains about a deadlock, because the bottom one already inserted a row somewhere else in the transaction.

I'm open to other more-correct SQL solutions. Maybe I'm doing it wrong. In T-SQL, it was possible to UPDATE a row and return the modified row with the OUTPUT clause, as if a SELECT statement had been run after the UPDATE. My only solution is to SELECT a row with FOR UPDATE, and then run the UPDATE. I haven't really thought about using procedures, would it be better to take it away from Python and run native on MariaDB?

I would really appreciate any hints or suggestions. I'm not that experienced with SQL, but the move away from SQL Server has been particularly punishing. As my client wishes to use dockers, I fear that this may not just be an unlikely scenario, but a possibility, as dockers might be created simultaneously when an extreme load is generated.

Thanks, and have a great day!

回答1:

(This may not address your question, but it is a suggestion that won't fit in a Comment.)

Do all the updates at once:

UPDATE job SET
        status = %s,
        jobUUID   = IFNULL(jobUUID, UUID()),
        dateAdded = IFNULL(dateAdded, NOW())
    WHERE jobID = %s
    LIMIT 1

You may be able to get jobID by using LAST_INSERT_ID(jobID), thereby avoiding the SELECT.



回答2:

SELECT... FOR UPDATE isolates transactions at different levels depending on your configuration. You can find more information https://dev.mysql.com/doc/refman/5.5/en/innodb-transaction-isolation-levels.html#isolevel_repeatable-read here.

But the most important thing in your code is that you have to be using DIFFERENT sessions for different transactions.

As stated here https://dev.mysql.com/doc/refman/5.7/en/lock-tables-and-transactions.html. If you run another transaction in the same session before a transaction is finished, it will be implicitly committed and it's what is leading to random results in your code.

What you are doing right now in your code is same as running two transactions in one terminal which would be not much different to just running everything in non-transaction.

You would need to create transactions in different connections through methods such as pooling in order to simulate different sessions.



回答3:

After adding some time.sleep statements in different parts of the transaction, I realized that the problem has got nothing to do with Alice and Barry executing simultaneously or ignoring each other's locks.

Without the sleep statements, it was too fast to see what was going on. The real issue is that Barry reads OLD data in his SELECT... FOR UPDATE, even after Alice's COMMIT which updates the job status, leaving him to take up the same job immediately after Alice releases the lock.

As this is a completely different issue, I've reposted the question with a different explanation and more relevant code samples here: SELECT... FOR UPDATE selecting old data after a commit

I'm sorry this couldn't help you. I've yet to find the problem myself.