Sqlite deadlock across multiple threads with diffe

2019-05-22 02:22发布

问题:

I've got an OS X app (Yosemite, 10.10) that performs long-running jobs involving heavy use of sqlite across multiple threads. And I've hit a deadlock across 8 threads, all of which are stuck in sqlite code connecting to different database files. There is no apparent resource-related connection between them. I'm debugging it on a new Mac Pro (late 2013).

Four of them are at this stack. Of those, three are operating on the same table (again, different database files); three are updating and one is querying.

__psynch_mutexwait
_pthread_mutex_lock
unixLock
sqlite3PagerSharedLock
sqlite3BtreeBeginTrans
sqlite3VdbeExec
sqlite3_step

One is at this stack, updating the same table as three of the ones at the stack above.

guarded_close_np
nolockClose
pager_end_transaction
sqlite3BtreeCommitPhaseTwo
sqlite3VdbeHalt
sqlite3VdbeExec
sqlite3_step

Two are at this stack, opening database files with the same name in different locations. The open mode is SQLITE_OPEN_READWRITE | SQLITE_OPEN_CREATE | SQLITE_OPEN_FULLMUTEX.

__psynch_mutexwait
_pthread_mutex_lock
sqlite3ParseUri
openDatabase

One is at this stack, ending a transaction.

__psynch_mutexwait
_pthread_mutex_lock
unixLock
sqlite3VdbeHalt
sqlite3VdbeExec
sqlite3_step

So, the question: what could cause sqlite to deadlock without involving any shared resources?

Update: I've now got seven threads locked calling sqlite3_open_v2 and one on sqlite3_close, all on different database files (several with the same name, but in different folders). The stack is:

__psynch_mutexwait
_pthread_mutex_lock
sqlite3ParseUri
openDatabase

The close stack is:

guarded_close_np
unixClose
sqlite3PagerClose
sqlite3BtreeClose
sqlite3LeaveMutexAndCloseZombie
sqlite3Close

I was able to get it to go longer before locking up by fixing some memory leaks (this isn't running with ARC) and removing some transaction statements.

Update 2: I've hooked up SQLITE_LOG via sqlite3_config (documentation) and I'm seeing a fair amount of logging of code 28 (sqlite_warning) with the message "file renamed while open:".

Update 3: I wiped the machine and did a fresh install of Yosemite in an attempt to rule out file system issues. I'm still locking up the same way. It'll run for several minutes and then one by one the threads lock up. There's one at guarded_close_np, stuck in assembly at an instruction jae <address here> where the address being jumped to has the instruction retq. I've asked a separate question about the sqlite log message about files being renamed in hopes that it's related.

回答1:

It sounds like you're getting stuck on the UNIX master mutex, which needs to be acquired before file closing:

/*
** Close a file.
*/
static int unixClose(sqlite3_file *id){  
  int rc = SQLITE_OK;
  unixFile *pFile = (unixFile *)id;
  verifyDbFile(pFile);
  unixUnlock(id, NO_LOCK);
  unixEnterMutex(); <- HERE
...

This mutex is mostly held during low-level file operations. You'll have to find the thread that holds the mutex and see what it's doing. Perhaps it's operating on a slow or broken filesystem.



回答2:

My app was using sqlite by linking against the dynamic library. I downloaded the latest SQLite source amalgamation from here (3.8.7.4 as of writing) and compiled it right into the app, and everything started working. So perhaps it was a bug in 3.8.5. Apparently compiling the source into the app directly is the recommended way to use sqlite anyway.

I still don't know exactly what caused the issue. The only thing I can think of is that it's something to do with how I'm creating the database files: I'm creating an empty file using NSFileManager createFileAtPath and then passing it to sqlite3_open_v2 with SQLITE_OPEN_CREATE as part of the flags parameter. So it's writing a database into an existing file instead of creating a database file at the specified location.