I've got an OS X app (Yosemite, 10.10) that performs long-running jobs involving heavy use of sqlite across multiple threads. And I've hit a deadlock across 8 threads, all of which are stuck in sqlite code connecting to different database files. There is no apparent resource-related connection between them. I'm debugging it on a new Mac Pro (late 2013).
Four of them are at this stack. Of those, three are operating on the same table (again, different database files); three are updating and one is querying.
__psynch_mutexwait
_pthread_mutex_lock
unixLock
sqlite3PagerSharedLock
sqlite3BtreeBeginTrans
sqlite3VdbeExec
sqlite3_step
One is at this stack, updating the same table as three of the ones at the stack above.
guarded_close_np
nolockClose
pager_end_transaction
sqlite3BtreeCommitPhaseTwo
sqlite3VdbeHalt
sqlite3VdbeExec
sqlite3_step
Two are at this stack, opening database files with the same name in different locations. The open mode is SQLITE_OPEN_READWRITE | SQLITE_OPEN_CREATE | SQLITE_OPEN_FULLMUTEX
.
__psynch_mutexwait
_pthread_mutex_lock
sqlite3ParseUri
openDatabase
One is at this stack, ending a transaction.
__psynch_mutexwait
_pthread_mutex_lock
unixLock
sqlite3VdbeHalt
sqlite3VdbeExec
sqlite3_step
So, the question: what could cause sqlite to deadlock without involving any shared resources?
Update: I've now got seven threads locked calling sqlite3_open_v2
and one on sqlite3_close
, all on different database files (several with the same name, but in different folders). The stack is:
__psynch_mutexwait
_pthread_mutex_lock
sqlite3ParseUri
openDatabase
The close stack is:
guarded_close_np
unixClose
sqlite3PagerClose
sqlite3BtreeClose
sqlite3LeaveMutexAndCloseZombie
sqlite3Close
I was able to get it to go longer before locking up by fixing some memory leaks (this isn't running with ARC) and removing some transaction statements.
Update 2: I've hooked up SQLITE_LOG
via sqlite3_config
(documentation) and I'm seeing a fair amount of logging of code 28 (sqlite_warning) with the message "file renamed while open:".
Update 3: I wiped the machine and did a fresh install of Yosemite in an attempt to rule out file system issues. I'm still locking up the same way. It'll run for several minutes and then one by one the threads lock up. There's one at guarded_close_np
, stuck in assembly at an instruction jae <address here>
where the address being jumped to has the instruction retq
. I've asked a separate question about the sqlite log message about files being renamed in hopes that it's related.