Our app seems to semi-randomly hang at psynch_mutexwait. It seems to be related to a background process that updates a bunch of data stored in Core Data, but I've been completely unable to figure out just who is locking on what to cause the deadlock.
Following is the complete stack trace that lldb gives me - which is obviously incomplete, AND the last frame of Thread 1 is bogus. I had a breakpoint in that method a few lines before that, and it was never hit.
Is there ANY way of figuring out what lock is being waited on? (or even get correct stack traces?) Of course there is LOTS of code involved, which makes random NSLog statements a massive undertaking.
(lldb) bt all
* thread #1: tid = 0x2503, 0x39da20fc libsystem_kernel.dylib`__psynch_mutexwait + 24, stop reason = signal SIGSTOP
    frame #0: 0x39da20fc libsystem_kernel.dylib`__psynch_mutexwait + 24
    frame #1: 0x39ceb128 libsystem_c.dylib`pthread_mutex_lock + 392
    frame #2: 0x00022068 OnDeck`-[AttendanceWorkoutsController buildTable](self=0x00000003, _cmd=0x00000000) + 508 at AttendanceWorkoutsController.m:100
  thread #2: tid = 0x2803, 0x39d92648 libsystem_kernel.dylib`kevent64 + 24
    frame #0: 0x39d92648 libsystem_kernel.dylib`kevent64 + 24
    frame #1: 0x39ccb4f0 libdispatch.dylib`_dispatch_mgr_invoke + 796
  thread #5: tid = 0x2b03, 0x39d91eb4 libsystem_kernel.dylib`mach_msg_trap + 20
    frame #0: 0x39d91eb4 libsystem_kernel.dylib`mach_msg_trap + 20
    frame #1: 0x39d9204c libsystem_kernel.dylib`mach_msg + 40
  thread #6: tid = 0x242f, 0x39d91eb4 libsystem_kernel.dylib`mach_msg_trap + 20
    frame #0: 0x39d91eb4 libsystem_kernel.dylib`mach_msg_trap + 20
    frame #1: 0x39d9204c libsystem_kernel.dylib`mach_msg + 40
  thread #7: tid = 0x2c03, 0x39da2594 libsystem_kernel.dylib`select$DARWIN_EXTSN + 20
    frame #0: 0x39da2594 libsystem_kernel.dylib`select$DARWIN_EXTSN + 20
    frame #1: 0x31bff1f6 CoreFoundation`__CFSocketManager + 678
  thread #8: tid = 0x2d03, 0x39da2d98 libsystem_kernel.dylib`__workq_kernreturn + 8
    frame #0: 0x39da2d98 libsystem_kernel.dylib`__workq_kernreturn + 8
    frame #1: 0x39cf0cfa libsystem_c.dylib`_pthread_workq_return + 18
(lldb)
This usually happens when one tries to access Core Data objects on a background thread using the main thread's context, or uses the same managed object context on different threads (background or main) at the same time. For more details, check out the Core Data concurrency rules.
So to avoid both cases, the main rule is: each thread must have its own managed object context, and that context must be initialized exactly where it is going to be used.
For example:
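A minimal sketch of that rule, not the original poster's code. It swaps in a private-queue context (NSPrivateQueueConcurrencyType with performBlock:) instead of creating a context by hand on a specific thread, since the queue then guarantees the context is only touched where it belongs. The names PerformBackgroundUpdate and work are hypothetical, and the coordinator is assumed to be the one the app already uses.

```objc
#import <CoreData/CoreData.h>

// Hypothetical helper: runs `work` against a context that belongs to the
// background queue only. The main thread's context is never touched here.
static void PerformBackgroundUpdate(NSPersistentStoreCoordinator *coordinator,
                                    void (^work)(NSManagedObjectContext *context)) {
    // Create a dedicated context for the background work.
    NSManagedObjectContext *context =
        [[NSManagedObjectContext alloc] initWithConcurrencyType:NSPrivateQueueConcurrencyType];
    context.persistentStoreCoordinator = coordinator;

    // All access to `context` and its objects happens inside this block,
    // which Core Data runs on the context's own private queue.
    [context performBlock:^{
        work(context);

        NSError *error = nil;
        if ([context hasChanges] && ![context save:&error]) {
            NSLog(@"Background save failed: %@", error);
        }
    }];
}
```

Objects fetched inside the block should stay inside it. If the main thread needs one of them, pass its objectID across and fetch it again through the main thread's context.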
By having several people look at the code and trace through the long, complicated code paths, we found what appears to have been the culprit. One method running in a background thread was finding and using some Core Data objects through the main thread's context.
It sure would have helped a LOT if iOS gave useful stack traces.
This has also been seen when a related entity in another context (and on another thread) has been modified but not yet persisted.
The scenario:
A --> B
Due to a bug, B had pending changes in another context, on another thread. The bug caused B to hang around instead of being saved or rolled back. Attempting to save A in the current context/thread will cause a wait for the other thread to release the lock on B.
The only successful way to troubleshoot was to list all pending entities and compare them to the ones in the blocked thread. Took a while :(
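Listing the pending entities can be sketched roughly as follows. This is an illustration rather than the code actually used: LogPendingChanges and label are hypothetical names, and it relies only on the insertedObjects, updatedObjects and deletedObjects sets that NSManagedObjectContext exposes. It shows unsaved changes per context, not the locks themselves.

```objc
#import <CoreData/CoreData.h>

// Hypothetical debugging helper: logs every object with unsaved changes in
// `context`. Call it on the thread or queue that owns the context.
static void LogPendingChanges(NSManagedObjectContext *context, NSString *label) {
    NSDictionary *pending = @{ @"inserted" : context.insertedObjects,
                               @"updated"  : context.updatedObjects,
                               @"deleted"  : context.deletedObjects };

    for (NSString *kind in pending) {
        for (NSManagedObject *object in pending[kind]) {
            // Entity name plus objectID is enough to match the same row
            // across different contexts.
            NSLog(@"[%@] %@ %@ %@", label, kind, object.entity.name, object.objectID);
        }
    }
}
```

Running it for each context in play and comparing the logged objectIDs is one way to spot an entity like B that is still pending somewhere else.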
I am still looking for something that lists all locks on the database and entities.