Android NDK mutex locking

Posted 2020-07-16 09:22

Question:

I've been porting a cross-platform C++ engine to Android, and noticed that it will inexplicably (and inconsistently) block when calling pthread_mutex_lock. This engine has been working for many years on several platforms, and the problematic code hasn't changed in years, so I doubt it's a deadlock or otherwise buggy code. It must be my port to Android.

So far there are several places in the code that block on pthread_mutex_lock. It isn't entirely reproducible either. When it hangs, there's no suspicious output in LogCat.

I modified the mutex code like this (edited for brevity... real code checks all return values):

void MutexCreate( Mutex* m )
{
#ifdef WINDOWS
    InitializeCriticalSection( m );
#else /* ANDROID */
    pthread_mutex_init( m, NULL );
#endif
}


void MutexDestroy( Mutex* m )
{
#ifdef WINDOWS
    DeleteCriticalSection( m );
#else /* ANDROID */
    pthread_mutex_destroy( m, NULL );
#endif
}

void MutexLock( Mutex* m )
{
#ifdef WINDOWS
    EnterCriticalSection( m );
#else /* ANDROID */
    pthread_mutex_lock( m );
#endif
}

void MutexUnlock( Mutex* m )
{
#ifdef WINDOWS
    LeaveCriticalSection( m );
#else /* ANDROID */
    pthread_mutex_unlock( m );
#endif
}

I tried modifying MutexCreate to create error-checking and recursive mutexes, but it made no difference. I wasn't getting any errors or log output either, which means either that my mutex code is fine or that the errors/logs weren't being shown. How exactly does the OS notify you of bad mutex usage?

The engine makes heavy use of static variables, including mutexes. I can't see how, but is that a problem? I doubt it because I modified lots of mutexes to be allocated on the heap instead, and the same behavior occurred. But that may be because I missed some static mutexes. I'm probably grasping at straws here.

I read several references including:

http://pubs.opengroup.org/onlinepubs/7908799/xsh/pthread_mutex_init.html

http://www.embedded-linux.co.uk/tutorial/mutex_mutandis

http://linux.die.net/man/3/pthread_mutex_init

Android NDK Mutex

Android NDK problem pthread_mutex_unlock issue

Answer 1:

The "errorcheck" mutexes will check a couple of things (like attempts to use a non-recursive mutex recursively) but nothing spectacular.

You said "real code checks all return values", so presumably your code explodes if any pthread call returns a nonzero value. (Not sure why your pthread_mutex_destroy takes two args; assuming copy & paste error.)
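To make the return-value point concrete, here is a minimal sketch of an error-checking mutex plus a checked call. The helper names MutexCreateErrorCheck and CheckPthread are made up for illustration and are not from the question's engine. The pthread functions report misuse only through their return values (they do not set errno or log anything), so a mistake stays silent unless the caller inspects the result. Note that stderr is normally not routed to LogCat on Android, so in a real port you would likely swap the fprintf for __android_log_print from <android/log.h>.

```cpp
#include <pthread.h>
#include <cerrno>
#include <cstdio>
#include <cstring>

// Hypothetical helper: initialise *m as an error-checking mutex.
// Returns 0 on success, or the pthread error code on failure.
int MutexCreateErrorCheck(pthread_mutex_t* m) {
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    if (rc != 0) return rc;
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    if (rc == 0) rc = pthread_mutex_init(m, &attr);
    pthread_mutexattr_destroy(&attr);
    return rc;
}

// Hypothetical helper: report a nonzero pthread return code and pass it on.
// pthread calls return the error number directly instead of setting errno.
int CheckPthread(int rc, const char* what) {
    if (rc != 0) {
        std::fprintf(stderr, "%s failed: %d (%s)\n", what, rc, std::strerror(rc));
    }
    return rc;
}
```

With a mutex created this way, POSIX specifies that locking it a second time from the same thread returns EDEADLK instead of hanging, and unlocking it from a thread that does not own it returns EPERM. Wrapping each call as CheckPthread(pthread_mutex_lock(m), "lock") would surface those codes.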

The pthread code is widely used within Android and has no known hangups, so the issue is not likely in the pthread implementation itself.

The current implementation of mutexes fits in 32 bits, so if you print the contents of the pthread_mutex_t as an integer you should be able to figure out what state it's in (technically, what state it was in at some point in the past). The definition in bionic/libc/bionic/pthread.c is:

/* a mutex is implemented as a 32-bit integer holding the following fields
 *
 * bits:     name     description
 * 31-16     tid      owner thread's kernel id (recursive and errorcheck only)
 * 15-14     type     mutex type
 * 13        shared   process-shared flag
 * 12-2      counter  counter of recursive mutexes
 * 1-0       state    lock state (0, 1 or 2)
 */

"Fast" mutexes have a type of 0, and don't set the tid field. In fact, a generic mutex will have a value of 0 (not held), 1 (held), or 2 (held, with contention). If you ever see a fast mutex whose value is not one of those, chances are something came along and stomped on it.
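The bit layout quoted above can be pulled apart with a few shifts and masks. This is only a sketch against that comment: the struct and function names are invented here, and the layout is an implementation detail of that particular bionic version, so it may not match other releases.

```cpp
#include <cstdint>

// Fields of the 32-bit mutex word, per the bionic comment quoted above.
struct MutexFields {
    uint32_t tid;      // bits 31-16: owner thread's kernel id
    uint32_t type;     // bits 15-14: mutex type (0 = "fast")
    uint32_t shared;   // bit  13:    process-shared flag
    uint32_t counter;  // bits 12-2:  recursion counter (11 bits)
    uint32_t state;    // bits 1-0:   0 = not held, 1 = held, 2 = held with contention
};

// Hypothetical helper: split a raw mutex value into its fields.
MutexFields DecodeMutexValue(uint32_t v) {
    MutexFields f;
    f.tid     = (v >> 16) & 0xFFFFu;
    f.type    = (v >> 14) & 0x3u;
    f.shared  = (v >> 13) & 0x1u;
    f.counter = (v >> 2)  & 0x7FFu;
    f.state   =  v        & 0x3u;
    return f;
}

// Hypothetical check following the text above: a fast mutex should only
// ever hold 0, 1 or 2. Anything else suggests the memory was overwritten.
bool LooksLikeSaneFastMutex(uint32_t v) {
    return v <= 2u;
}
```

For example, a value of 0x00000002 decodes to a held fast mutex with waiters, while something like 0x12340001 on a mutex you created as "fast" would be a red flag.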

It also means that, if you configure your program to use recursive mutexes, you can see which thread holds the mutex by pulling the bits out (either by printing the mutex value when trylock indicates you're about to stall, or dumping state with gdb on a hung process). That, plus the output of ps -t, will let you know if the thread that locked the mutex still exists.
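A rough sketch of the "print the value when trylock says you're about to stall" idea follows. The helper names are hypothetical, and reading the first 32 bits of a pthread_mutex_t is not portable: it only means something under the bionic layout described above, so treat the snapshot as a debugging aid and nothing more.

```cpp
#include <pthread.h>
#include <cerrno>
#include <cstdint>
#include <cstring>

static_assert(sizeof(pthread_mutex_t) >= sizeof(uint32_t),
              "expected the mutex to be at least 32 bits wide");

// Hypothetical helper: copy out the first 32 bits of the mutex.
// Assumption: on the bionic version discussed above this is the whole state word.
uint32_t RawMutexWord(const pthread_mutex_t* m) {
    uint32_t v = 0;
    std::memcpy(&v, m, sizeof(v));
    return v;
}

// Hypothetical helper: try to take the lock without blocking.
// Returns 0 if the lock was acquired. Returns EBUSY if someone else holds it,
// in which case *raw_out receives a snapshot of the mutex word for logging.
// Any other return value is a pthread error code.
int TryLockOrSnapshot(pthread_mutex_t* m, uint32_t* raw_out) {
    int rc = pthread_mutex_trylock(m);
    if (rc == EBUSY && raw_out != nullptr) {
        *raw_out = RawMutexWord(m);
    }
    return rc;
}
```

In a lock wrapper you could call TryLockOrSnapshot first, log the snapshot when it returns EBUSY, and only then fall through to the blocking pthread_mutex_lock. With recursive or errorcheck mutexes under that layout, the upper 16 bits of the snapshot would be the owner's kernel thread id, which you can compare against the ps -t output.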