Detached pthreads and memory leak

Published 2019-03-15 13:08

冷血范

Question:

Can somebody please explain to me why this simple code leaks memory?

I believe that since the pthreads are created in the detached state, their resources should be released immediately after they terminate, but that is not the case.

My environment is Qt5.2.

#include <QCoreApplication>
#include <windows.h>
#include <pthread.h>
#include <stdio.h>

void *threadFunc( void *arg )
    {
    printf("#");
    pthread_exit(NULL);
    }

int main()
    {
    pthread_t thread;
    pthread_attr_t attr;

    while(1)
        {
        printf("\nStarting threads...\n");
        for(int idx=0;idx<100;idx++)
            {
            pthread_attr_init(&attr);
            pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
            pthread_create( &thread, &attr, &threadFunc, NULL);
            pthread_attr_destroy ( &attr );
            }
        printf("\nSleeping 10 seconds...\n");
        Sleep(10000);
        }
    }

UPDATE:

I discovered that if I add a slight delay of 5 milliseconds inside the for loop the leak is WAY slower:

    for(int idx=0;idx<100;idx++)
        {
        pthread_attr_init(&attr);
        pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
        pthread_create( &thread, &attr, &threadFunc, NULL);
        pthread_attr_destroy ( &attr );
        Sleep(5); /// <--- 5 MILLISECONDS DELAY ///
        }

This is freaking me out. Could somebody please tell me what is happening? How can such a slight delay produce such a significant change (or alter the behavior in any way)?

Any advice would be greatly appreciated.

Thanks.

UPDATE2:

This leak was observed on Windows platforms (W7 and XP); no leak was observed on Linux platforms (thank you @MichaelGoren).

Answer 1:

I checked the program, with slight modifications, on Windows using Cygwin, and memory consumption was steady. So it must be a Qt issue; the pthread library on Cygwin works fine without leaking.

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>


void *threadFunc( void *arg )
{
    printf("#");
    pthread_exit(NULL);
}

int main()
{
    pthread_t thread;
    pthread_attr_t attr;
    int idx;

    while(1)
    {
        printf("\nStarting threads...\n");
        for(idx=0;idx<100;idx++)
        {
            pthread_attr_init(&attr);
            pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
            pthread_create( &thread, &attr, &threadFunc, NULL);
            pthread_attr_destroy ( &attr );
        }
        printf("\nSleeping 10 seconds...\n");
        //Sleep(10000);
        sleep(10);
    }
}

Answer 2:

Compiler optimizations, or the OS itself, can decide to do loop unrolling, because your for loop has a constant bound (100 here). Since there is no explicit synchronization to prevent it, a newly created, detached thread can die and have its thread ID reassigned to another new thread before its creator returns from pthread_create(). As a result, the next iteration has already started before the previous thread was actually destroyed.

This also explains why your slight delay causes fewer problems: each iteration takes longer, so the thread functions can finish in more cases and the threads have actually terminated most of the time.

A possible fix would be to disable compiler optimizations or to add synchronization; that is, at the end of the code you check whether the thread still exists, and if it does, you wait for its function to finish. A trickier way would be to use a mutex: the thread locks it when it starts and unlocks it just before it exits (a mutex is not released automatically when the owning thread exits, detached or not), so the creator can use pthread_mutex_trylock to test whether the thread has finished. Note that I haven't tested this approach.

Concept:

pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;

void *threadFunc( void *arg )
{
  pthread_mutex_lock(&mutex);   // held for as long as the thread is doing its work
  printf("#");
  pthread_mutex_unlock(&mutex); // must be explicit: exiting does not release a mutex
  pthread_exit(NULL);
}

for(int idx=0;idx<100;idx++)
{
  pthread_attr_init(&attr);
  pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
  pthread_create( &thread, &attr, &threadFunc, NULL);
  pthread_attr_destroy ( &attr );
  pthread_mutex_lock(&mutex); // blocks while the thread holds the mutex (no wait if the thread has not locked it yet)
  pthread_mutex_unlock(&mutex);
}
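A mutex by itself cannot tell the creator that a thread has finished, because the creator may lock it before the new thread ever runs. One common alternative, sketched here as an assumption and not as something tested against the leak in the question, is a counter protected by a mutex together with a condition variable. The names `running`, `counted_worker` and `run_detached_batch` are made up for this illustration. The signal only means that the worker is about to return; it says nothing about when the thread library frees the thread's internal resources.

```c
#include <assert.h>
#include <pthread.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t all_done = PTHREAD_COND_INITIALIZER;
static int running = 0; /* detached workers that have not signalled yet */

/* Hypothetical worker: does its work, then reports that it is about to return. */
static void *counted_worker(void *arg)
{
    (void)arg;
    /* ... real work would go here ... */
    pthread_mutex_lock(&lock);
    if (--running == 0)
        pthread_cond_signal(&all_done); /* last one wakes the creator */
    pthread_mutex_unlock(&lock);
    return NULL;
}

/* Starts `count` detached threads and blocks until every started thread
 * has signalled. Returns the number of threads actually started. */
int run_detached_batch(int count)
{
    pthread_attr_t attr;
    pthread_t thread;
    int started = 0;

    pthread_attr_init(&attr);
    pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);

    /* The lock is held while creating, so `running` is fully counted
     * before any worker can decrement it. */
    pthread_mutex_lock(&lock);
    for (int i = 0; i < count; i++) {
        if (pthread_create(&thread, &attr, counted_worker, NULL) == 0) {
            running++;
            started++;
        }
    }
    while (running > 0)
        pthread_cond_wait(&all_done, &lock); /* releases the lock while waiting */
    pthread_mutex_unlock(&lock);

    pthread_attr_destroy(&attr);
    return started;
}
```

Calling `run_detached_batch(100)` in place of the inner for loop would make each batch wait for its own threads before the next batch starts.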

Answer 3:

The delay can induce a large change in behavior because it gives the thread time to exit! Of course, how your pthread library is implemented is also a factor here. I suspect it is using a "free list" optimization.

If you create 1000 threads all at once, then the library allocates memory for them all before any significant number of those threads can exit.

If, as in your second code sample, you let the previous thread run and probably exit before you start a new one, then the thread library can reuse that thread's memory or data structures. It knows they are no longer needed and is probably holding them in a free list, so that it can recycle the memory efficiently the next time someone creates a thread.
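To make the suspected free-list behavior concrete, here is a minimal sketch. It is not taken from any real pthread library: `thread_record`, `record_acquire` and `record_release` are hypothetical names, the record size is arbitrary, and the code is not thread-safe. It only shows why records that are released before the next request keep memory use flat, while records requested all at once force fresh allocations.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-thread bookkeeping record; a real library keeps far more state. */
typedef struct thread_record {
    struct thread_record *next_free; /* link used only while on the free list */
    char state[256];                 /* placeholder for per-thread data */
} thread_record;

static thread_record *free_list = NULL; /* released records waiting for reuse */
static size_t fresh_allocations = 0;    /* how often malloc() was really needed */

/* Hands out a record, recycling a released one when possible. */
thread_record *record_acquire(void)
{
    if (free_list != NULL) {
        thread_record *r = free_list;
        free_list = r->next_free;
        return r;
    }
    thread_record *r = malloc(sizeof *r);
    if (r != NULL)
        fresh_allocations++;
    return r;
}

/* Puts a record on the free list instead of returning it to the heap. */
void record_release(thread_record *r)
{
    r->next_free = free_list;
    free_list = r;
}

size_t record_fresh_allocations(void)
{
    return fresh_allocations;
}
```

Acquiring 100 records before releasing any needs 100 allocations, whereas an acquire followed immediately by a release, repeated 100 times, keeps reusing the same record.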

Answer 4:

It has nothing to do with compiler optimisations. The code is fine. The problem could be (a) Windows itself, or (b) the Qt implementation of pthread_create() with detached attributes.

Checking for (a): Try to create many short-lived detached threads using Windows _beginthreadex directly and see if you get the same picture. Note: call CloseHandle(thread_handle) as soon as _beginthreadex returns to make the thread detached.

Checking for (b): Trace which function Qt uses to create threads. If it is _beginthread, then there is your answer. If it is _beginthreadex, then Qt is doing the right thing and you need to check whether Qt closes the thread handle immediately. If it does not, then that is the cause.

cheers

UPDATE 2

Qt5.2.0 does not provide a pthreads API and is unlikely to be responsible for the observed leak.

I wrapped the native Windows API to see how the code runs without a pthread library. You can include this fragment right after the includes:

#include <process.h>
#include <stdlib.h>  /* malloc, free */
#include <errno.h>   /* EAGAIN, errno */
#define PTHREAD_CREATE_JOINABLE 0
#define PTHREAD_CREATE_DETACHED 1

typedef struct { int detachstate; } pthread_attr_t;

typedef HANDLE pthread_t;

_declspec(noreturn) void pthread_exit(void *retval)
{
    static_assert(sizeof(unsigned) == sizeof(void*), "Modify code");
    _endthreadex((unsigned)retval);
}

int pthread_attr_setdetachstate(pthread_attr_t *attr, int detachstate)
{
    attr->detachstate = detachstate;
    return 0;
}

int pthread_attr_init(pthread_attr_t *attr)
{
    attr->detachstate = PTHREAD_CREATE_JOINABLE;
    return 0;
}

int pthread_attr_destroy(pthread_attr_t *attr)
{
    (void)attr;
    return 0;
}

typedef struct {
    void *(*start_routine)(void *arg);
    void   *arg;
} winapi_caller_args;

unsigned __stdcall winapi_caller(void *arglist)
{
    winapi_caller_args *list = (winapi_caller_args *)arglist;
    void             *(*start_routine)(void *arg) = list->start_routine;
    void               *arg                       = list->arg;

    free(list);
    static_assert(sizeof(unsigned) == sizeof(void*), "Modify code");
    return (unsigned)start_routine(arg);
}

int pthread_create( pthread_t *thread, pthread_attr_t *attr,
                    void *(*start_routine)(void *), void *arg)
{

    winapi_caller_args *list;

    list = (winapi_caller_args *)malloc(sizeof *list);
    if (list == NULL)
        return EAGAIN;

    list->start_routine = start_routine;
    list->arg = arg;
    *thread = (HANDLE)_beginthreadex(NULL, 0, winapi_caller, list, 0, NULL);
    if (*thread == 0) {
        free(list);
        return errno;
    }
    if (attr->detachstate == PTHREAD_CREATE_DETACHED)
        CloseHandle(*thread);

    return 0;
}

With the Sleep() line commented out, it works OK without leaks. Run time = 1hr approx.

If the code with the Sleep() line commented out uses the Pthreads-win32 2.9.1 library (prebuilt for MSVC) instead, then the program stops spawning new threads and stops responding after 5 to 10 minutes.

Test environment: XP Home, MSVC 2010 Express, Qt5.2.0 qmake etc.

Answer 5:

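You forgot to join your threads (even if they have already finished). A detached thread cannot be joined, so the threads have to be created joinable (the default) instead. Correct code should be:

        pthread_t arr[100];
        for(int idx=0;idx<100;idx++)
        {
            pthread_attr_init(&attr);
            pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_JOINABLE); // joinable, so pthread_join() is valid
            pthread_create( &arr[idx], &attr, &threadFunc, NULL);
            pthread_attr_destroy ( &attr );
        }

        Sleep(2000);

        for(int idx=0;idx<100;idx++)
        {
            pthread_join(arr[idx], NULL); // releases the thread's resources; the exit value is not needed
        }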

Note from the man page:

   Failure to join with a thread that is joinable (i.e., one that is not detached), produces a "zombie thread". Avoid doing this, since each zombie thread consumes some system resources, and when enough zombie threads have accumulated, it will no longer be possible to create new threads (or processes).
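As a small illustration of the man page note, the sketch below creates a batch of joinable threads (the default when no attribute object is passed) and joins every one of them, checking the return codes that the snippets in this thread ignore. The names `echo_worker` and `run_joined_batch` and the batch size are assumptions made for this example only.

```c
#include <assert.h>
#include <pthread.h>

#define BATCH 100

/* Hypothetical worker: hands its argument back as the thread's exit value. */
static void *echo_worker(void *arg)
{
    return arg;
}

/* Creates BATCH joinable threads, then joins each one.
 * Returns how many threads were joined and returned the expected value. */
int run_joined_batch(void)
{
    pthread_t threads[BATCH];
    int created[BATCH];
    int values[BATCH];
    int joined = 0;

    for (int i = 0; i < BATCH; i++) {
        values[i] = i;
        /* A NULL attribute pointer means default attributes, i.e. joinable. */
        created[i] = (pthread_create(&threads[i], NULL, echo_worker, &values[i]) == 0);
    }

    for (int i = 0; i < BATCH; i++) {
        void *result = NULL;
        /* Only join threads that were really created; joining reclaims their resources. */
        if (created[i] && pthread_join(threads[i], &result) == 0 && result == &values[i])
            joined++;
    }
    return joined;
}
```

Because every created thread is joined, none of them is left behind as a zombie in the sense of the note above.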