Current state of drd and helgrind support for std:

2019-02-12 10:40发布

问题:

As I transition my code to C++11, I would very much like to convert my pthread code to std::thread. However, I seem to be getting false race conditions on very simple programs in drd and in helgrind.

#include <thread>

int main(int argc, char** argv)
{
    std::thread t( []() { } );
    t.join();
    return 0;
}

Helgrind output snippet - I also get similar errors in drd, using gcc 4.6.1, valgrind 3.7.0 on Ubuntu 11.11 amd64.

My questions are:

  • sanity check: Am I doing anything wrong? Are others getting similar false reports on simple std::thread programs?
  • What are current users of std::thread using to detect race-conditions?

I am reluctant to port a ton of code from pthread to std::thread until some crucial tools like helgrind/drd have caught up.

==19347== ---Thread-Announcement------------------------------------------
==19347== 
==19347== Thread #1 is the program's root thread
==19347== 
==19347== ---Thread-Announcement------------------------------------------
==19347== 
==19347== Thread #2 was created
==19347==    at 0x564C85E: clone (clone.S:77)
==19347==    by 0x4E37E7F: do_clone.constprop.3 (createthread.c:75)
==19347==    by 0x4E39604: pthread_create@@GLIBC_2.2.5 (createthread.c:256)
==19347==    by 0x4C2B3DA: pthread_create_WRK (hg_intercepts.c:255)
==19347==    by 0x4C2B55E: pthread_create@* (hg_intercepts.c:286)
==19347==    by 0x50BED02: std::thread::_M_start_thread(std::shared_ptr<std::thread::_Impl_base>) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.16)
==19347==    by 0x400D51: _ZNSt6threadC1IZ4mainEUlvE_IEEEOT_DpOT0_ (in /mnt/home/kfeng/dev/robolab/cpp/sbx/sandbox)
==19347==    by 0x400C60: main (in /mnt/home/kfeng/dev/robolab/cpp/sbx/sandbox)
==19347== 
==19347== ----------------------------------------------------------------
==19347== 
==19347== Possible data race during write of size 8 at 0x5B8E060 by thread #1
==19347== Locks held: none
==19347==    at 0x40165E: _ZNSt6thread5_ImplISt12_Bind_resultIvFZ4mainEUlvE_vEEED1Ev (in /mnt/home/kfeng/dev/robolab/cpp/sbx/sandbox)
==19347==    by 0x401895: _ZNKSt19_Sp_destroy_inplaceINSt6thread5_ImplISt12_Bind_resultIvFZ4mainEUlvE_vEEEEEclEPS6_ (in /mnt/home/kfeng/dev/robolab/cpp/sbx/sandbox)
==19347==    by 0x4016D8: _ZNSt19_Sp_counted_deleterIPNSt6thread5_ImplISt12_Bind_resultIvFZ4mainEUlvE_vEEEESt19_Sp_destroy_inplaceIS6_ESaIS6_ELN9__gnu_cxx12_Lock_policyE2EE10_M_disposeEv (in /mnt/home/kfeng/dev/robolab/cpp/sbx/sandbox)
==19347==    by 0x401B83: std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() (in /mnt/home/kfeng/dev/robolab/cpp/sbx/sandbox)
==19347==    by 0x401B3E: std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count() (in /mnt/home/kfeng/dev/robolab/cpp/sbx/sandbox)
==19347==    by 0x401A93: std::__shared_ptr<std::thread::_Impl_base, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr() (in /mnt/home/kfeng/dev/robolab/cpp/sbx/sandbox)
==19347==    by 0x401AAD: std::shared_ptr<std::thread::_Impl_base>::~shared_ptr() (in /mnt/home/kfeng/dev/robolab/cpp/sbx/sandbox)
==19347==    by 0x400D5D: _ZNSt6threadC1IZ4mainEUlvE_IEEEOT_DpOT0_ (in /mnt/home/kfeng/dev/robolab/cpp/sbx/sandbox)
==19347==    by 0x400C60: main (in /mnt/home/kfeng/dev/robolab/cpp/sbx/sandbox)
==19347== 
==19347== This conflicts with a previous read of size 8 by thread #2
==19347== Locks held: none
==19347==    at 0x50BEABE: ??? (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.16)
==19347==    by 0x4C2B547: mythread_wrapper (hg_intercepts.c:219)
==19347==    by 0x4E38EFB: start_thread (pthread_create.c:304)
==19347==    by 0x564C89C: clone (clone.S:112)
==19347== 
==19347== Address 0x5B8E060 is 32 bytes inside a block of size 64 alloc'd
==19347==    at 0x4C29059: operator new(unsigned long) (vg_replace_malloc.c:287)
==19347==    by 0x4012E9: _ZN9__gnu_cxx13new_allocatorISt23_Sp_counted_ptr_inplaceINSt6thread5_ImplISt12_Bind_resultIvFZ4mainEUlvE_vEEEESaIS8_ELNS_12_Lock_policyE2EEE8allocateEmPKv (in /mnt/home/kfeng/dev/robolab/cpp/sbx/sandbox)
==19347==    by 0x40117C: _ZNSt14__shared_countILN9__gnu_cxx12_Lock_policyE2EEC1INSt6thread5_ImplISt12_Bind_resultIvFZ4mainEUlvE_vEEEESaISA_EIS9_EEESt19_Sp_make_shared_tagPT_RKT0_DpOT1_ (in /mnt/home/kfeng/dev/robolab/cpp/sbx/sandbox)
==19347==    by 0x4010B9: _ZNSt12__shared_ptrINSt6thread5_ImplISt12_Bind_resultIvFZ4mainEUlvE_vEEEELN9__gnu_cxx12_Lock_policyE2EEC1ISaIS6_EIS5_EEESt19_Sp_make_shared_tagRKT_DpOT0_ (in /mnt/home/kfeng/dev/robolab/cpp/sbx/sandbox)
==19347==    by 0x401063: _ZNSt10shared_ptrINSt6thread5_ImplISt12_Bind_resultIvFZ4mainEUlvE_vEEEEEC1ISaIS6_EIS5_EEESt19_Sp_make_shared_tagRKT_DpOT0_ (in /mnt/home/kfeng/dev/robolab/cpp/sbx/sandbox)
==19347==    by 0x401009: _ZSt15allocate_sharedINSt6thread5_ImplISt12_Bind_resultIvFZ4mainEUlvE_vEEEESaIS6_EIS5_EESt10shared_ptrIT_ERKT0_DpOT1_ (in /mnt/home/kfeng/dev/robolab/cpp/sbx/sandbox)
==19347==    by 0x400EF7: _ZSt11make_sharedINSt6thread5_ImplISt12_Bind_resultIvFZ4mainEUlvE_vEEEEIS5_EESt10shared_ptrIT_EDpOT0_ (in /mnt/home/kfeng/dev/robolab/cpp/sbx/sandbox)
==19347==    by 0x400E17: _ZNSt6thread15_M_make_routineISt12_Bind_resultIvFZ4mainEUlvE_vEEEESt10shared_ptrINS_5_ImplIT_EEEOS7_ (in /mnt/home/kfeng/dev/robolab/cpp/sbx/sandbox)
==19347==    by 0x400D2B: _ZNSt6threadC1IZ4mainEUlvE_IEEEOT_DpOT0_ (in /mnt/home/kfeng/dev/robolab/cpp/sbx/sandbox)
==19347==    by 0x400C60: main (in /mnt/home/kfeng/dev/robolab/cpp/sbx/sandbox)
==19347== 
==19347== ----------------------------------------------------------------
==19347==

回答1:

std::thread uses a shared pointer internally. What you are seeing are false positives on the reference count of that shared pointer object. You can avoid these false positives by adding the four lines of code shown below in each source file just before the C++ header include directives. Note: this only works with the version of libstdc++ included with gcc 4.6.0 or later.

#include <valgrind/drd.h>
#define _GLIBCXX_SYNCHRONIZATION_HAPPENS_BEFORE(addr) ANNOTATE_HAPPENS_BEFORE(addr)
#define _GLIBCXX_SYNCHRONIZATION_HAPPENS_AFTER(addr) ANNOTATE_HAPPENS_AFTER(addr)
#define _GLIBCXX_EXTERN_TEMPLATE -1

For more information, see also the Data Race Hunting section in the libstdc++ manual (http://gcc.gnu.org/onlinedocs/libstdc++/manual/debug.html).



回答2:

Most likely what you are seeing are false positives. I am observing a similar behaviour in my code.

Specifically, the warnings seem related to the implementation of the shared pointer class, and my understanding is that on your platform (which I presume is x86/x86-64?) GCC is using optimized atomic assembly instruction in the shared pointer reference counting machinery. The problem is that valgrind is able to detect errors when using the POSIX primitives (locks, mutexes, etx.), but it is not able to cope with lower level primitives.

What I've done so far is to simply filter out from the output of valgrind the warnings (possibly you could write some suppression file that does the job in the proper way).



回答3:

If you use boost you can turn on the use of pthreads primitives instead of atomic operations for shared pointers. You can then use a version of your code compiled with BOOST_SP_USE_PTHREADS for helgrind analysis, and you will not get the errors because helgrind understands the pthreads primitives.

See http://www.boost.org/doc/libs/1_54_0/libs/smart_ptr/shared_ptr.htm#ThreadSafety for more info.