I'm writing a fast multi-thread program, and I want to avoid syncronization (the function which would need to be syncronized must be called something like 5,000,000 times per second, so even a mutex would be too heavy).
The scenario is: I have a single global instance of a class, and each thread can access it. In order to avoid syncronization, all the data inside the class is accessed read-only, except for a bunch of class members, which are then declared in TLS (with __thread or __declspec(thread)).
Unfortunately, in order to use the __thread interface offered by the compiler, the class members have to be static and without constructors/deconstructors. The classes I use of course have custom constructors, so I'm declaring, as class members, a pointer to that classes (something like static __thread MyClass* _object).
Then, the first time a thread calls a method from the global instance, I'll do something like "(if _object == NULL) object = new MyClass(...)".
My biggest problem is: is there a smart way to free this allocated memory? This global class is from a library, and it is used by many threads in the program, and each thread is created in a different way (i.e. each thread executes a different function) and I can't put a snipplet of code each time the thread is going to terminate. Thank you guys.
TLS clean-up is usually done in DllMain when it is passed DLL_THREAD_DETACH.
If your code is all in an EXE and not a DLL then you could create a dummy DLL that the EXE loads which in turn calls back into the EXE on DLL_THREAD_DETACH. (I don't know of a better way to have EXE code run on thread termination.)
There are a couple of ways for the DLL to call back into the EXE: One is that EXEs can export functions just like DLLs, and the DLL code can use GetProcAddress on the EXE's module handle. An easier method is to give the DLL an init function which the EXE calls to explicitly pass a function pointer.
Note that what you can do within DllMain is limited (and unfortunately the limits are not properly documented), so you should minimize any work done this way. Don't run any complex destructors; just free memory using a direct kernel32.dll API like HeapAlloc and free the TLS slot.
Also note that you won't get a DLL_THREAD_ATTACH for threads that were already running when your DLL was loaded (but you will still get DLL_THREAD_DETACH if they exit while the DLL is loaded), and that you'll get (only) a DLL_PROCESS_DETACH when the final thread exits.
Boost Thread Local Storage
If you just want a generic cleanup function you can still use boost thread_specific_ptr. You don't need to actually use the data stored there, but you can take advantage of the custom cleanup function. Just make that function something arbitrary and you can do whatever you want. Look at the pthread function
pthread_key_create
for a direct pthreads function call.There is unfortunately no easy answer, at least not that I've come across yet. That is, there is no common way to have complex objects deleted at thread exit time. However, there's nothing stopping you from doing this on your own.
You will need to register your own handlers at thread exit time. With pthreads that would be
pthread_cleanup_push
. I don't know what it is on windows. This is of course not cross-platform. But, presumably you have full control of the starting of the thread and its entry-point. You could simply explicitly call a cleanup function just before returning from your thread. I know you mentioned you can't add this snippet, in which case you'll be left calling the OS specific function to add a cleanup routine.Obviously creating cleanup functions for all objects allocated could be annoying. So instead you should create one more thread local variable: a list of destructors for objects. For each thread-specific variable you create you'll push a destructor onto this list. This list will have to be created on demand if you don't have a common thread entry point: have a global function to call which takes your destructor and creates the list as necessary, then adds the destructor.
Exactly what this destructor looks like depends heavily on your object hierarchy (you may have simple boost bind statements, shared_ptr's, a virtual destructor in a base class, or a combination thereof).
Your generic cleanup function can then walk through this list and perform all the destructors.
In C++11 this is easily achieved:
cleanup_tls()
will execute on every thread termination (provided the thread is created using C++ API likestd::thread
).But then, you could just as well cleanup TLS objects directly in their destructors (which will also promptly execute). For example:
static thread_local std::unique_ptr<MyClass> pMyClass;
will deleteMyClass
when a thread terminates.Before C++11 you can use hacks like the GNU "linker sets" or MSVC "_tls_used" callback.
Or, starting from Windows 6 (Vista), FlsAlloc, which accepts a cleanup callback.
If you are using pthreads you could look at the cleanup operations?
http://man.yolinux.com/cgi-bin/man2html?cgi_command=pthread_cleanup_push
You can push a cleanup operation just after the creation of the thread local object, such that on exit that object gets destroyed. Not sure what the winapi equivalent is...