I have developed a simple multi-threaded Python application (Python 3.7) that runs 8 different compute-intensive tasks in 8 threads. Each task can be either Python code or C++ code that is compiled into a DLL and called through the ctypes package. I am running the experiments on an 8-core Windows machine.
The strange part is that when all the threads call the Python code, only one thread seems to be active at any given time and the CPU utilization is around 12.5%. But when they call the C++ code inside the DLL, all cores are used and the CPU utilization is 100%.
Now, the question is: why is the GIL (Global Interpreter Lock) not serializing Python threads that call native C++ code? Does the ctypes implementation release the GIL when calling the native C++ code?
Edit 1: No macro like Py_BEGIN_ALLOW_THREADS is used inside the native C++ DLL.
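From [Python 3]: ctypes - Loading shared libraries (thanks @user2357112 for pointing out this very explicit statement, way better than what I originally posted): the documentation states that the Python global interpreter lock is released before calling any function exported by these libraries, and reacquired afterwards.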
You can also find this statement in other forms on the same page (check PyDLL, CFUNCTYPE).
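To see the difference for yourself, here is a minimal sketch that times the same native call from several threads, once through CDLL and once through PyDLL. It is only an illustration under a few assumptions: a blocking sleep from the C runtime stands in for your compute-intensive DLL function, msvcrt is assumed to export _sleep on Windows, and the names load_blocking_call and timed_run are made up for this example.

```python
import ctypes
import ctypes.util
import sys
import threading
import time

N_THREADS = 4
SLEEP_SECONDS = 0.25


def load_blocking_call(loader):
    """Return a zero-argument callable that blocks inside native code.

    `loader` is ctypes.CDLL (releases the GIL around each call) or
    ctypes.PyDLL (keeps the GIL held during the call).
    """
    if sys.platform == "win32":
        # Assumption: msvcrt.dll exports _sleep(unsigned long milliseconds).
        lib = loader("msvcrt")
        fn = lib._sleep
        fn.argtypes = [ctypes.c_ulong]
        fn.restype = None
        arg = int(SLEEP_SECONDS * 1000)
    else:
        # If find_library returns None, loader(None) opens the current
        # process, which also exposes the libc symbols.
        lib = loader(ctypes.util.find_library("c"))
        fn = lib.usleep
        fn.argtypes = [ctypes.c_uint]
        arg = int(SLEEP_SECONDS * 1_000_000)
    return lambda: fn(arg)


def timed_run(loader):
    """Run the blocking call in N_THREADS threads and return the wall time."""
    call = load_blocking_call(loader)
    threads = [threading.Thread(target=call) for _ in range(N_THREADS)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    print("CDLL  (GIL released):", round(timed_run(ctypes.CDLL), 2), "s")
    print("PyDLL (GIL held):    ", round(timed_run(ctypes.PyDLL), 2), "s")
```

On a regular (GIL-enabled) CPython build, the CDLL run should take roughly one sleep interval because the calls overlap, while the PyDLL run should take roughly N_THREADS intervals because each call holds the GIL until it returns.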
There are ways of working around the GIL limitation:
Replacing the threading module with multiprocessing ([Python 3]: multiprocessing - Process-based parallelism). This is the most common approach.
Enclosing the code blocks that can be executed in parallel in Py_BEGIN_ALLOW_THREADS / Py_END_ALLOW_THREADS. The drawback is that the .dll(s) would then depend on Python.
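For the pure-Python tasks, the multiprocessing route could look roughly like the sketch below. The names cpu_task and run_in_processes and the workload sizes are placeholders I made up, not anything from the question; substitute your own task functions.

```python
import multiprocessing
import time


def cpu_task(n):
    """Hypothetical stand-in for one compute-intensive pure-Python task."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def run_in_processes(inputs, workers):
    """Run cpu_task over the inputs in separate processes.

    Each worker process has its own interpreter and its own GIL,
    so the tasks can use several cores at once.
    """
    with multiprocessing.Pool(processes=workers) as pool:
        return pool.map(cpu_task, inputs)


if __name__ == "__main__":
    # The __main__ guard is required on Windows, where worker processes
    # are started by re-importing this module.
    inputs = [2_000_000] * 8
    start = time.perf_counter()
    results = run_in_processes(inputs, workers=8)
    elapsed = time.perf_counter() - start
    print(len(results), "tasks finished in", round(elapsed, 2), "s")
```

Keep in mind that arguments and results are pickled and sent between processes, so this pays off when the computation outweighs the data transfer.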