I have developed a simple multi-threaded Python application (Python 3.7) that runs 8 different compute-intensive tasks in 8 threads. Each task can be either Python code or C++ code that is compiled into a DLL and called through the ctypes package. I am running the experiments on an 8-core machine on Windows.
The strange thing is that when all the threads run the Python code, only one thread seems to be active at any given time and CPU utilization is around 12.5%. But when they call the C++ code inside the DLL, all cores are used and CPU utilization is 100%.
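To make the pure-Python case concrete, here is a minimal sketch of the kind of experiment described above. It is not the actual application: `cpu_task` is a hypothetical busy loop standing in for the real tasks, and `run_in_threads` is an illustrative helper name.

```python
import threading
import time


def cpu_task(n):
    """Hypothetical pure-Python busy loop standing in for one compute-intensive task."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def run_in_threads(func, arg, num_threads=8):
    """Run func(arg) in num_threads threads; return (results, elapsed seconds)."""
    results = [None] * num_threads

    def worker(idx):
        # Each thread writes to its own slot, so no extra locking is needed.
        results[idx] = func(arg)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_threads)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, time.perf_counter() - start


if __name__ == "__main__":
    n = 2_000_000  # arbitrary workload size, chosen only for illustration

    start = time.perf_counter()
    cpu_task(n)
    single = time.perf_counter() - start

    _, threaded = run_in_threads(cpu_task, n)
    print(f"1 call: {single:.2f}s, 8 threads: {threaded:.2f}s")
```

If the threads are effectively taking turns, the 8-thread time should come out close to 8 times the single-call time, which would match the roughly 12.5% utilization on 8 cores.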
Now, the question is: why is the GIL (Global Interpreter Lock) not serializing Python threads that call native C++ code? Does the ctypes implementation release the GIL when calling native C++ code?
Edit 1: No macro such as Py_BEGIN_ALLOW_THREADS is used inside the native C++ DLL.