Python C API threading issues

I am writing a C program that uses a networking library written in Python. I embed the Python library through the Python C API. The library sends all requests asynchronously and notifies me through signals when a request is done.

At least, that is the theory.

In reality I have two threading-related problems:

1. All calls from C into the Python library block (they should return immediately).
2. The Python library invokes the registered callbacks asynchronously (thread.start_new_thread(callback, args)). This does not work: nothing happens. If I change the Python code to callback(args), it does work.

What am I doing wrong? Is there something I have to do to make multithreading work?

I have a similar scenario.

Initial workflow

1. The application starts in the C++ layer.
2. The C++ layer invokes a function in the Python layer from the main thread.
3. That Python function, running in the main thread, creates an event thread.
4. It starts the event thread and returns to the C++ layer.
5. The main loop starts in the C++ layer.
6. The event thread invokes a callback function in the C++ layer when needed.
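
The Python side of this workflow (the function that creates and starts the event thread, and the thread that later invokes the callback) might look roughly like the sketch below. It is only an illustration under assumptions: start_event_thread, event_loop, and the events argument are hypothetical names, not part of any real library, and callback stands in for the function exported by the C++ layer.

```python
import threading


def event_loop(callback, events):
    """Runs in the event thread: hand each event to the callback."""
    for event in events:
        callback(event)


def start_event_thread(callback, events):
    """Called from the main thread: create and start the event thread,
    then return to the caller without waiting for it to finish."""
    worker = threading.Thread(target=event_loop, args=(callback, events))
    worker.daemon = True  # do not keep the process alive for this thread
    worker.start()
    return worker
```

In an embedded interpreter, the thread started here can only make progress while the main thread is not holding the GIL, which is the issue analysed below.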

From the beginning, the event thread did not behave as expected. Based on what I observed, I guessed that the GIL was the cause, so I approached the problem from that angle. Here is my solution.

Analysis

First, from the note in the PyEval_InitThreads documentation:

When only the main thread exists, no GIL operations are needed. ... Therefore, the lock is not created initially. ...

So if multithreading is needed, PyEval_InitThreads() must be called in the main thread. I call PyEval_InitThreads() before Py_Initialize(). The GIL is now initialized, and the main thread holds it.

Second, each time a Python function is about to be invoked from the C++ layer, PyGILState_Ensure() is called to acquire the GIL. After the Python function returns, PyGILState_Release(state) is called to restore the previous GIL state. In terms of the workflow, PyGILState_Ensure() is called before step 2 and PyGILState_Release(state) is called after step 4.

But there is a problem. According to the documentation of PyGILState_Ensure and PyGILState_Release, these two functions save the current GIL state while acquiring the GIL, and restore the previous GIL state when releasing it. However, once PyEval_InitThreads() has been called in the main thread, the main thread already holds the GIL. The GIL state in the main thread is therefore as follows:

/* main thread holds the GIL after PyEval_InitThreads() */

PyGILState_STATE state = PyGILState_Ensure();
/* main thread still holds the GIL (it was already held before the call) */

...
/* invoke Python function */
...

PyGILState_Release(state);
/* main thread holds the GIL again, because the previous state is restored */

As the code sample shows, the main thread always holds the GIL, so the event thread never runs. To fix this, the main thread must not hold the GIL at the point where it calls PyGILState_Ensure(). Then PyGILState_Release(state) really releases the GIL and lets the event thread run. In other words, the main thread should release the GIL immediately after the GIL is initialized.

PyEval_SaveThread() is used for this. From the PyEval_SaveThread documentation: