UPDATE:
This answer states that what I'm trying to do is impossible as of April 2013. This, however, seems to contradict what Alex Martelli says in Python Cookbook (p. 624, 3rd ed.):
> Upon return, PyGILState_Ensure() always guarantees that the calling thread has exclusive access to the Python interpreter. This is true even if the calling C code is running a different thread that is unknown to the interpreter.
The docs also seem to suggest the GIL can be acquired, which would give me hope (except I don't think I can call `PyGILState_Ensure()` from pure Python code, and if I create a C extension to call it, I'm not sure how to embed my `memory_daemon()` in that).
Perhaps I'm misreading either the answer or Python Cookbook and the docs.
ORIGINAL QUESTION:
I want a given thread (from the `threading` module) to prevent any other thread from running while a certain segment of its code is executing. What's the easiest way to achieve that?
Obviously, it would be great to minimize code changes in the other threads, to avoid using C and direct OS calls, and to make it cross-platform for windows and linux. But realistically, I'll be happy to just have any solution whatsoever for my actual environment (see below).
Environment:
- CPython
- python 3.4 (but can upgrade to 3.5 if it helps)
- Ubuntu 14.04
Use case:
For debugging purposes, I calculate the memory used by all objects (as reported by `gc.get_objects()`) and print a summary report to `sys.stderr`. I do this in a separate thread because I want the summary delivered asynchronously from the other threads; I put `time.sleep(10)` at the end of the `while True` loop that does the actual memory usage calculation. However, the memory reporting thread takes a while to complete each report, and I don't want all the other threads to move ahead before the memory calculation is finished (otherwise, the memory snapshot would be really hard to interpret).
Example (to clarify the question):
```python
import threading as th
import time

def report_memory_consumption():
    # go through `gc.get_objects()`, check their size and print a summary
    # takes ~5 min to run
    ...

def memory_daemon():
    while True:
        # all other threads should not do anything until this call is complete
        report_memory_consumption()
        # sleep for 10 sec, then update the memory summary;
        # this sleep is the only time when other threads should be executed
        time.sleep(10)

def f1():
    # do something, including calling many other functions
    # takes ~3 min to run
    ...

def f2():
    # do something, including calling many other functions
    # takes ~3 min to run
    ...

def main():
    t_mem = th.Thread(target=memory_daemon)
    t1 = th.Thread(target=f1)
    t2 = th.Thread(target=f2)
    t_mem.start()
    t1.start()
    t2.start()
    # requirement: no other thread is running while t_mem is not sleeping
```
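One way to approximate this requirement in pure Python is a shared lock that the reporting thread holds while a report runs and that the worker threads briefly acquire at checkpoints. This is only a cooperative sketch (the names, timings, and checkpoint placement are my own, not from the question): workers still run freely between checkpoints, so the checkpoints must be frequent for the snapshot to be stable.

```python
import threading
import time

report_lock = threading.Lock()   # held by the daemon while a report runs
reports_done = 0

def report_memory_consumption():
    # placeholder for the real gc.get_objects() summary
    global reports_done
    reports_done += 1

def memory_daemon(stop, interval=0.05):
    while not stop.is_set():
        with report_lock:          # workers pause at their checkpoints
            report_memory_consumption()
        stop.wait(interval)        # other threads run during this pause

def worker(stop):
    while not stop.is_set():
        with report_lock:          # checkpoint: blocks while a report runs
            pass
        # ... do a small slice of real work here ...
        stop.wait(0.01)

stop = threading.Event()
threads = [threading.Thread(target=memory_daemon, args=(stop,)),
           threading.Thread(target=worker, args=(stop,))]
for t in threads:
    t.start()
time.sleep(0.2)
stop.set()
for t in threads:
    t.join()
```

Note this does not truly freeze all other threads (which, as the answers below explain, CPython cannot do); it only guarantees workers are parked at a known point while the report runs.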
As a stop-gap solution (for obvious reasons), the following worked for me:
If anyone has a better answer, please post it.
You should use threading locks to execute code synchronously between threads. The answer given is somewhat correct, but I would use a reentrant lock so you can check again whether you indeed hold the lock.

Do not use variables, as described in another answer, to check for lock possession. The variables can get corrupted between multiple threads. Reentrant locks were meant to solve this problem.

Another thing that's incorrect in that code is that the lock is released on the assumption that the code in between doesn't throw an exception, so always release it in a `with` context or a `try`/`finally` block. Here is an excellent article explaining synchronization in Python, and the `threading` docs.
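A minimal sketch of the exception-safe pattern (the names here are illustrative, not from the answer): the `with` statement guarantees the lock is released even if the guarded code raises, and `threading.RLock` lets the same thread re-acquire the lock it already holds.

```python
import threading

lock = threading.RLock()   # reentrant: the same thread may acquire it again
shared = []

def append_twice(item):
    with lock:             # released automatically, even on an exception
        shared.append(item)
        with lock:         # re-acquiring in the same thread is fine with RLock
            shared.append(item)

threads = [threading.Thread(target=append_twice, args=(i,)) for i in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(shared))  # 10
```

With a plain `threading.Lock`, the inner `with lock:` would deadlock, because a non-reentrant lock cannot be acquired twice by the same thread.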
Edit: Answering OP's update on embedding Python in C
You misunderstood what he said in the cookbook. `PyGILState_Ensure` acquires the GIL if the GIL is available in the current Python interpreter, but it does not cover C threads that are unknown to the Python interpreter. You can't force the GIL away from the other threads in the current interpreter. Imagine if you could: then you would basically starve all the other threads.
The Python Cookbook is correct. You have exclusive access to the Python interpreter at the point when `PyGILState_Ensure()` returns. Exclusive access means that you can safely call all CPython functions, and it means the current C thread is also the current active Python thread. If the current C thread did not have a corresponding Python thread before, `PyGILState_Ensure()` will have created one for you automatically.

That is the state right after `PyGILState_Ensure()`, and you also hold the GIL at that point.

However, when you then call other CPython functions, such as `PyEval_EvalCode()`, they can implicitly cause the GIL to be released in the meantime. For example, that is the case if the Python statement `time.sleep(0.1)` implicitly gets called somewhere as a result. And while the GIL is released from this thread, other Python threads can run.

You only have the guarantee that when `PyEval_EvalCode()` (or whatever other CPython function you called) returns, you will again be in the same state as before, i.e. you are on the same active Python thread and you again hold the GIL.

About your original question: there currently is no way to achieve this, i.e. no way to call Python code and prevent the GIL from being released somewhere along the way. And this is a good thing; otherwise you could easily end up in deadlocks, e.g. if you don't allow some other thread to release a lock which it currently holds.
About how to implement your use case: the only real way to do that is in C. You would call `PyGILState_Ensure()` to get the GIL, and at that point you must only call those CPython functions which cannot have the side effect of calling other Python code. Be very careful: even `Py_DecRef()` could end up calling `__del__`. The best thing would be to avoid calling any CPython functions and to traverse the CPython objects manually. Note that you probably don't have to do it in as complicated a way as you outlined: there is the underlying CPython memory allocator, and I think you can just get the information from there. Read here about memory management in CPython.
Related code is in pymem.h, obmalloc.c and pyarena.c. See the function `_PyObject_DebugMallocStats()`, although that might not be compiled into your CPython.

There is also the `tracemalloc` module, which will however add some overhead. Maybe its underlying C code (file _tracemalloc.c) is helpful to understand the internals a bit better.
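For comparison, the pure-Python route via the stdlib `tracemalloc` module (available since Python 3.4) looks roughly like this; the allocation sizes shown will of course depend on what your program does:

```python
import tracemalloc

tracemalloc.start()

data = [bytes(10000) for _ in range(100)]   # allocate something measurable

snapshot = tracemalloc.take_snapshot()
tracemalloc.stop()

# top allocation sites, grouped by source line
for stat in snapshot.statistics("lineno")[:3]:
    print(stat)
```

Note that the snapshot only covers allocations made after `tracemalloc.start()`, and tracing adds both CPU and memory overhead while it is enabled.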
About `sys.setswitchinterval(1000)`: that is relevant only for stepping through the Python bytecode and handling it. That is basically the main loop of CPython in `PyEval_EvalFrameEx` in the file ceval.c. All the logic with the switch interval is covered in the file ceval_gil.h.
Setting a high switch interval just means that the main loop in `PyEval_EvalFrameEx` will not be interrupted for a longer time. It does not mean that there aren't other ways for the GIL to get released in the meantime so that another thread can run.

`PyEval_EvalFrameEx` executes the Python bytecode. Let's assume it calls `time.sleep(1)`. That will call the native C implementation of the function, which you'll find in `time_sleep()` in the file timemodule.c. If you follow that code, you'll see that it releases the GIL around the underlying OS sleep call. While the GIL is released, any other thread which is waiting for it can pick it up and run other Python code.
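A quick pure-Python illustration of this point (my own example, not from the answer): even with an enormous switch interval, a `time.sleep()` in one thread still lets another thread run, because the sleeping thread releases the GIL.

```python
import sys
import threading
import time

old = sys.getswitchinterval()
sys.setswitchinterval(1000)   # bytecode-level switches become very rare

hits = []

def other_thread():
    hits.append("ran")

t = threading.Thread(target=other_thread)
t.start()
time.sleep(0.2)               # releases the GIL despite the huge interval
t.join()
sys.setswitchinterval(old)    # restore the default

print(hits)                   # the other thread ran while we slept
```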
Theoretically, you could think that if you set a high switch interval and never call any Python code which in turn could release the GIL at some point, you would be safe. Note that this is almost impossible, though: e.g. the GC will run from time to time, and the `__del__` of some objects could have all kinds of side effects.

Python always executes one thread at a time because of the Global Interpreter Lock; it doesn't do so when `multiprocessing` is involved. You can see this answer to learn more about the GIL in CPython.

Note that that's pseudocode, as I don't know how you're creating your threads, how you're using them, or which code you're executing in them. Certainly it could be written better and optimized.