How do I monitor how busy a Python event loop is?

2020-02-13 13:03发布

问题:

I have an asynchronous application that is serving requests via aiohttp and doing other async tasks (interacting with a database, handling messages, making requests itself as an HTTP client). I'd like to monitor how busy the event loop is, perhaps how much time it's spending executing code versus waiting for a select to complete.

Is there a way to measure this with standard library event loops or other third-party loops (uvloop)?

Specifically, I'd like an ongoing percentage measure of saturation, not just a binary "is it busy" that this question seems to address.

回答1:

Digging source code shows following:

  1. Event loop is basically executing _run_once in while True loop
  2. _run_once does all stuff including waiting for a select to complete
  3. timeout to wait for select isn't stored anywhere outside _run_once

Nothing stops us to reimplement _run_once to time we need.

Instead of coping full _run_once implementation, we can assume time right before select is when _run_once started (because above select nothing time consuming happens) and time right after select is when _process_events started.

From words to action:

import asyncio


class MeasuredEventLoop(asyncio.SelectorEventLoop):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._total_time = 0
        self._select_time = 0

        self._before_select = None

    # TOTAL TIME:
    def run_forever(self):
        started = self.time()
        try:
            super().run_forever()
        finally:
            finished = self.time()
            self._total_time = finished - started

    # SELECT TIME:
    def _run_once(self):
        self._before_select = self.time()
        super()._run_once()

    def _process_events(self, *args, **kwargs):
        after_select = self.time()
        self._select_time += after_select - self._before_select
        super()._process_events(*args, **kwargs)

    # REPORT:
    def close(self, *args, **kwargs):
        super().close(*args, **kwargs)

        select = self._select_time
        cpu = self._total_time - self._select_time
        total = self._total_time

        print(f'Waited for select: {select:.{3}f}')
        print(f'Did other stuff: {cpu:.{3}f}')
        print(f'Total time: {total:.{3}f}')

Let's test it:

import time


async def main():
    await asyncio.sleep(1)  # simulate I/O, will be handled by selectors
    time.sleep(0.01)        # CPU job, executed here, outside event loop
    await asyncio.sleep(1)
    time.sleep(0.01)


loop = MeasuredEventLoop()
asyncio.set_event_loop(loop)
try:
    loop.run_until_complete(main())
finally:
    loop.close()

Result:

Waited for select: 2.000
Did other stuff: 0.032
Total time: 2.032

Let's test it against code with real I/O:

import aiohttp


async def io_operation(delay):
    async with aiohttp.ClientSession() as session:
        async with session.get(f'http://httpbin.org/delay/{delay}') as resp:
            await resp.text()


async def main():
    await asyncio.gather(*[
        io_operation(delay=1),
        io_operation(delay=2),
        io_operation(delay=3),
    ])

Result:

Waited for select: 3.250
Did other stuff: 0.016
Total time: 3.266


回答2:

One way of measuring how much time the CPU was busy (applicable for all sorts of IO-bound situations, not just async) is to periodically check time.process_time().

For example, here's how I would implement something which occasionally prints the percentage of time that the loop was "busy" because the process was running code on the CPU:

import asyncio
import time

async def monitor():
    while True:
        before = time.process_time()
        await asyncio.sleep(10)
        after = time.process_time()
        print('{:.2%}'.format((after - before) / 10))


async def busy_wait():
    while True:
        await asyncio.sleep(.001)

loop = asyncio.get_event_loop()
loop.create_task(monitor())
loop.create_task(busy_wait())
loop.run_forever()

The monitor coroutine measures the amount of process time elapsed every 10 seconds, and prints it as a percentage of the full 10 seconds.

The busy_wait coroutine generates artificial load on the CPU by repeatedly sleeping for short periods. By increasing the length of the sleep you can make the process time ratio decrease arbitrarily, and by decreasing the length of the sleep to 0 you can make the process time approach 100% of the wall-clock time.

One caveat: this will only tell you how busy the python event loop is in terms of time spent "running python code" (in the CPU-bound sense). If someone is blocking the event loop with calls to time.sleep() (rather than asyncio.sleep()), my method will show the loop as being free but it is really "busy" in the sense that it is blocked by a system-level sleep. Most correctly written async code shouldn't call time.sleep() or do blocking IO, but if it does, it will not be reflected correctly by this type of monitoring.