Bad lock optimization in asyncio

Posted 2019-05-16 23:09

Question:

Update: Edited title to focus on the main problem. See my answer for full update.

In the following code, a() and b() are identical. Each of them counts from 0 to 9 concurrently, releasing and reacquiring a shared lock every 2 counts.

import asyncio

lock = asyncio.Lock()

@asyncio.coroutine
def a():
    yield from lock.acquire()
    for i in range(10):
        print('a: ' + str(i))
        if i % 2 == 0:
            lock.release()
            yield from lock.acquire()
    lock.release()

@asyncio.coroutine
def b():
    yield from lock.acquire()
    for i in range(10):
        print('b: ' + str(i))
        if i % 2 == 0:
            lock.release()
            yield from lock.acquire()
    lock.release()

asyncio.get_event_loop().run_until_complete(asyncio.gather(a(), b()))

print('done')

I expected interleaved output, but instead I get:

b: 0
b: 1
b: 2
b: 3
b: 4
b: 5
b: 6
b: 7
b: 8
b: 9
a: 0
a: 1
a: 2
a: 3
a: 4
a: 5
a: 6
a: 7
a: 8
a: 9
done

It seems that the second yield from lock.acquire() does not actually yield control but instead reacquires the lock immediately and continues.

That seems like a bug to me. Am I right? Or is there another explanation?

The following code, modified only by adding an initial "no-op" yield, works as expected. That makes me believe that the lock is indeed fair and probably correct.

import asyncio

lock = asyncio.Lock()

@asyncio.coroutine
def a():
    yield from lock.acquire()
    yield from asyncio.sleep(0)
    for i in range(10):
        print('a: ' + str(i))
        if i % 2 == 0:
            lock.release()
            yield from lock.acquire()
    lock.release()

@asyncio.coroutine
def b():
    yield from lock.acquire()
    yield from asyncio.sleep(0)
    for i in range(10):
        print('b: ' + str(i))
        if i % 2 == 0:
            lock.release()
            yield from lock.acquire()
    lock.release()

asyncio.get_event_loop().run_until_complete(asyncio.gather(a(), b()))

print('done')

Output:

a: 0
b: 0
a: 1
a: 2
b: 1
b: 2
a: 3
a: 4
b: 3
b: 4
a: 5
a: 6
b: 5
b: 6
a: 7
a: 8
b: 7
b: 8
a: 9
b: 9
done

Note that I am doing the no-op yield just once at the beginning, not every 2 counts. Yet doing so produces the interleaving every 2 counts that I expected from the first piece of code.

There is just some optimization (a bug, in my opinion) in the scheduler that doesn't really yield when acquiring a lock that nobody else is waiting for.

How else to explain the first output?

Answer 1:

Update: The following is obsolete in light of my comment (link) on the github issue. The comment observes that you can use Lock.locked() to predict whether Lock.acquire() will yield. It also observes that numerous other coroutines do not yield in the fast case, so it is a lost cause to even consider fixing all of them. Finally, it relates this to how a different issue was fixed, and suggests that it could have been fixed better: that issue requested an asyncio.nop() method that would merely yield to the scheduler and do nothing else. Instead of adding that method, the developers overloaded asyncio.sleep(0), "de-optimizing" it (in the context of this discussion of lock.acquire()) so that it yields to the scheduler when the argument is 0.
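As a sketch of that observation (the helper name acquire_fairly is mine, not from the issue): since Lock.locked() tells you whether acquire() would suspend, a wrapper can insert the explicit yield only when the fast path would otherwise skip it, avoiding the double yield discussed below:

```python
import asyncio

async def acquire_fairly(lock: asyncio.Lock) -> None:
    """Acquire `lock`, but always pass through the scheduler at least once.

    If the lock is free, acquire() takes the fast path and never yields,
    so we yield explicitly with sleep(0) first.  If the lock is held,
    acquire() suspends on its own and no extra yield is needed.
    """
    if not lock.locked():
        await asyncio.sleep(0)
    await lock.acquire()

async def main():
    lock = asyncio.Lock()
    order = []

    async def worker(tag):
        await acquire_fairly(lock)
        for i in range(3):
            order.append((tag, i))
            lock.release()
            await acquire_fairly(lock)
        lock.release()

    await asyncio.gather(worker('a'), worker('b'))
    return order

# The two workers now interleave every count, as the question expected.
print(asyncio.run(main()))
```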

Original answer below, now superseded by the paragraph above:

The root cause is that the implementation of asyncio.Lock tries to be too smart in the first three lines of acquire() and does not give control back to the scheduler if there are no waiters:

if not self._locked and all(w.cancelled() for w in self._waiters):
    self._locked = True
    return True

However, as my first example shows, this prevents other coroutines from ever becoming a waiter: they never get an opportunity to run to the point where they try to acquire the lock.
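The fast path can be observed directly. In this small demonstration of mine (not from the original post), a task that acquires a free lock runs straight through without ever giving the other task a turn:

```python
import asyncio

async def main():
    lock = asyncio.Lock()
    order = []

    async def grabber():
        await lock.acquire()   # free lock: fast path, no suspension
        order.append('grabber')
        lock.release()

    async def bystander():
        order.append('bystander')

    # grabber is scheduled first; if acquire() yielded to the scheduler,
    # bystander would get a turn before 'grabber' is appended.
    await asyncio.gather(grabber(), bystander())
    return order

print(asyncio.run(main()))   # grabber finishes before bystander runs
```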

The inefficient workaround is to always yield from asyncio.sleep(0) immediately before acquiring a lock.

This is inefficient because in the common case there will be other waiters, and acquiring the lock will then also yield control back to the scheduler. Therefore, in most cases you will be yielding control back to the scheduler twice, which sucks.

Also note that the documentation for Lock.acquire() ambiguously says: "This method blocks until the lock is unlocked, then sets it to locked and returns True." That certainly gives the impression that it will yield control to the scheduler before acquiring the lock.

In my opinion, the right fix is for the lock implementation to always yield and not be too smart. Alternatively, the lock implementation should have a method that tells you whether acquiring it would yield, so that your code can yield manually when the acquisition won't. Another alternative is to have the acquire() call return a value telling you whether it actually yielded. This is less preferable but still better than the status quo.

One might think a better workaround would be to yield manually at release() time. However, in a tight loop that releases and reacquires between pieces of work, it amounts to the same thing: in the common case it will still yield twice, once at release time and again at acquire time, adding inefficiency.
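A sketch of that release-side variant (the helper name yielding_release is mine, not from the thread): release, then give up one scheduler turn so a peer can run far enough to call acquire() itself:

```python
import asyncio

async def yielding_release(lock: asyncio.Lock) -> None:
    # Release, then yield one scheduler turn so another coroutine can
    # run far enough to call acquire() on the now-free lock.
    lock.release()
    await asyncio.sleep(0)

async def main():
    lock = asyncio.Lock()
    order = []

    async def holder():
        await lock.acquire()
        order.append('holder working')
        await yielding_release(lock)
        order.append('holder resumed')

    async def contender():
        await lock.acquire()   # fast path: holder has already released
        order.append('contender got the lock')
        lock.release()

    await asyncio.gather(holder(), contender())
    return order

print(asyncio.run(main()))
```

Note that when there is a real waiter, release() already wakes it and the subsequent reacquire suspends, so this variant still pays the double yield described above.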



Answer 2:

It is unclear what you are trying to achieve, but it seems like Lock is not the tool you need. To interleave Python code, you can do something as simple as:

def foo(tag, n):
    for i in range(n):
        print("inside", tag, i)
        yield (tag, i)


print('start')

for x in zip(foo('A', 10), foo('B', 10)):
    print(x)

print('done')

No need for asyncio or threading. In any case, asyncio without IO does not make much sense.

threading.Lock is used to synchronize critical parts of a program which otherwise run in independent threads. asyncio.Lock would allow other coroutines to continue with IO while one coroutine waits:

import asyncio
import random

lock = asyncio.Lock()


async def foo(tag):
    print(tag, "Start")
    for i in range(10):
        print(tag, '>', i)
        await asyncio.sleep(random.uniform(0.1, 1))
        print(tag, '<', i)

    async with lock:
        # only one coroutine can execute the critical section at once.
        # other coroutines can still use IO.
        print(tag, "CRITICAL START")
        await asyncio.sleep(1)
        print(tag, "STILL IN CRITICAL")
        await asyncio.sleep(1)
        print(tag, "CRITICAL END")

    for i in range(10, 20):
        print(tag, '>', i)
        await asyncio.sleep(random.uniform(0.1, 1))
        print(tag, '<', i)

    print(tag, "Done")


print('start')

loop = asyncio.get_event_loop()
tasks = asyncio.gather(foo('A'), foo('B'), foo('C'))
loop.run_until_complete(tasks)
loop.close()

print('done')

Keep in mind that the keyword yield does not always obey the English meaning of yield :-) .

You can see that it makes more sense for async with lock to acquire the lock immediately, without waiting for other coroutines to do more work: the first coroutine to reach the critical section should start running it. (I.e., adding an await asyncio.sleep(0) before async with lock: just does not make any sense.)
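That first-to-reach behavior can be seen in a minimal demonstration (my own sketch, not from the answer): the coroutine that reaches the lock first enters the critical section at once, and the second one waits until the first releases:

```python
import asyncio

async def main():
    lock = asyncio.Lock()
    order = []

    async def first():
        async with lock:          # reaches the lock first: acquires at once
            order.append('first in critical')
            await asyncio.sleep(0.01)
        order.append('first done')

    async def second():
        async with lock:          # must wait until first releases
            order.append('second in critical')

    await asyncio.gather(first(), second())
    return order

print(asyncio.run(main()))
```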