Does the CLR perform “lock elision” optimization?

Posted 2019-03-19 16:19

Question:

The JVM performs a neat trick called lock elision to avoid the cost of locking on objects that are only visible to one thread.
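For example (my own sketch, not taken from the article below), a synchronized type confined to a single method is a classic elision candidate:

```java
// A lock that escape analysis can prove is thread-local: the
// StringBuffer never leaves this method, so the JIT may elide
// its internal synchronization entirely.
public class ElisionCandidate {
    static String join(String[] parts) {
        StringBuffer sb = new StringBuffer(); // synchronized type, but purely local
        for (String p : parts) {
            sb.append(p); // each append() normally takes the buffer's monitor
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(join(new String[] {"a", "b", "c"}));
    }
}
```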

There's a good description of the trick here:

http://www.ibm.com/developerworks/java/library/j-jtp10185/

Does the .Net CLR do something similar? If not then why not?

Answer 1:

It's neat, but is it useful? I have a hard time coming up with an example where the compiler can prove that a lock is thread-local. Almost no classes use locking by default, and when you choose one that does, in most cases it will be referenced from some kind of static variable, foiling the compiler optimization anyway.

Another thing is that the Java VM uses escape analysis in its proof, and AFAIK .NET hasn't implemented escape analysis. Other uses of escape analysis, such as replacing heap allocations with stack allocations, sound much more useful and should be implemented first.

IMO it's currently not worth the coding effort. There are many areas in the .NET VM that are not optimized very well and where work would have a much bigger impact.

SSE vector instructions and delegate inlining are two examples from which my code would profit much more than from this optimization.



Answer 2:

EDIT: As chibacity points out below, this is talking about making locks really cheap rather than completely eliminating them. I don't believe the JIT has the concept of "thread-local objects" although I could be mistaken... and even if it doesn't now, it might in the future of course.

EDIT: Okay, the below explanation is over-simplified, but has at least some basis in reality :) See Joe Duffy's blog post for some rather more detailed information.


I can't remember where I read this - probably "CLR via C#" or "Concurrent Programming on Windows" - but I believe that the CLR allocates sync blocks to objects lazily, only when required. When an object whose monitor has never been contended is locked, the object header is atomically updated with a compare-exchange operation to say "I'm locked". If a different thread then tries to acquire the lock, the CLR detects that it's already locked and upgrades that lock to a "full" one, allocating it a sync block.
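A toy model of that scheme (in Java, since that's what I have to hand - the names and details here are my invention, not the CLR's actual header layout): the fast path is a single compare-exchange on a header word, and contention is merely detected, where a real runtime would inflate to a full sync block and park the thread.

```java
import java.util.concurrent.atomic.AtomicLong;

// Illustrative "thin lock": uncontended acquire/release is one atomic
// compare-exchange; contention is only flagged here, whereas a real
// runtime would allocate a sync block and block the waiting thread.
public class ThinLock {
    private static final long UNLOCKED = 0L;
    private final AtomicLong ownerId = new AtomicLong(UNLOCKED);
    private volatile boolean sawContention = false;

    public void lock() {
        long me = Thread.currentThread().getId();
        // Fast path: one compare-exchange on the "header word".
        while (!ownerId.compareAndSet(UNLOCKED, me)) {
            sawContention = true; // real runtime: inflate to a full lock
            Thread.onSpinWait();
        }
    }

    public void unlock() {
        ownerId.set(UNLOCKED);
    }

    public static void main(String[] args) throws InterruptedException {
        ThinLock lock = new ThinLock();

        // Single-threaded use never leaves the cheap path.
        for (int i = 0; i < 1000; i++) { lock.lock(); lock.unlock(); }
        System.out.println("contended after single-thread use: " + lock.sawContention);

        // Two threads hammering the lock still get mutual exclusion.
        long[] counter = {0};
        Runnable work = () -> {
            for (int i = 0; i < 100_000; i++) {
                lock.lock();
                counter[0]++;
                lock.unlock();
            }
        };
        Thread a = new Thread(work), b = new Thread(work);
        a.start(); b.start(); a.join(); b.join();
        System.out.println("counter: " + counter[0]);
    }
}
```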

Once an object has a "full" lock, locking operations on it are more expensive than locking and unlocking an otherwise-uncontended object.

If I'm right about this (and it's a pretty hazy memory) it should be feasible to lock and unlock a monitor on different threads cheaply, so long as the locks never overlap (i.e. there's no contention).

I'll see if I can dig up some evidence for this...



Answer 3:

In answer to your question: no, the CLR/JIT does not perform "lock elision", i.e. it does not remove locks from code that is visible to only a single thread. This is easy to confirm with simple single-threaded benchmarks on code where lock elision would apply in Java.

There are likely a number of reasons why it does not, but chief among them is that in the .NET Framework this would be an uncommonly applicable optimization, so it is not worth the effort of implementing.

Also, in .NET uncontended locks are extremely fast because they are non-blocking and executed in user space (JVMs appear to have similar optimizations for uncontended locks, e.g. IBM's). To quote from the threading chapter of C# 3.0 in a Nutshell:

"Locking is fast: you can expect to acquire and release a lock in less than 100 nanoseconds on a 3 GHz computer if the lock is uncontended."
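You can see this fast path yourself with a crude timing loop - Java shown here, but the same uncontended fast path applies to Monitor.Enter/Exit in .NET. This is not a rigorous benchmark (no JMH, no statistical treatment), so treat the number as an order of magnitude only:

```java
// Crude timing of uncontended monitor acquire/release.
public class UncontendedLockTiming {
    // Static field so escape analysis cannot elide the lock itself.
    static final Object LOCK = new Object();

    public static void main(String[] args) {
        final int iters = 1_000_000;
        long sink = 0;
        // Warm-up pass so the JIT compiles the loop before we time it.
        for (int i = 0; i < iters; i++) { synchronized (LOCK) { sink += i; } }
        long start = System.nanoTime();
        for (int i = 0; i < iters; i++) { synchronized (LOCK) { sink += i; } }
        long perOp = (System.nanoTime() - start) / iters;
        System.out.println("approx ns per lock/unlock: " + perOp);
        System.out.println("sink=" + sink); // prevents dead-code elimination
    }
}
```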

A couple of example scenarios where lock elision could be applied, and why it's not worth it:

Using locks within a method in your own code that acts purely on locals

There is no good reason to use locking in this scenario in the first place, so unlike optimizations such as hoisting loop invariants or method inlining, this is a pretty uncommon case and the result of unnecessary use of locks. The runtime shouldn't be concerned with optimizing out uncommon, extremely bad usage.
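The pattern in question looks like this (my own example); the lock is provably useless, and the right fix is to delete it rather than rely on the runtime eliding it:

```java
// A lock on a method-local object: no other thread can ever see
// 'guard', so the synchronization accomplishes nothing.
public class PointlessLocalLock {
    static int sum(int[] values) {
        Object guard = new Object();   // local, never escapes
        int total = 0;
        synchronized (guard) {         // elidable - or better, just removable
            for (int v : values) total += v;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sum(new int[] {1, 2, 3, 4}));
    }
}
```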

Using someone else's type that is declared as a local which uses locks internally

Although this sounds more useful, the .NET Framework's general design philosophy is to leave the responsibility for locking to clients, so it's rare that types have any internal lock usage. Indeed, the .NET Framework is pathologically unsynchronized when it comes to instance methods on types that are not specifically designed and advertised to be concurrent. Java, on the other hand, has common types that do include synchronization, e.g. StringBuffer and Vector. As the .NET BCL is largely unsynchronized, lock elision is likely to have little effect.
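The Java side of that contrast, as an illustration - elision pays off there precisely because synchronization is baked into common types:

```java
// StringBuffer synchronizes every call; StringBuilder, its
// unsynchronized counterpart (the .NET-style choice), does not.
// When a StringBuffer is confined to one thread, the JVM may elide
// those monitor operations; the .NET BCL mostly avoids the problem
// by not synchronizing instance methods in the first place.
public class SynchronizedTypes {
    public static void main(String[] args) {
        StringBuffer buf = new StringBuffer();   // every append() takes a monitor
        StringBuilder bld = new StringBuilder(); // no locking at all
        for (String s : new String[] {"x", "y", "z"}) {
            buf.append(s);
            bld.append(s);
        }
        System.out.println(buf + " " + bld);
    }
}
```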

Summary

I think overall, there are fewer cases in .Net where lock elision would kick in, because there are simply not as many places where there would be thread-local locks. Locks are far more likely to be used in places which are visible to multiple threads and therefore should not be elided. Also, uncontended locking is extremely quick.

I had difficulty finding any real-world evidence that lock elision provides much of a performance benefit in Java (for example...), and the latest docs for at least the Oracle JVM state that elision is not always applied to thread-local locks, hinting that it is not a guaranteed optimization anyway.

I suspect that lock elision is something that is made possible through the introduction of escape analysis in the JVM, but is not as important for performance as EA's ability to analyze whether reference types can be allocated on the stack.