可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效,请关闭广告屏蔽插件后再试):
问题:
If I have a deeply immutable type (all members are readonly and if they are reference type members, then they also refer to objects that are deeply immutable).
I would like to implement a lazy initialized property on the type, like this:
private ReadOnlyCollection<SomeImmutableType> m_PropName = null;
public ReadOnlyCollection<SomeImmutableType> PropName
{
get
{
if(null == m_PropName)
{
ReadOnlyCollection<SomeImmutableType> temp = /* do lazy init */;
m_PropName = temp;
}
return m_PropName;
}
}
From what I can tell:
m_PropName = temp;
...is threadsafe. I'm not worried too much about two threads both racing to initialize at the same time, because it will be rare, both results would be identical from a logical perspective, and I'd rather not use a lock if I don't have to.
Will this work? What are the pros and cons?
Edit:
Thanks for your answers. I will probably move forward with using a lock. However, I'm surprised nobody brought up the possibility of the compiler realizing that the temp variable is unnecessary, and just assigning straight to m_PropName. If that were the case, then a reading thread could possibly read an object that hasn't finished being constructed. Does the compiler prevent such a situation?
(Answers seem to indicate that the runtime won't allow this to happen.)
Edit:
So I've decided to go with an Interlocked CompareExchange method inspired by this article by Joe Duffy.
Basically:
private ReadOnlyCollection<SomeImmutableType> m_PropName = null;
public ReadOnlyCollection<SomeImmutableType> PropName
{
get
{
if(null == m_PropName)
{
ReadOnlyCollection<SomeImmutableType> temp = /* do lazy init */;
System.Threading.Interlocked(ref m_PropName, temp, null);
}
return m_PropName;
}
}
This is supposed to ensure that all threads that call this method on this object instance will get a reference to the same object, so the == operator will work. It is possible to have wasted work, which is fine - it just makes this an optimistic algorithm.
As noted in some comments below, this depends on the .NET 2.0 memory model to work. Otherwise, m_PropName should be declared volatile.
回答1:
That will work. Writing to references in C# is guaranteed to be atomic, as described in section 5.5 of the spec.
This is still probably not a good way to do it, because your code will be more confusing to debug and read in exchange for a probably minor effect on performance.
Jon Skeet has a great page on implementing singeltons in C#.
The general advice about small optimizations like these is not to do them unless a profiler tells you this code is a hotspot. Also, you should be wary of writing code that cannot be fully understood by most programmers without checking the spec.
EDIT: As noted in the comments, even though you say you don't mind if 2 versions of your object get created, that situation is so counter-intuitive that this approach should never be used.
回答2:
You should use a lock. Otherwise you risk two instances of m_PropName
existing and in use by different threads. This may not be a problem in many instances; however, if you want to be able to use ==
instead of .equals()
then this will be a problem. Rare race conditions are not the better bug to have. They are difficult to debug and to reproduce.
In your code, if two different threads simultaneously get your property PropName
(say, on a multi-core CPU), then they can receive different new instances of the property that will contain identical data but not be the same object instance.
One key benefit of immutable objects is that ==
is equivalent to .equals()
, allowing use of the more performant ==
for comparison. If you don't synchronize in the lazy initialization, then you risk losing this benefit.
You also lose immutability. Your object will be initialized twice with different objects (that contain the same values), so a thread that already got the value of your property, but that gets it again, may receive a different object the second time.
回答3:
I'd be interested to hear other answers to this, but I don't see a problem with it. The duplicate copy will be abandoned and gets GCed.
You need to make the field volatile
though.
Regarding this:
However, I'm surprised nobody brought
up the possibility of the compiler
realizing that the temp variable is
unnecessary, and just assigning
straight to m_PropName. If that were
the case, then a reading thread could
possibly read an object that hasn't
finished being constructed. Does the
compiler prevent such a situation?
I considered mentioning it but it makes no difference. The new operator doesn't return a reference (and so the assignment to the field doesn't happen) until the constructor completes - this is guaranteed by the runtime, not the compiler.
However, the language/runtime does NOT really guarantee that other threads cannot see a partially constructed object - it depends what the constructor does.
Update:
The OP also wonders whether this page has a helpful idea. Their final code snippet is an instance of Double checked locking which is the classic example of an idea that thousands of people recommmend to each other without any idea of how to do it right. The problem is that SMP machines consist of several CPUs with their own memory caches. If they had to synchronise their caches every time there was a memory update, this would undo the benefits of having several CPUs. So they only synchronize at a "memory barrier", which occurs when a lock is taken out, or an interlocked operation occurs, or a volatile
variable is accessed.
The usual order of events is:
- Coder discovers double-checked locking
- Coder discovers memory barriers
Between these two events, they release a lot of broken software.
Also, many people believe (as that guy does) that you can "eliminate locking" by using interlocked operations. But at runtime they are a memory barrier and so they cause all CPUs to stop and synchronize their caches. They have an advantage over locks in that they don't need to make a call into the OS kernel (they are "user code" only), but they can kill performance just as much as any synchronization technique.
Summary: threading code looks approximately 1000 x easier to write than it is.
回答4:
I'm all for lazy init when the data may not always be accessed and it can take a good amount of resources to fetch or store the data.
I think there is a key concept being forgotten here: As per the C# design concepts, you should not make your instance members thread-safe by default. Only static members should be made thread-safe by default. Unless you are accessing some static/global data, you should not add extra locks into your code.
From what your code shows, the lazy init is all inside an instance property, so I would not add locks to it. If, by design, it is meant to be accessed by multiple threads simultaneously, then go ahead and add the lock.
By the way, it may not reduce code by much, but I am fan of the null-coalesce operator. The body to your getter could become this instead:
m_PropName = m_PropName ?? new ...();
return m_PropName;
It gets rid of the extra "if (m_PropName == null) ..."
and in my opinion makes it more concise and readable.
回答5:
I am no C# expert, but as far as I can tell, this only poses a problem if you require that only one instance of ReadOnlyCollection is created. You say that the created object will always be the same and it doesn't matter if two (or more) threads do create a new instance, so I would say it is ok to do this without a lock.
One thing that might become a weird bug later would be if one would compare for equality of the instances, which would sometimes not be the same. But if you keep that in mind (or just don't do that) I see no other problems.
回答6:
Unfortunately, you need a lock. There are a lot of quite subtle bugs when you do not lock properly. For a daunting example look at this answer.
回答7:
One may safely use lazy initialization without a lock if the field will only be written if it is either blank or already holds either the value to be written or, in some cases, an equivalent. Note that no two mutable objects are equivalent; a field which holds a reference to a mutable object may only be written with a reference to the same object (meaning the write would have no effect).
There are three general patterns one may use for lazy initialization, depending upon circumstances:
- Use a lock if computing the value to write would be expensive, and one wishes to avoid expending such effort unnecessarily. The double-checked locking pattern is good on systems whose memory model supports it.
- If one is storing an immutable value, compute it if it seems to be necessary, and just store it. Other threads that don't see the store may perform a redundant computation, but they'll simply try to write the field with the value that's already there.
- If one is storing a reference to a cheap-to-produce mutable class object, create a new object if it seems to be necessary, and then use `Interlocked.CompareExchange` to store it if the field is still blank.
Note that if one can avoid locking on any access other than the first one within a thread, making the lazy reader thread-safe should not impose any significant performance cost. While it's common for mutable classes not to be thread-safe, all classes that claim to be immutable should be 100% thread-safe for any combination of reader actions. Any class which cannot meet such a thread-safety requirement should not claim to be immutable.
回答8:
This is definitely a problem.
Consider this scenario: Thread "A" accesses the property, and the collection is initialized. Before it assigns the local instance to the field "m_PropName", Thread "B" accesses the property, except it gets to complete. Thread "B" now has a reference to that instance, which is currently stored in "m_PropName"... until Thread "A" continues, at which point "m_PropName" is overwritten by the local instance in that thread.
There are now a couple of problems. First, Thread "B" doesn't have the correct instance anymore, since the owning object thinks that "m_PropName" is the only instance, yet it leaked out an initialized instance when Thread "B" completed before Thread "A". Another is if the collection changed between when Thread "A" and Thread "B" got their instances. Then you have incorrect data. It could even be worse if you were observing or modifying the read-only collection internally (which, of course, you can't with ReadOnlyCollection, but could if you replaced it with some other implementation which you could observe via events or modify internally but not externally).