When different variables reside in the same cache line, you can experience false sharing: even if two threads (running on different cores) access two different variables, the fact that those variables share a cache line means you take a performance hit, because every write triggers cache-coherence traffic.
Now say those variables are atomic variables (by atomic I mean variables that introduce a memory fence, such as C++'s atomic<T>). Will false sharing still matter, or does it not matter whether atomic variables share a cache line, since supposedly they trigger cache-coherence traffic anyway? In other words, will putting atomic variables in the same cache line make the application slower than keeping them in separate cache lines?
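To make the layout concrete, here is roughly the kind of arrangement I have in mind (the struct names are only illustrative, and I'm assuming a 64-byte cache line):

```cpp
#include <atomic>

// Both counters very likely end up in the same 64-byte cache line.
struct SameLine {
    std::atomic<int> a;
    std::atomic<int> b;
};

// Each counter is aligned to its own cache line via padding.
struct SeparateLines {
    alignas(64) std::atomic<int> a;
    alignas(64) std::atomic<int> b;
};
```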
False sharing of "atomic" variables could lead to performance problems (whether or not it will lead to such problems depends on a lot of things).
Let's say you have two cores, A and B, and each operates on its own variable. Let's call these variables a and b respectively: A has a in its cache, and B has b in its cache.

Consider what happens when A increments a. If a and b share a cache line, B's copy of b will get invalidated, and its next access to b will incur a cache miss. If a and b don't share a cache line, there's no impact on B as far as its cached copy of b is concerned. This happens regardless of whether a and b are "atomic".

A clarification: for there to be negative consequences, at least some of the accesses to the "falsely shared" variables must be writes. If writes are rare, the performance impact of false sharing is fairly negligible; the more writes (and thus cache-line invalidation messages), the worse the performance.
Even with atomics, cache-line sharing (either false or true) still matters. Look for some evidence here: http://www.1024cores.net/home/lock-free-algorithms/first-things-first. Thus, the answer is yes: placing atomic variables used by different threads on the same cache line may make the application slower than placing them on two different lines. However, I think the effect will mostly go unnoticed, unless the app spends a significant portion of its time updating these atomic variables.
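If you do decide to keep heavily written atomics on separate lines, one hedged sketch of the padding is below; C++17 exposes std::hardware_destructive_interference_size as a hint for the line size, and the 64-byte fallback here is just an assumption:

```cpp
#include <atomic>
#include <cstddef>
#include <new>

// Some standard libraries only provide the constant behind this feature-test macro.
#ifdef __cpp_lib_hardware_interference_size
constexpr std::size_t kLineSize = std::hardware_destructive_interference_size;
#else
constexpr std::size_t kLineSize = 64;   // assumed typical cache-line size
#endif

struct PerThreadCounters {
    alignas(kLineSize) std::atomic<long> a{0};
    alignas(kLineSize) std::atomic<long> b{0};
};
```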
If you use atomic variables with the strongest consistency requirements, a full memory barrier, the effect of false sharing will probably not be noticeable. For such an access, the performance of the atomic operation is basically limited by the memory-access latency. Things are slow anyhow, so I don't think they would get much slower in the presence of false sharing.
If you use other, less intrusive memory orderings, the performance hit from the atomics themselves may be smaller, and then the impact of false sharing might be significant.
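For example, a seq_cst store typically implies a full barrier on common hardware, while a release store is often just a plain store, so the relative weight of false sharing differs between the two (this is a sketch, not a measurement):

```cpp
#include <atomic>

std::atomic<long> flag{0};

void publish_seq_cst(long v) {
    // Sequentially consistent store: on most hardware this implies a full
    // barrier, so the operation is already expensive and false sharing adds
    // comparatively little on top.
    flag.store(v, std::memory_order_seq_cst);
}

void publish_release(long v) {
    // Release store: often just a plain store on common hardware, so the
    // extra cache-line invalidations caused by false sharing can make up a
    // much larger share of the total cost.
    flag.store(v, std::memory_order_release);
}
```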
Overall, I would first look at the performance of the atomic operation itself before worrying about false sharing for such operations.