Question:
Every modern OS today provides some atomic operations:
- Windows has the Interlocked* API
- FreeBSD has <machine/atomic.h>
- Solaris has <atomic.h>
- Mac OS X has <libkern/OSAtomic.h>
Is there anything like that for Linux?
- I need it to work on most Linux-supported platforms, including x86, x86_64 and ARM.
- I need it to work with at least GCC and the Intel compiler.
- I must not use third-party libraries such as glib or Qt.
- I need it to work in C++ (C is not required).
Issues:
- The GCC atomic builtins __sync_* are not supported on all platforms (ARM) and are not supported by the Intel compiler.
- AFAIK <asm/atomic.h> should not be used in user space, and I haven't managed to use it successfully at all. Also, I'm not sure whether it would work with the Intel compiler.
Any suggestions?
I know that there are many related questions, but some of them point to __sync*, which is not feasible for me (ARM), and some point to asm/atomic.h.
Maybe there is an inline assembly library that does this for GCC (ICC supports gcc assembly)?
Edit:
There is a very partial solution for add operations only (it allows implementing an atomic counter, but not lock-free structures that require CAS):
If you use libstdc++ (the Intel compiler uses libstdc++), you can use __gnu_cxx::__exchange_and_add, defined in <ext/atomicity.h> or <bits/atomicity.h> depending on the compiler version.
However, I'd still like to see something that supports CAS.
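As a rough sketch of that partial solution, assuming libstdc++ and that its <ext/atomicity.h> declares __gnu_cxx::__exchange_and_add and __gnu_cxx::__atomic_add (as current releases do), an atomic counter could look like this. AtomicCounter is a hypothetical name used only for illustration, and these functions are libstdc++ internals rather than a documented public API.

```cpp
#include <ext/atomicity.h>  // libstdc++-specific; older releases used <bits/atomicity.h>

// Hypothetical counter class built only on libstdc++'s atomicity helpers.
// _Atomic_word is libstdc++'s integer type for these operations.
class AtomicCounter {
public:
    explicit AtomicCounter(_Atomic_word initial = 0) : value_(initial) {}

    // Atomically adds 'delta' and returns the value from BEFORE the addition.
    _Atomic_word fetch_add(int delta) {
        return __gnu_cxx::__exchange_and_add(&value_, delta);
    }

    // Atomic add that discards the previous value.
    void add(int delta) { __gnu_cxx::__atomic_add(&value_, delta); }

    // Plain volatile read: fine for a snapshot, but not an ordered (acquire) load.
    _Atomic_word load() const { return value_; }

private:
    volatile _Atomic_word value_;
};
```

Anything that needs compare-and-swap, such as a lock-free stack, is out of reach with these two calls, which is exactly the limitation described in the edit.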
Answer 1:
Some projects use this:
http://packages.debian.org/source/sid/libatomic-ops
If you want simple operations such as CAS, can't you just use the arch-specific implementations from the kernel, and do the arch checks in user space with autotools/cmake? As far as licensing goes, although the kernel is GPL, I think it's arguable that the inline assembly for these operations is provided by Intel/AMD rather than being something the kernel holds a license on. It just happens to be in an easily accessible form in the kernel source.
Answer 2:
The recent (2011) C and C++ standards now specify atomic operations:
- C11: stdatomic.h
- C++11: std::atomic
However, your platform or compiler may not yet support these newer headers and features.
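To show what CAS buys over plain fetch-and-add, here is a minimal sketch of the push half of a lock-free (Treiber) stack using C++11 std::atomic<T>::compare_exchange_weak. It assumes a compiler and standard library with C++11 <atomic> support; Node and PushOnlyStack are hypothetical names.

```cpp
#include <atomic>

// Hypothetical node type for the sketch.
struct Node {
    int value;
    Node* next;
};

// Push-only lock-free stack: the kind of structure that needs CAS.
class PushOnlyStack {
public:
    void push(Node* n) {
        n->next = head_.load(std::memory_order_relaxed);
        // On failure, compare_exchange_weak stores the current head into
        // n->next, so the loop simply retries with the refreshed link.
        while (!head_.compare_exchange_weak(n->next, n,
                                            std::memory_order_release,
                                            std::memory_order_relaxed)) {
        }
    }

    Node* top() const { return head_.load(std::memory_order_acquire); }

private:
    std::atomic<Node*> head_{nullptr};
};
```

Pop is deliberately left out: a naive CAS-based pop is exposed to the ABA problem and to use-after-free when nodes are reclaimed, which needs extra machinery such as hazard pointers or tagged pointers.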
Answer 3:
Darn. I was going to suggest the GCC primitives, then you said they were off limits. :-)
In that case, I would do an #ifdef for each architecture/compiler combination you care about and code up the inline asm. And maybe check for __GNUC__ or some similar macro and use the GCC primitives if they are available, because it feels so much more right to use those. :-)
You are going to have a lot of duplication and it might be difficult to verify correctness, but this seems to be the way a lot of projects do it, and I've had good results with it.
Some gotchas that have bitten me in the past: when using GCC, don't forget "asm volatile" and the "memory" and "cc" clobbers, etc.
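Here is a sketch of that #ifdef approach for a single 32-bit compare-and-swap, showing asm volatile with the "memory" and "cc" clobbers. The name cas_int is hypothetical. The x86 branch follows the usual lock cmpxchg idiom, and other GCC-compatible targets fall back to __sync_val_compare_and_swap. For illustration only, the x86 branch uses hand-written asm even when the GCC builtins would be available.

```cpp
// Hypothetical helper. Returns the value *ptr held before the operation;
// the swap happened if and only if that value equals 'expected'.
static inline int cas_int(volatile int* ptr, int expected, int desired) {
#if defined(__i386__) || defined(__x86_64__)
    int prev;
    // lock cmpxchg compares EAX with *ptr: if equal, it stores 'desired';
    // otherwise it loads *ptr into EAX. Either way EAX ends up with the old value.
    __asm__ __volatile__("lock; cmpxchgl %2, %1"
                         : "=a"(prev), "+m"(*ptr)
                         : "r"(desired), "0"(expected)
                         : "memory", "cc");
    return prev;
#elif defined(__GNUC__)
    // Other architectures: fall back to the GCC builtin.
    return __sync_val_compare_and_swap(ptr, expected, desired);
#else
#error "cas_int: no implementation for this compiler/architecture"
#endif
}
```

The "memory" clobber stops the compiler from caching values in registers or reordering memory accesses across the asm; "cc" tells it the flags register is modified. An ARM branch (for example ldrex/strex on ARMv6 and later) would slot in as another #elif.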
Answer 4:
Boost, which has a non-intrusive license, and other frameworks already offer portable atomic counters, as long as they are supported on the target platform.
Third-party libraries are good for us. And if for some strange reason your company forbids you from using them, you can still look at how they do it (as long as the license permits this for your use) to implement what you are looking for.
Answer 5:
I recently implemented such a thing and ran into the same difficulties as you. My solution was basically the following:
- try to detect the GCC builtins with the feature macro
- if they are not available, just implement something like cmpxchg with __asm__ for the other architectures (ARM is a bit more complicated than that). Just do that for one possible size, e.g. sizeof(int).
- implement all other functionality on top of those one or two primitives with inline functions
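A minimal sketch of that layering, assuming GCC-style predefined macros: __GCC_HAVE_SYNC_COMPARE_AND_SWAP_4 is the feature macro GCC defines when a 4-byte __sync compare-and-swap is available, and other compilers are not guaranteed to define it. The namespace atomics and its functions are hypothetical names. Only cas() is architecture-specific; fetch_add and exchange are plain inline retry loops built on it.

```cpp
namespace atomics {  // hypothetical namespace for the sketch

// The one architecture-specific primitive: returns true if the swap happened.
inline bool cas(volatile int* ptr, int expected, int desired) {
#if defined(__GCC_HAVE_SYNC_COMPARE_AND_SWAP_4)
    return __sync_bool_compare_and_swap(ptr, expected, desired);
#elif defined(__i386__) || defined(__x86_64__)
    unsigned char ok;
    // cmpxchg sets ZF on success; sete copies it into 'ok'.
    __asm__ __volatile__("lock; cmpxchgl %3, %1\n\tsete %0"
                         : "=q"(ok), "+m"(*ptr), "+a"(expected)
                         : "r"(desired)
                         : "memory", "cc");
    return ok != 0;
#else
#error "atomics::cas: add an implementation for this architecture"
#endif
}

// Everything below is built only on cas().
inline int fetch_add(volatile int* ptr, int delta) {
    int old;
    do {
        old = *ptr;
    } while (!cas(ptr, old, old + delta));
    return old;  // value before the addition
}

inline int exchange(volatile int* ptr, int desired) {
    int old;
    do {
        old = *ptr;
    } while (!cas(ptr, old, desired));
    return old;  // value that was replaced
}

}  // namespace atomics
```

The read of *ptr before each CAS is only a guess at the current value; correctness comes from the CAS failing, and the loop retrying, whenever another thread changed the value in between.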
Answer 6:
There is a patch for GCC here that adds support for ARM atomic operations. It will not help you with Intel, but you could examine the code: there is recent kernel support for older ARM architectures, and newer ones have the instructions built in, so you should be able to build something that works.
http://gcc.gnu.org/ml/gcc-patches/2011-07/msg00050.html
Answer 7:
__sync* certainly is (and has been) supported by the Intel compiler, because GCC adopted these built-ins from there. Read the first paragraph on this page. Also see "Intel® C++ Compiler for Linux* Intrinsics Reference", page 198. It's from 2006 and describes exactly those built-ins.
Regarding ARM support for older ARM CPUs: it cannot be done entirely in user space, but it can be done in kernel space (by disabling interrupts during the operation), and I think I read somewhere that this has been supported for quite a while now.
According to this PHP bug, dated 2011-10-08, __sync_* will only fail on:
- PA-RISC with anything other than Linux
- SPARCv7 and lower
- ARM with GCC < 4.3
- ARMv5 and lower with anything other than Linux
- MIPS1
So with GCC 4.3 or later (4.7 is the current release), you shouldn't have a problem with ARMv6 and newer. You shouldn't have a problem with ARMv5 either, as long as you compile for Linux.
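Since the __sync builtins are common ground between GCC and ICC, here is a sketch of a spinlock built on two of them: __sync_lock_test_and_set (an acquire barrier) and __sync_lock_release (a release barrier). SpinLock is a hypothetical name. Storing only the constant 1 follows GCC's note that some targets support no other value for __sync_lock_test_and_set.

```cpp
// Hypothetical spinlock on top of the __sync builtins.
class SpinLock {
public:
    void lock() {
        // Writes 1 and returns the previous value; nonzero means already held.
        while (__sync_lock_test_and_set(&flag_, 1)) {
            // Spin on a plain read until the lock looks free, so we don't
            // hammer the cache line with locked writes.
            while (flag_) {
            }
        }
    }

    bool try_lock() { return __sync_lock_test_and_set(&flag_, 1) == 0; }

    // Release barrier, then stores 0.
    void unlock() { __sync_lock_release(&flag_); }

private:
    volatile int flag_ = 0;
};
```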
Answer 8:
On Debian/Ubuntu I recommend:
sudo apt-get install libatomic-ops-dev
Examples: http://www.hpl.hp.com/research/linux/atomic_ops/example.php4
It is compatible with GCC and ICC.
Compared to Intel Threading Building Blocks (TBB) using atomic<T>, libatomic-ops-dev is more than twice as fast (with the Intel compiler). In my test on Ubuntu on an i7, producer-consumer threads piped 10 million ints through a ring buffer in 0.5 s, as opposed to 1.2 s for TBB.
It is also easy to use, e.g.:
#include <atomic_ops.h>
volatile AO_t head;
AO_fetch_and_add1(&head);   /* atomically increments head */
Answer 9:
See kernel_user_helpers.txt or entry-armv.S and look for __kuser_cmpxchg. As described in the comments of other ARM Linux versions:
kuser_cmpxchg
Location: 0xffff0fc0
Reference prototype:
  int __kuser_cmpxchg(int32_t oldval, int32_t newval, volatile int32_t *ptr);
Input:
  r0 = oldval
  r1 = newval
  r2 = ptr
  lr = return address
Output:
  r0 = success code (zero or non-zero)
  C flag = set if r0 == 0, clear if r0 != 0
Clobbered registers:
  r3, ip, flags
Definition:
  Atomically store newval in *ptr only if *ptr is equal to oldval.
  Return zero if *ptr was changed or non-zero if no exchange happened.
  The C flag is also set if *ptr was changed to allow for assembly
  optimization in the calling code.
Usage example:
  typedef int (__kuser_cmpxchg_t)(int oldval, int newval, volatile int *ptr);
  #define __kuser_cmpxchg (*(__kuser_cmpxchg_t *)0xffff0fc0)

  int atomic_add(volatile int *ptr, int val)
  {
          int old, new;

          do {
                  old = *ptr;
                  new = old + val;
          } while(__kuser_cmpxchg(old, new, ptr));

          return new;
  }
Notes:
  - This routine already includes memory barriers as needed.
  - Valid only if __kuser_helper_version >= 2 (from kernel version 2.6.12).
This is for use with Linux on ARMv3 and later, using the swp primitive. You must have a very ancient ARM not to support this. Only a data abort or an interrupt can cause the spinning to fail, so the kernel monitors this address (~0xffff0fc0) and performs a user-space PC fix-up when either a data abort or an interrupt occurs. All user-space libraries that support ARMv5 and lower use this facility.
For instance, QtConcurrent uses this.
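Here is a sketch of wrapping the helper so that the same code also builds on other targets, assuming the addresses documented in kernel_user_helpers.txt (0xffff0fc0 for __kuser_cmpxchg, 0xffff0ffc for __kuser_helper_version). The names kuser_helper_version, cas_kuser and atomic_add_via_cas are hypothetical. The helper branch is compiled only for 32-bit ARM Linux (__arm__ and __linux__); everywhere else it falls back to __sync_bool_compare_and_swap. Per the notes above, check that the version is at least 2 before relying on the helper.

```cpp
#include <cstdint>

// Hypothetical: kuser helper version on 32-bit ARM Linux, -1 elsewhere.
inline int kuser_helper_version() {
#if defined(__arm__) && defined(__linux__)
    return *reinterpret_cast<const volatile std::int32_t*>(0xffff0ffc);
#else
    return -1;
#endif
}

// Hypothetical wrapper: true if *ptr held oldval and now holds newval.
inline bool cas_kuser(volatile int* ptr, int oldval, int newval) {
#if defined(__arm__) && defined(__linux__)
    typedef int(kuser_cmpxchg_t)(int oldval, int newval, volatile int* ptr);
    kuser_cmpxchg_t* const kuser_cmpxchg =
        reinterpret_cast<kuser_cmpxchg_t*>(0xffff0fc0);
    // The helper returns ZERO when the exchange happened, so invert it.
    return kuser_cmpxchg(oldval, newval, ptr) == 0;
#else
    return __sync_bool_compare_and_swap(ptr, oldval, newval);
#endif
}

// Same retry loop as the kernel's usage example; returns the new value.
inline int atomic_add_via_cas(volatile int* ptr, int val) {
    int old, updated;
    do {
        old = *ptr;
        updated = old + val;
    } while (!cas_kuser(ptr, old, updated));
    return updated;
}
```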