
Question:

The context is inter-process communication where one process ("Server") has to send fixed-size structs to many listening processes ("Clients") running on the same machine.

I am very comfortable doing this with socket programming. To make the communication between the Server and the Clients faster and to reduce the number of copies, I want to try using shared memory (shm) or mmap.

The OS is 64-bit RHEL.

Since I am a newbie, please suggest which one I should use. I'd appreciate it if someone could point me to a book or online resource to learn it.

Thanks for the answers. I wanted to add that the Server (a market data server) will typically be receiving multicast data, which will cause it to be "sending" about 200,000 structs per second to the Clients, where each struct is roughly 100 bytes. Does a shm_open/mmap implementation outperform sockets only for large blocks of data, or for a large volume of small structs as well?

Answer 1:

I'd use mmap together with shm_open to map shared memory into the virtual address space of the processes. This is relatively direct and clean:

you identify your shared memory segment with some kind of symbolic name, something like "/myRegion";
with shm_open you open a file descriptor on that region;
with ftruncate you set the segment to the size you need (a newly created segment has size zero);
with mmap you map it into your address space.
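
The four steps above can be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: the helper name map_region and the 0600 permission bits are assumptions made for the example, and a real server would also decide who creates the segment and who merely opens it.

```c
#include <assert.h>
#include <fcntl.h>      /* O_CREAT, O_RDWR */
#include <stddef.h>
#include <sys/mman.h>   /* shm_open, mmap, munmap, shm_unlink */
#include <sys/stat.h>   /* mode constants */
#include <unistd.h>     /* ftruncate, close */

/* Hypothetical helper: create (or open) the named segment, size it,
 * and map it read/write. Returns the mapped address or NULL on failure. */
static void *map_region(const char *name, size_t size)
{
    assert(name != NULL && size > 0);

    /* Open a file descriptor on the named region. */
    int fd = shm_open(name, O_CREAT | O_RDWR, 0600);
    if (fd == -1)
        return NULL;

    /* A freshly created object has length zero, so set its real size. */
    if (ftruncate(fd, (off_t)size) == -1) {
        close(fd);
        return NULL;
    }

    /* MAP_SHARED makes stores visible to every process mapping the object. */
    void *addr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);  /* the mapping remains valid after the descriptor is closed */
    return addr == MAP_FAILED ? NULL : addr;
}
```

Each process would call something like map_region("/myRegion", size) with the same name and size; the creator should call shm_unlink on the name once the segment is no longer needed. On older glibc versions the program has to be linked with -lrt.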

The System V interfaces (shmget, shmat and co.) have, at least historically, the disadvantage that they may impose a limit on the maximum amount of memory that you can map.

Then, all the POSIX thread synchronization tools (pthread_mutex_t, pthread_cond_t, sem_t, pthread_rwlock_t, ...) have initialization interfaces that allow you to use them in a process-shared context, too. All modern Linux distributions support this.

Is this preferable to sockets? Performance-wise it could make a bit of a difference, since you don't have to copy things around. But the main point, I guess, is that once you have initialized your segment, this is conceptually a bit simpler. To access an item you just acquire a shared lock, read the data, and release the lock again.

As @R suggests, if you have multiple readers, pthread_rwlock_t would probably be the best lock structure to use.

Answer 2:

I once implemented an IPC library using shared memory segments; this allowed me to avoid a copy (instead of copying data from sender memory to kernel space, and then from kernel space to receiver memory, I could copy directly from sender to receiver memory).

Anyway, the results weren't as good as I was expecting: actually sharing a memory segment turned out to be a really expensive process, since remapping TLB entries and all the rest is quite costly. See this mail for more details (I'm not one of those guys, but I came across that mail while developing my library).

Results were good only for really big messages (say, more than a few megabytes). If you're working with small buffers, Unix sockets are the most optimized thing you can find, unless you are willing to write a kernel module.
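
For comparison, a minimal sketch of passing one small fixed-size struct over a Unix domain socket. It assumes SOCK_SEQPACKET so that each send arrives as exactly one record, and uses socketpair only to keep the example short; the struct tick and the function names are made up for the illustration.

```c
#include <assert.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical fixed-size record; the fields are illustrative only. */
struct tick {
    long   seq;
    double price;
    char   symbol[16];
};

/* Create a connected pair of Unix sockets that preserve message boundaries. */
static int make_tick_channel(int sv[2])
{
    return socketpair(AF_UNIX, SOCK_SEQPACKET, 0, sv);
}

/* Send exactly one record. Returns 0 on success, -1 otherwise. */
static int send_tick(int fd, const struct tick *t)
{
    assert(fd >= 0 && t != NULL);
    ssize_t n = send(fd, t, sizeof *t, 0);
    return n == (ssize_t)sizeof *t ? 0 : -1;
}

/* Receive exactly one record. Returns 0 on success, -1 otherwise. */
static int recv_tick(int fd, struct tick *t)
{
    assert(fd >= 0 && t != NULL);
    ssize_t n = recv(fd, t, sizeof *t, 0);
    return n == (ssize_t)sizeof *t ? 0 : -1;
}
```

With many clients, the server would instead bind a listening socket to a filesystem path and keep one connected descriptor per client, sending each struct to all of them.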

Answer 3:

Apart from what's been suggested already, I'd like to offer another method: IPv6 Node/Interface Local Multicast, i.e. a multicast constrained to the loopback interface. http://www.iana.org/assignments/ipv6-multicast-addresses/ipv6-multicast-addresses.xml#ipv6-multicast-addresses-1

At first this might seem quite heavyweight, but most operating systems implement loopback sockets in a zero-copy architecture. The page(s) mapped to the buf parameter passed to send will be assigned an additional mapping and marked as copy-on-write, so that if the sending program overwrites the data therein, or deallocates it, the contents will be preserved.
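
A rough sketch of the sender-side setup for interface-local multicast. The assumptions are that the loopback interface is named "lo" (as on Linux) and that the group is passed in by the caller; the function names are hypothetical, and receivers would additionally have to join the group with IPV6_JOIN_GROUP.

```c
#include <arpa/inet.h>    /* inet_pton, htons */
#include <assert.h>
#include <net/if.h>       /* if_nametoindex */
#include <netinet/in.h>   /* sockaddr_in6, IN6_IS_ADDR_MC_NODELOCAL */
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Fill `out` with a multicast destination. Returns 0 on success, or -1 if
 * `group` is not a valid interface-local (ff01::/16 scope) multicast address. */
static int local_mcast_addr(const char *group, unsigned short port,
                            struct sockaddr_in6 *out)
{
    assert(group != NULL && out != NULL);
    memset(out, 0, sizeof *out);
    out->sin6_family = AF_INET6;
    out->sin6_port = htons(port);
    if (inet_pton(AF_INET6, group, &out->sin6_addr) != 1)
        return -1;
    return IN6_IS_ADDR_MC_NODELOCAL(&out->sin6_addr) ? 0 : -1;
}

/* Open a UDP socket whose multicast traffic leaves via the loopback
 * interface. The interface name "lo" is an assumption (Linux). */
static int open_loopback_sender(void)
{
    unsigned int ifindex = if_nametoindex("lo");
    if (ifindex == 0)
        return -1;
    int fd = socket(AF_INET6, SOCK_DGRAM, 0);
    if (fd == -1)
        return -1;
    if (setsockopt(fd, IPPROTO_IPV6, IPV6_MULTICAST_IF,
                   &ifindex, sizeof ifindex) == -1) {
        close(fd);
        return -1;
    }
    return fd;
}
```

The server would then call sendto on the returned descriptor with the address filled in by local_mcast_addr, once per struct.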

Instead of passing raw structs you should use a robust data structure. Netstrings http://cr.yp.to/proto/netstrings.txt and BSON http://bsonspec.org/ come to mind.
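
To give an idea of what the netstring framing looks like, here is a small encoder sketch. The function name and signature are made up for this example; the wire format itself ("<length>:<bytes>,") is the one described in the netstrings document linked above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Encode `len` bytes from `data` as a netstring into `out`.
 * Returns the number of bytes written, or 0 if `out` (capacity `cap`)
 * is too small. The output is not NUL-terminated. */
static size_t netstring_encode(const void *data, size_t len,
                               char *out, size_t cap)
{
    assert(data != NULL && out != NULL);

    /* Decimal length prefix followed by ':' */
    char head[32];
    int hlen = snprintf(head, sizeof head, "%zu:", len);
    if (hlen < 0 || (size_t)hlen >= sizeof head)
        return 0;

    size_t total = (size_t)hlen + len + 1;   /* prefix + payload + ',' */
    if (total > cap)
        return 0;

    memcpy(out, head, (size_t)hlen);
    memcpy(out + hlen, data, len);
    out[(size_t)hlen + len] = ',';
    return total;
}
```

For example, encoding the 12 bytes of "hello world!" yields the 16 bytes "12:hello world!,", so a receiver always knows how many bytes to expect before it reads the payload.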

Answer 4:

Choosing between the POSIX shm_open/mmap interface and the older System V shmop one won't make a big difference, because after the initialization system calls you end up in the same situation: a memory area that is shared between various processes. If your system supports it, I'd recommend going with shm_open/mmap, because it is the better-designed interface.

You then use the shared memory area as a common blackboard where all processes can scribble their data. The difficult part is synchronizing the processes that access this area. Here I recommend against concocting your own synchronization scheme, which can be fiendishly difficult and error-prone. Instead, use the existing, working socket-based implementation to synchronize access between processes, and use the shared memory only for transferring large amounts of data between them. Even with this scheme you'll need a central process to coordinate the allocation of buffers, so it is worth it only if you have very large volumes of data to transfer. Alternatively, use a synchronization library, like Boost.Interprocess.