I'm having what appears to be a caching problem when using /dev/mem with mmap() on a dual-ARM processor system (a Xilinx Zynq, to be exact). My configuration is asymmetric: one processor runs Linux and the other runs a bare-metal application. They communicate through a block of RAM that isn't in the Linux memory space (it was excluded in the device tree). When my userspace Linux application writes to memory using the pointer returned by mmap(), it can take anywhere from 100 ms to well over a second for the second processor to detect the changed memory content.
On the open() call to /dev/mem, I tried specifying O_RDWR, O_SYNC, and O_DIRECT, but O_DIRECT caused the open to fail, so I removed it. I thought O_SYNC would guarantee that data was written to memory before the write() call returned, but I'm writing through a memory pointer rather than calling write(). I don't see any mmap() parameters that seem to address caching.
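For reference, here is a minimal sketch of the mapping step described above. It assumes Linux, a 32-bit Zynq-7000 userspace, and a shared-region address of your choosing. mmap() requires a page-aligned offset, so the helper splits the physical address into a page base and an in-page offset. The flag is O_RDWR. O_SYNC is kept as in the question, and nothing in this code cleans the CPU caches.

```c
#define _FILE_OFFSET_BITS 64 /* allow physical offsets above 2 GiB on 32-bit ARM */
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Split a physical address into the page-aligned base that mmap() requires
 * as its offset argument and the remaining offset inside that page. */
void split_phys(uint64_t phys, uint64_t page_size,
                uint64_t *page_base, uint64_t *in_page)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    *page_base = phys & ~(page_size - 1);
    *in_page = phys - *page_base;
}

/* Map `len` bytes of physical memory starting at `phys` via /dev/mem.
 * Returns a pointer to the byte at `phys`, or NULL on failure.
 * *map_base / *map_len receive the values to pass to munmap() later. */
volatile uint8_t *map_phys(uint64_t phys, size_t len,
                           void **map_base, size_t *map_len)
{
    long ps = sysconf(_SC_PAGESIZE);
    uint64_t base, off;

    if (ps <= 0)
        return NULL;
    split_phys(phys, (uint64_t)ps, &base, &off);

    /* O_RDWR for read/write access. Whether O_SYNC changes the cache
     * attributes of the mapping depends on the architecture and kernel. */
    int fd = open("/dev/mem", O_RDWR | O_SYNC);
    if (fd < 0)
        return NULL;

    size_t total = (size_t)off + len;
    void *p = mmap(NULL, total, PROT_READ | PROT_WRITE, MAP_SHARED, fd,
                   (off_t)base);
    close(fd); /* an established mapping remains valid after close() */
    if (p == MAP_FAILED)
        return NULL;

    *map_base = p;
    *map_len = total;
    /* volatile stops the compiler from caching or eliding accesses;
     * it has no effect on the CPU's data cache. */
    return (volatile uint8_t *)p + off;
}
```

For example, map_phys(0x3F000000, 4096, &base, &len) would map one page at 0x3F000000, a hypothetical address. Use whatever range your device tree excluded. Writes through the returned pointer may still be held in the CPU data cache.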
I've tried calling fsync(fd) and fdatasync(fd) after writing to memory, but that didn't change the behavior.
What DID seem to work was spawning this command immediately after the memory write: sync; echo 3 > /proc/sys/vm/drop_caches
What is the simplest way to get writes via a mapped memory pointer to flush immediately?
fsync(), fdatasync(), and similar calls synchronize a memory-mapped region with its backing store (e.g., a file on a block device).
They do not affect the CPU data cache. You will need either to issue explicit cache-clean operations that write the CPU's cached data back to DRAM, or to use the ACP (Accelerator Coherency Port).
The ACP is supposed to be cache coherent, but I've never gotten it to work.
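Explicit cache cleans work on whole cache lines. A range that starts or ends mid-line must therefore be widened to line boundaries before it is stepped through line by line. Otherwise the last partially written line can be skipped. The following is a minimal sketch of that arithmetic. It assumes a 32-byte data-cache line, which is the Cortex-A9 line size used in the Zynq-7000. On ARMv7 the cache maintenance instructions can only be executed in privileged mode, so the cleaning itself has to happen in kernel code. These helpers only compute the range such code would walk.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed data-cache line size in bytes (32 on the Cortex-A9). */
#define CACHE_LINE ((uintptr_t)32)

/* Widen [addr, addr + len) to whole cache lines: start is rounded down and
 * end is rounded up, so every line touched by the write is included. */
void cache_line_range(uintptr_t addr, size_t len,
                      uintptr_t *start, uintptr_t *end)
{
    assert((CACHE_LINE & (CACHE_LINE - 1)) == 0); /* power of two */
    if (len == 0) {                               /* nothing to clean */
        *start = *end = addr;
        return;
    }
    *start = addr & ~(CACHE_LINE - 1);
    *end = (addr + len + CACHE_LINE - 1) & ~(CACHE_LINE - 1);
}

/* Number of lines a clean-by-address loop must visit when stepping from
 * start to end in CACHE_LINE increments. */
size_t cache_lines_to_clean(uintptr_t addr, size_t len)
{
    uintptr_t start, end;

    cache_line_range(addr, len, &start, &end);
    return (size_t)((end - start) / CACHE_LINE);
}
```

For example, a 2-byte write at 0x101F straddles two lines (0x1000 and 0x1020), so both must be cleaned. Linux's own cache helpers generally do this rounding internally. The sketch only makes the granularity visible.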
Here's an answer for how to flush the cache. I believe that code needs to go in your device driver. We have that code packaged in a generic "portalmem" driver. It enables your application to allocate memory that you can share with your hardware, and it provides an ioctl for flushing the cache after your application writes to it.
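From userspace, this flow (write through the mapping, then ask the driver to flush) might look like the sketch below. The request structure, the SHM_IOC_FLUSH command, and its magic number are hypothetical and invented for illustration. They are not portalmem's actual interface, so substitute whatever your driver defines.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/ioctl.h>

/* HYPOTHETICAL driver interface, for illustration only. */
struct flush_req {
    uint64_t offset; /* start of the written range, relative to the mapping */
    uint64_t length; /* number of bytes written */
};
#define SHM_IOC_MAGIC 'S'
#define SHM_IOC_FLUSH _IOW(SHM_IOC_MAGIC, 1, struct flush_req)

/* Describe [p, p + len) relative to a mapping of map_len bytes at map_base.
 * Returns 0 on success, -1 if the range is not inside the mapping. */
int make_flush_req(const void *map_base, size_t map_len,
                   const void *p, size_t len, struct flush_req *req)
{
    uintptr_t base = (uintptr_t)map_base;
    uintptr_t addr = (uintptr_t)p;

    assert(req != NULL);
    if (addr < base || addr - base > map_len || len > map_len - (addr - base))
        return -1;
    req->offset = addr - base;
    req->length = len;
    return 0;
}

/* Call after writing through the mapped pointer: asks the (hypothetical)
 * driver behind fd to clean the written range out of the CPU caches. */
int flush_written_range(int fd, const void *map_base, size_t map_len,
                        const void *p, size_t len)
{
    struct flush_req req;

    if (make_flush_req(map_base, map_len, p, len, &req) != 0)
        return -1;
    return ioctl(fd, SHM_IOC_FLUSH, &req);
}
```

Keeping the flush in the driver lets the cache maintenance run in privileged mode. Passing an offset rather than a raw pointer also lets the driver check the range against the buffer it handed out.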