Memory barriers guarantee that the data cache will be consistent. However, do they guarantee that the TLB will be consistent?
I am seeing a problem where the JVM (Java 7 update 1) sometimes crashes with memory errors (SIGBUS, SIGSEGV) when passing a MappedByteBuffer between threads.
e.g.
final AtomicReference<MappedByteBuffer> mbbQueue = new AtomicReference<>();
// in a background thread.
MappedByteBuffer map = raf.map(MapMode.READ_WRITE, offset, allocationSize);
Thread.yield();
while (!mbbQueue.compareAndSet(null, map));
// the main thread. (more than 10x faster than using map() in the same thread)
MappedByteBuffer mbb = mbbQueue.getAndSet(null);
Without the Thread.yield() I occasionally get crashes in force(), put(), and C's memcpy(), all indicating I am trying to access memory illegally. With the Thread.yield() I haven't had a problem, but that doesn't sound like a reliable solution.
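One idea I am considering, sketched below, is to touch the mapped pages in the background thread before publishing the buffer, so the page tables are populated by the thread that created the mapping rather than relying on Thread.yield(). This is only a guess: the load() call and the dummy byte write are my own assumptions about what would force the pages in, and I am assuming raf is a RandomAccessFile (so the map goes through its FileChannel).

import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel.MapMode;
import java.util.concurrent.atomic.AtomicReference;

class PreFaultingMapper {
    // Runs in the background thread; raf, offset, allocationSize and mbbQueue are the same as above.
    static void mapAndPublish(RandomAccessFile raf, long offset, long allocationSize,
                              AtomicReference<MappedByteBuffer> mbbQueue) throws Exception {
        MappedByteBuffer map = raf.getChannel().map(MapMode.READ_WRITE, offset, allocationSize);
        map.load();              // ask the OS to fault the whole mapping into memory
        map.put(0, map.get(0));  // touch at least one byte from the mapping thread
        while (!mbbQueue.compareAndSet(null, map))
            ;                    // publish to the main thread as before
    }
}

Whether this is actually sufficient is part of what I am asking.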
Has anyone come across this problem? Are there any guarantees about TLB and memory barriers?
EDIT: The OS is CentOS 5.7; I have seen the behaviour on both i7 and dual Xeon machines.
Why do I do this? Because the average time to write a message is 35-100 ns depending on its length, and a plain write() isn't as fast. If I memory map and clean up in the current thread, this takes 50-130 microseconds; using a background thread to do it means the main thread only takes about 3-5 microseconds to swap buffers. Why do I need to be swapping buffers at all? Because I am writing many GB of data and a ByteBuffer cannot be 2+ GB in size.
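To make the buffer swapping concrete, here is a rough sketch of the rolling-chunk pattern described above; the ChunkedWriter class, the 1 GB chunk size, and the minimal error handling are illustrative only, not my actual code.

import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel.MapMode;
import java.util.concurrent.atomic.AtomicReference;

class ChunkedWriter {
    static final long CHUNK = 1L << 30;  // 1 GB chunks, well under the 2 GB ByteBuffer limit

    final AtomicReference<MappedByteBuffer> next = new AtomicReference<>();
    final RandomAccessFile raf;
    MappedByteBuffer current;
    long mappedUpTo;

    ChunkedWriter(RandomAccessFile raf) throws Exception {
        this.raf = raf;
        this.current = raf.getChannel().map(MapMode.READ_WRITE, 0, CHUNK);
        this.mappedUpTo = CHUNK;
        new Thread(new Runnable() {
            public void run() { mapAhead(); }
        }, "mapper").start();
    }

    // Background thread: keep the next chunk mapped and parked in the AtomicReference.
    void mapAhead() {
        try {
            for (;;) {
                MappedByteBuffer map = raf.getChannel().map(MapMode.READ_WRITE, mappedUpTo, CHUNK);
                while (!next.compareAndSet(null, map))
                    Thread.yield();          // wait until the main thread has taken the previous chunk
                mappedUpTo += CHUNK;
            }
        } catch (Exception e) {
            e.printStackTrace();             // simplified; real code would shut down cleanly
        }
    }

    // Main thread: write a message, swapping to the pre-mapped chunk when the current one fills up.
    void write(byte[] message) {
        if (current.remaining() < message.length) {
            MappedByteBuffer mbb;
            while ((mbb = next.getAndSet(null)) == null)
                ;                            // spin briefly until the mapper has the next chunk ready
            current = mbb;
        }
        current.put(message);
    }
}

The crashes described above happen at the hand-off between mapAhead() and write(), which is why I am asking whether a memory barrier is enough to make the new mapping (and its TLB entries) visible to the writing thread.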