C optimization: conditional store to avoid dirtyin

2020-06-03 05:09发布

In the libuv source, I found this code:

  /* The if statement lets the compiler compile it to a conditional store.
   * Avoids dirtying a cache line.
   */
  if (loop->stop_flag != 0)
    loop->stop_flag = 0;

Can someone explain this a bit?

What exactly is a cache line?

Also, I guess a conditional store is some Assembler instruction which checks something and if succeeded, writes some value. Right?

When does such construct makes sense? I guess not always, because otherwise the compiler would just always use the conditional store, right?

标签: c libuv
2条回答
走好不送
2楼-- · 2020-06-03 05:42

A cache is organized in blocks of fast memory that, for historical reasons, are called lines. When you write to a cache line, it is marked as "dirty," meaning a bit is set in the cache controller hardware signifying that the line needs to be copied to other levels of cache and/or main memory before some other part of the system can access it.

In general, each level of the memory hierarchy: registers, L1, L2, L3 ... cache, main memory, and swap space has different copies of the same information. Making sure that different parts of the system (processors, DMA, video subsystem, etc.) see the same value even though one or more copies may have changed is called the coherency problem.

The general solution is pausing to copy an updated value to different levels of the hierarchy. This is called a flush.

A flush can cost 10's to - in the worst case when it causes a page fault - perhaps millions of processor cycles.

Due to this high cost, hardware designers go to great lengths to minimize the need for flushes. Here the programmer has taken up the cause, too.

The comment is saying "if the cache already contains a zero in the flag, let's not write a zero over the zero because this will mark the cache line dirty, which may cause an unnecessary flush."

"Conditional store" is a slightly obscure term. It just refers to a jump on zero around the normal store, which is the code a compiler will produce from the if statement. In X86 it will look something like:

    ;; assume edi holds value of pointer 'loop'
    ;; and flag is a constant offset for the 'stop_flag' field.
    cmp dword ptr [edi, flag], 0
    jz no_store
    mov [edi, flag], 0
no_store:
   ... code continues

With the if statement missing, you'd have only the last mov instruction.

NB A commenter pointed out that there do exist single "conditional move/store" instructions on important processor architectures. I have not seen gcc produce one.

Whether this is a worthwhile optimization is very debatable. Conditionals have their own risks of flushing the instruction pipeline (a different kind of flush). Never sacrifice clarity for speed without clear evidence that it's needed.

查看更多
三岁会撩人
3楼-- · 2020-06-03 06:02

"to cache" means to hide something. The function of a Cache in computing is to hide the distance to main memory, by preempting main memory accesses as much as possible.

That only works if you used the data before, you have not pushed it out of cache yet, and nobody else took it away before you. Any other actor (other CPU, IO-Bus, ...) must be able to get the current value and change it, even if you have it cached. This task is accomplished using cache coherency protocols. Higher coherency means higher cost.

What your code tries to do, is make the compiler emit a conditional move, so the CPU checks for 0, and only writes if not 0. There's a whole family of conditional move instructions in the Intel/AMD IS and many others.

So now, step for step:

  • Test for 0: If your CPU does not have a copy of the tested data, it must ask for one. That's far worse than before. Let's hope you don't hit main memory.
  • Prepare to write a value:
    • You own the data: Wonderful, you are already done.
    • You don't own the data: The Cache calls its brethren and higher layers to inform them that it now owns this piece. Nobody else can retain a copy.
  • Write a value: The cache saves the change and marks the cacheline (Lowest granularity of cache) dirty, need to write back.

So, is it worth it? It depends.

Aside: Why provide conditional store instruction, if you can synthesise them using a conditional jump and a store? The advantages are using less instructions and no risk of flushing the instruction pipeline (partly executed following instructions). UPDATE: Looks like they cannot move from register/immediate to memory on x86/x86_64.

查看更多
登录 后发表回答