gcc atomic read and writes

2020-07-18 03:42发布

问题:

I have a multithreaded application where I one producer thread(main) and multiple consumers.

Now from main I want to have some sort of percentage of how far into the work the consumers are. Implementing a counter is easy as the work that is done a loop. However since this loop repeats a couple of thousands of times, maybe even more than a million times. I don`t want to mutex this part. So I went looking into some atomic options of writing to an int.

As far as I understand I can use the builtin atomic functions from gcc: https://gcc.gnu.org/onlinedocs/gcc-4.1.2/gcc/Atomic-Builtins.html

however, it doesn`t have a function for just reading the variable I want to work on.

So basically my question is.

  1. can I read from the variable safely from my producer, as long as I use the atomic builtins for writing to that same variable in the consumer

or

  1. do I need some sort of different function to read from the variable. and what function is that

回答1:

Define "safely".

If you just use a regular read, on x86, for naturally aligned 32-bit or smaller data, the read is atomic, so you will always read a valid value rather than one containing some bytes written by one thread and some by another. If any of those things are not true (not x86, not naturally aligned, larger than 32 bits...) all bets are off.

That said, you have no guarantee whatsoever that the value read will be particularly fresh, or that the sequence of values seen over multiple reads will be in any particular order. I have seen naive code using volatile to defeat the compiler optimising away the read entirely but no other synchronisation mechanism, literally never see an updated value due to CPU caching.

If any of these things matter to you, and they really should, you should explicitly make the read atomic and use the appropriate memory barriers. The intrinsics you refer to take care of both of these things for you: you could call one of the atomic intrinsics in such a way that there is no side effect other than returning the value:

__sync_val_compare_and_swap(ptr, 0, 0)

or

__sync_add_and_fetch(ptr, 0)

or

__sync_sub_and_fetch(ptr, 0)

or whatever



回答2:

If your compiler supports it, you can use C11 atomic types. They are introduced in the section 7.17 of the standard, but they are unfortunately optional, so you will have to check whether __STDC_NO_ATOMICS__ is defined to at least throw a meaningful error if it's not supported.

With gcc, you apparently need at least version 4.9, because otherwise the header is missing (here is a SO question about this, but I can't verify because I don't have GCC-4.9).



回答3:

I'll answer your question, but you should know upfront that atomics aren't cheap. The CPU has to synchronize between cores every time you use atomics, and you won't like the performance results if you use atomics in a tight loop.

The page you linked to lists atomic operations for the writer, but says nothing about how such variables should be read. The answer is that your other CPU cores will "see" the updated values, but your compiler may "cache" the old value in a register or on the stack. To prevent this behavior, I suggest you declare the variable volatile to force your compiler not to cache the old value.

The only safety issue you will encounter is stale data, as described above.

If you try to do anything more complex with atomics, you may run into subtle and random issues with the order atomics are written to by one thread versus the order you see those changes in another thread. Unfortunately you're not using a built-in language feature, and the compiler builtins aren't designed perfectly. If you choose to use these builtins, I suggest you keep your logic very simple.



回答4:

If I understood the problem, I would not use any atomic variable for the counters. Each worker thread can have a separate counter that it updates locally, the master thread can read the whole array of counters for an approximate snapshot value, so this becomes a 1 consumer 1 producer problem. The memory can be made visible to the master thread, for example, every 5 seconds, by using __sync_synchronize() or similar.