Why is the << operation on an array in Ruby not thread-safe?

Posted 2019-02-16 12:22

Question:

In Ruby, this code is not thread-safe if the array is modified by many threads:

array = []
array << :foo # many threads can run this code

Why is the << operation not thread-safe?

Answer 1:

Here array is a variable in your program. When you apply an operation like << to it, the operation happens in three steps:

  • The variable is first copied into a CPU register.
  • The CPU performs computations.
  • The CPU writes the result back to the variable's memory.

So this single high-level operation is actually performed in three steps. A thread context switch can occur between any two of these steps, and another thread may then read the same (old) value of the variable. That is why the operation is not atomic.
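
The three steps can be made visible with a small sketch. It does not use real threads. It replays by hand one unlucky interleaving of two hypothetical threads, A and B, that each increment a shared counter. The names counter, a_register and b_register are made up for illustration and do not come from the question.

```ruby
# Replay one bad interleaving of two threads by hand (no real threads,
# so the outcome is the same on every run).
counter = 0

a_register = counter   # step 1: thread A copies the variable
b_register = counter   # context switch: thread B copies the same old value

a_register += 1        # step 2: each thread computes on its private copy
b_register += 1

counter = a_register   # step 3: thread A writes back 1
counter = b_register   # thread B also writes back 1, overwriting A's update

puts counter           # prints 1: two increments ran, but one was lost
```

With real threads the same loss happens only when the scheduler switches at the wrong moment, which is why such bugs tend to appear intermittently.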



Answer 2:

If you have multiple threads accessing the same array, use Ruby's built-in Queue class. It nicely handles producers and consumers.

This is the example from the documentation:

require 'thread'

queue = Queue.new

producer = Thread.new do
  5.times do |i|
    sleep rand(i) # simulate expense
    queue << i
    puts "#{i} produced"
  end
end

consumer = Thread.new do
  5.times do |i|
    value = queue.pop
    sleep rand(i/2) # simulate expense
    puts "consumed #{value}"
  end
end

# wait for both threads so the last "produced" line is not cut off
producer.join
consumer.join


Answer 3:

Actually, on MRI (Matz's Ruby Implementation), the GIL (Global Interpreter Lock) makes any method implemented purely in C effectively atomic, as long as that C code does not release the lock or call back into Ruby.

Since Array#<< is implemented in pure C in MRI, the operation is atomic there. Note that this applies only to MRI; on JRuby it is not the case.
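
A rough way to see this for yourself is the sketch below, which appends to one shared array from several threads without any locking. The thread and iteration counts are arbitrary choices. On MRI I would expect the final size to equal the number of appends on every run. On an implementation with truly parallel threads, such as JRuby, the count may come up short or the append may fail, so do not treat this as a portable guarantee.

```ruby
array = []

# 8 threads, 1,000 unsynchronized appends each (both numbers are arbitrary).
threads = 8.times.map do
  Thread.new do
    1_000.times { array << :foo }
  end
end
threads.each(&:join)

puts RUBY_ENGINE   # "ruby" means MRI
puts array.size    # compare against 8 * 1_000
```

Even on MRI this only covers the single << call. A compound sequence such as checking the array and then appending to it can still be interrupted between the two calls.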

To completely understand what is going on, I suggest you read these two articles, which explain everything very well:

Nobody Understands the GIL
Nobody Understands the GIL - part 2



Answer 4:

This link might be helpful for you:

http://www.jstorimer.com/pages/ruby-core-classes-arent-thread-safe

Also you might be interested in this gem:

https://rubygems.org/gems/thread_safe



Answer 5:

Because Ruby is a very high-level language, nothing is really atomic at the OS level. Only very simple assembly operations are atomic at that level (which ones is OS-dependent), and every Ruby operation, even a simple 1 + 1, corresponds to hundreds or thousands of executed assembly instructions for things such as method lookups, garbage collection, object initialization, and scope calculations.

If you need to make operations atomic, use Mutexes.
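
As a minimal sketch of that advice, applied to the array from the question: every append goes through one shared Mutex, so only one thread at a time can be inside the block. The variable name lock and the thread and iteration counts are illustrative choices, not part of the original question.

```ruby
array = []
lock = Mutex.new

# 10 threads, 1,000 appends each (both numbers are arbitrary).
threads = 10.times.map do |t|
  Thread.new do
    1_000.times do |i|
      # Only one thread at a time may run the block given to synchronize.
      lock.synchronize { array << [t, i] }
    end
  end
end

threads.each(&:join)
puts array.size   # prints 10000: no append was lost
```

The lock only helps if every piece of code that touches the array uses the same Mutex. A reader or writer that bypasses it reintroduces the race.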



Answer 6:

Just riffing off of @Linuxios and @TheTinMan: high-level language (HLL) operations in general are not atomic. Atomicity is (generally) not an issue in single-threaded programs. In multi-threaded programs, you (the programmer) have to reason about it at a much higher granularity than a single HLL operation, so having individual HLL operations that are atomic doesn't actually help you that much. On the flip side, although making an HLL operation atomic takes only a few machine instructions before and after—at least on modern hardware—the static (binary size) and dynamic (execution time) overheads add up. Even worse, explicit atomicity pretty much disables all optimization because compilers cannot move instructions across atomic operations. No real benefit + significant cost = non-starter.