Generating a race condition with MRI

Posted 2019-02-17 05:05

Question:

I was wondering whether it's easy to create a race condition using MRI Ruby (2.0.0) and some global variables, but as it turns out it's not that easy. It looks like it should fail at some point, but it doesn't, and I've been running it for 10 minutes. This is the code I've been using to try to trigger it:

def inc(*)
  a  = $x
  a +=  1
  a *= 3000
  a /= 3000
  $x =  a
end

THREADS = 10
COUNT   = 5000

loop do
  $x = 1
  THREADS.times.map do Thread.new { COUNT.times(&method(:inc)) } end.each(&:join)

  break puts "woo hoo!" if $x != THREADS * COUNT + 1
end

puts $x

Why am I not able to generate (or detect) the expected race condition, and get the output woo hoo! in Ruby MRI 2.0.0?

Answer 1:

Your example does (almost instantly) work in 1.8.7.

The following variation does the trick for 1.9.3+:

def inc
  a  = $x + 1
  # Just one microsecond
  sleep 0.000001
  $x =  a
end

THREADS = 10
COUNT   = 50

loop do
  $x = 1
  THREADS.times.map { Thread.new { COUNT.times { inc } } }.each(&:join)
  break puts "woo hoo!" if $x != THREADS * COUNT + 1
  puts "No problem this time."
end

puts $x

The sleep command is a strong hint to the interpreter that it can schedule another thread, so this is not a huge surprise.

Note that if you replace the sleep with something that takes just as long or longer, e.g. b = a; 500.times { b *= 100 }, then no race condition is detected in the above code. But take it further with b = a; 2500.times { b *= 100 }, or increase COUNT from 50 to 500, and the race condition is triggered more reliably.
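To compare these variations side by side, here is a minimal sketch of a small harness. It is not part of the original answer: run_trial and its block parameter are hypothetical names, a local counter stands in for the global $x so that each trial starts fresh, and the Thread.pass variant is an extra explicit scheduling hint added for comparison. Whether a given delay actually exposes the race depends on the Ruby version and the machine, so no particular output is promised.

```ruby
# Runs `threads` threads, each doing `count` unsynchronised
# read / delay / write increments, and returns the final counter value.
# The block supplies whatever happens between the read and the write.
def run_trial(threads, count, &delay)
  x = 1
  threads.times.map do
    Thread.new do
      count.times do
        a = x + 1        # read the shared value
        delay.call(a)    # give the scheduler a chance (or not) to switch
        x = a            # write back, possibly clobbering another update
      end
    end
  end.each(&:join)
  x
end

expected = 10 * 50 + 1

with_sleep = run_trial(10, 50) { sleep 0.000001 }
with_busy  = run_trial(10, 50) { |a| b = a; 2500.times { b *= 100 } }
with_pass  = run_trial(10, 50) { Thread.pass }

puts "expected:    #{expected}"
puts "sleep:       #{with_sleep}"
puts "busy loop:   #{with_busy}"
puts "Thread.pass: #{with_pass}"
```

Any result below the expected value means at least one increment was lost. A result equal to the expected value only means the race was not observed in that run, not that it cannot happen.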

The thread scheduling in Ruby 1.9.3 onwards (including, of course, 2.0.0) appears to assign CPU time in larger chunks than in 1.8.7. Opportunities to switch threads can be few in simple code, unless some kind of I/O waiting is involved.

It is even possible that the threads in the OP, each of which performs just a few thousand calculations, are in essence running in series, although increasing the COUNT constant to avoid this still does not trigger additional race conditions.

Generally, MRI Ruby does not switch context between threads during atomic operations (e.g. a Fixnum multiplication or division) that occur within its C implementation. This means that where all methods are calls to Ruby internals without I/O waiting, the only opportunities for a thread context switch are "in between" each line of code. In the original example there are only 4 such fleeting opportunities, and it seems that this is not very many at all for MRI 1.9.3+ (in fact, see the update below: these opportunities have probably been removed by Ruby).

When I/O waits or sleep are involved, it actually gets more complex, as Ruby MRI (1.9+) will allow a little bit of true parallel processing on multi-core CPUs. Although this is not the direct cause of race conditions with threads, it is more likely to result in them, as Ruby will usually make a thread context switch at the same time to take advantage of the parallelism.

Whilst I was researching this rough answer, I found an interesting link: Nobody understands the GIL (part 2 linked, as more relevant to this question)

Update: I suspect that the interpreter is optimising away some potential thread-switching points in the Ruby source. Starting with my sleep version of the code, and setting:

COUNT   = 500000

the following variation of inc does not seem to have a race condition affecting $x:

def inc
  a  = $x + 1
  b = 0
  b += 1
  $x =  a
end

However, these minor changes both trigger a race condition:

def inc
  a  = $x + 1
  b = 0
  b = b.send( :+, 1 )
  $x =  a
end

def inc
  a  = $x + 1
  b = 0
  b += '1'.to_i
  $x =  a
end

My interpretation is that Ruby has optimised b += 1 when compiling it, to remove some of the overhead of method dispatch. One of the optimised-away steps is likely to include the check for a possible switch to a waiting thread.

If that is the case, then the code in the question may never have the opportunity to switch threads within the inc method, because all the operations inside it can be optimised in the same way.
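One way to look for this kind of optimisation is to inspect the bytecode MRI generates for each variant. The following is a minimal sketch, assuming an MRI interpreter where RubyVM::InstructionSequence is available; disasm_of is a hypothetical helper name. Instruction names vary between Ruby versions, so the sketch only checks for the specialised opt_plus instruction and for a generic send.

```ruby
# MRI-only: RubyVM::InstructionSequence exposes the compiled bytecode.
def disasm_of(source)
  RubyVM::InstructionSequence.compile(source).disasm
end

plain   = disasm_of("b = 0; b += 1")
by_send = disasm_of("b = 0; b = b.send(:+, 1)")
by_to_i = disasm_of("b = 0; b += '1'.to_i")

# Does each variant use the specialised addition instruction?
puts "b += 1         uses opt_plus: #{plain.include?('opt_plus')}"
puts "b.send(:+, 1)  uses opt_plus: #{by_send.include?('opt_plus')}"
puts "b += '1'.to_i  uses opt_plus: #{by_to_i.include?('opt_plus')}"

# Print the full listings to compare the method calls involved.
puts plain, by_send, by_to_i
```

This only shows that the variants compile to different instructions. It does not by itself prove where the VM checks for pending thread switches, so it supports the interpretation above rather than confirming it.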