Why is Symbol#to_proc slower in Ruby 1.8.7?

2019-02-18 09:25发布

Relative Performance of Symbol#to_proc in Popular Ruby Implementations states that in MRI Ruby 1.8.7, Symbol#to_proc is slower than the alternative in their benchmark by 30% to 130%, but that this isn't the case in YARV Ruby 1.9.2.

Why is this the case? The creators of 1.8.7 didn't write Symbol#to_proc in pure Ruby.

Also, are there any gems that provide faster Symbol#to_proc performance for 1.8?

(Symbol#to_proc is starting to appear when I use ruby-prof, so I don't think I'm guilty of premature optimization)

3条回答
姐就是有狂的资本
2楼-- · 2019-02-18 10:18

The following ordinary Ruby code:

if defined?(RUBY_ENGINE).nil? # No RUBY_ENGINE means it's MRI 1.8.7
  class Symbol
    alias_method :old_to_proc, :to_proc

    # Class variables are considered harmful, but I don't think
    # anyone will subclass Symbol
    @@proc_cache = {}
    def to_proc
      @@proc_cache[self] ||= old_to_proc
    end
  end
end

Will make Ruby MRI 1.8.7 Symbol#to_proc slightly less slow than before, but not as fast as an ordinary block or a pre-existing proc.

However, it'll make YARV, Rubinius and JRuby slower, hence the if around the monkeypatch.

The slowness of using Symbol#to_proc isn't solely due to MRI 1.8.7 creating a proc each time - even if you re-use an existing one, it's still slower than using a block.

Using Ruby 1.8 head

Size    Block   Pre-existing proc   New Symbol#to_proc  Old Symbol#to_proc
0       0.36    0.39                0.62                1.49
1       0.50    0.60                0.87                1.73
10      1.65    2.47                2.76                3.52
100     13.28   21.12               21.53               22.29

For the full benchmark and code, see https://gist.github.com/1053502

查看更多
我想做一个坏孩纸
3楼-- · 2019-02-18 10:21

The to_proc implementation in 1.8.7 looks like this (see object.c):

static VALUE
sym_to_proc(VALUE sym)
{
    return rb_proc_new(sym_call, (VALUE)SYM2ID(sym));
}

Whereas the 1.9.2 implementation (see string.c) looks like this:

static VALUE
sym_to_proc(VALUE sym)
{
    static VALUE sym_proc_cache = Qfalse;
    enum {SYM_PROC_CACHE_SIZE = 67};
    VALUE proc;
    long id, index;
    VALUE *aryp;

    if (!sym_proc_cache) {
        sym_proc_cache = rb_ary_tmp_new(SYM_PROC_CACHE_SIZE * 2);
        rb_gc_register_mark_object(sym_proc_cache);
        rb_ary_store(sym_proc_cache, SYM_PROC_CACHE_SIZE*2 - 1, Qnil);
    }

    id = SYM2ID(sym);
    index = (id % SYM_PROC_CACHE_SIZE) << 1;

    aryp = RARRAY_PTR(sym_proc_cache);
    if (aryp[index] == sym) {
        return aryp[index + 1];
    }
    else {
        proc = rb_proc_new(sym_call, (VALUE)id);
        aryp[index] = sym;
        aryp[index + 1] = proc;
        return proc;
    }
}

If you strip away all the busy work of initializing sym_proc_cache, then you're left with (more or less) this:

aryp = RARRAY_PTR(sym_proc_cache);
if (aryp[index] == sym) {
    return aryp[index + 1];
}
else {
    proc = rb_proc_new(sym_call, (VALUE)id);
    aryp[index] = sym;
    aryp[index + 1] = proc;
    return proc;
}

So the real difference is the 1.9.2's to_proc caches the generated Procs while 1.8.7 generates a brand new one every single time you call to_proc. The performance difference between these two will be magnified by any benchmarking you do unless each iteration is done in a separate process; however, one iteration per-process would mask what you're trying to benchmark with the start-up cost.

The guts of rb_proc_new look pretty much the same (see eval.c for 1.8.7 or proc.c for 1.9.2) but 1.9.2 might benefit slightly from any performance improvements in rb_iterate. The caching is probably the big performance difference.

It is worth noting that the symbol-to-hash cache is a fixed size (67 entries but I'm not sure where 67 comes from, probably related to the number of operators and such that are commonly used for symbol-to-proc conversions):

id = SYM2ID(sym);
index = (id % SYM_PROC_CACHE_SIZE) << 1;
/* ... */
if (aryp[index] == sym) {

If you use more than 67 symbols as procs or if your symbol IDs overlap (mod 67) then you won't get the full benefit of the caching.

The Rails and 1.9 programming style involves a lot of shorthands like:

    id = SYM2ID(sym);
    index = (id % SYM_PROC_CACHE_SIZE) << 1;

rather than the longer explicit block forms:

ints = strings.collect { |s| s.to_i }
sum  = ints.inject(0) { |s,i| s += i }

Given that (popular) programming style, it makes sense to trade memory for speed by caching the lookup.

You're not likely to get a faster implementation from a gem as the gem would have to replace a chunk of the core Ruby functionality. You could patch the 1.9.2 caching into your 1.8.7 source though.

查看更多
我想做一个坏孩纸
4楼-- · 2019-02-18 10:29

In addition to not caching procs, 1.8.7 also creates (approximately) one array each time a proc is called. I suspect it's because the generated proc creates an array to accept the arguments - this happens even with an empty proc that takes no arguments.

Here's a script to demonstrate the 1.8.7 behavior. Only the :diff value is significant here, which shows the increase in array count.

# this should really be called count_arrays
def count_objects(&block)
  GC.disable
  ct1 = ct2 = 0
  ObjectSpace.each_object(Array) { ct1 += 1 }
  yield
  ObjectSpace.each_object(Array) { ct2 += 1 }
  {:count1 => ct1, :count2 => ct2, :diff => ct2-ct1}
ensure
  GC.enable
end

to_i = :to_i.to_proc
range = 1..1000

puts "map(&to_i)"
p count_objects {
  range.map(&to_i)
}
puts "map {|e| to_i[e] }"
p count_objects {
  range.map {|e| to_i[e] }
}
puts "map {|e| e.to_i }"
p count_objects {
  range.map {|e| e.to_i }
}

Sample output:

map(&to_i)
{:count1=>6, :count2=>1007, :diff=>1001}
map {|e| to_i[e] }
{:count1=>1008, :count2=>2009, :diff=>1001}
map {|e| e.to_i }
{:count1=>2009, :count2=>2010, :diff=>1}

It seems that merely calling a proc will create the array for every iteration, but a literal block only seems to create an array once.

But multi-arg blocks may still suffer from the problem:

plus = :+.to_proc
puts "inject(&plus)"
p count_objects {
  range.inject(&plus)
}
puts "inject{|sum, e| plus.call(sum, e) }"
p count_objects {
  range.inject{|sum, e| plus.call(sum, e) }
}
puts "inject{|sum, e| sum + e }"
p count_objects {
  range.inject{|sum, e| sum + e }
}

Sample output. Note how we incur a double penalty in case #2, because we use a multi-arg block, and also call the proc.

inject(&plus)
{:count1=>2010, :count2=>3009, :diff=>999}
inject{|sum, e| plus.call(sum, e) }
{:count1=>3009, :count2=>5007, :diff=>1998}
inject{|sum, e| sum + e }
{:count1=>5007, :count2=>6006, :diff=>999}
查看更多
登录 后发表回答