Question:
Using the following benchmark:
def create_genome
"gattaca" * 100
end
def count_frequency_using_chars(sequence)
100000.times do
sequence.chars.group_by{|x| x}.map{|letter, array| [letter, array.count]}
end
end
def count_frequency_using_count(sequence)
100000.times do
["a", "c", "g", "t"].map{|letter| sequence.count(letter)}
end
end
sequence = create_genome
count_frequency_using_chars(sequence)
count_frequency_using_count(sequence)
I found that, in C-based Ruby for both 1.8 and 1.9.2, using String#count(letter)
is approximately 50 times faster than grouping and counting them using Enumerable#group_by
and Array#count
. I was slightly surprised at this, because the String#count
approach reads through the string four times each iteration, whereas the latter only reads through it once.
I tried running the code under ruby-prof and perftools.rb, and both of them merely indicated that String#chars
took 90% of the time, with no break-down of where that 90% of time was spent.
If I had to guess why there's a difference, I'd say that creating 70 million single-character strings would be expensive, but how would I be able to know? (Update: String#chars
wasn't the culprit - see the benchmark for mainly_execute_a_trivial_block
)
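One way to test the guess about allocation cost is to count the string objects each approach creates (a rough sketch using ObjectSpace.count_objects; GC is disabled during each measurement so collected slots don't skew the delta):

```ruby
# Count how many T_STRING heap slots a block allocates.
def string_allocations
  GC.disable
  before = ObjectSpace.count_objects[:T_STRING]
  yield
  delta = ObjectSpace.count_objects[:T_STRING] - before
  GC.enable
  delta
end

sequence = "gattaca" * 100

# chars builds one single-character string per character (700 here).
chars_allocs = string_allocations { sequence.chars }
# count only scans the bytes; almost nothing is allocated.
count_allocs = string_allocations { ["a", "c", "g", "t"].map { |l| sequence.count(l) } }

puts "chars: #{chars_allocs} strings, count: #{count_allocs} strings"
```

Multiplying the chars figure by 100,000 iterations gives the 70 million single-character strings guessed at above.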
Edit: Current benchmarks using 1.9.2 patchlevel 180:
require 'pp'
require 'benchmark'
def create_genome
"gattaca" * 100
end
ZILLION = 100000
def count_frequency_using_count(sequence)
ZILLION.times do
["a", "c", "g", "t"].map{|letter| sequence.count(letter)}
end
end
def count_frequency_using_chars(sequence)
ZILLION.times do
sequence.chars.group_by{|x| x}.map{|letter, array| [letter, array.count]}
end
end
def count_frequency_using_inject_hash(sequence)
ZILLION.times do
sequence.chars.inject(Hash.new(0)) { |h, e| h[e] += 1 ; h }
end
end
def count_frequency_using_each_with_object(sequence)
ZILLION.times do
sequence.chars.each_with_object(Hash.new(0)) { |char, hash| hash[char] += 1}
end
end
def just_group_by(sequence)
ZILLION.times do
sequence.chars.group_by{|x| x}
end
end
def just_chars_and_trivial_block(sequence)
ZILLION.times do
sequence.chars() {}
end
end
def mainly_execute_a_trivial_block(sequence)
ZILLION.times do
sequence.length.times() {}
end
end
def execute_an_empty_loop_instead(sequence)
ZILLION.times do
i = 0
max = sequence.length
until i == max
i += 1
end
end
end
sequence = create_genome
puts RUBY_VERSION
Benchmark.bm do |benchmark|
benchmark.report do
count_frequency_using_count(sequence)
end
benchmark.report do
count_frequency_using_chars(sequence)
end
benchmark.report do
count_frequency_using_inject_hash(sequence)
end
benchmark.report do
count_frequency_using_each_with_object(sequence)
end
benchmark.report do
just_group_by(sequence)
end
benchmark.report do
just_chars_and_trivial_block(sequence)
end
benchmark.report do
mainly_execute_a_trivial_block(sequence)
end
benchmark.report do
execute_an_empty_loop_instead(sequence)
end
end
Results:
user system total real
0.510000 0.000000 0.510000 ( 0.508499) # count_frequency_using_count
23.470000 0.030000 23.500000 ( 23.575487) # count_frequency_using_chars
32.770000 0.050000 32.820000 ( 32.874634) # count_frequency_using_inject_hash
31.880000 0.040000 31.920000 ( 31.942437) # count_frequency_using_each_with_object
22.950000 0.030000 22.980000 ( 22.970500) # just_group_by
13.300000 0.020000 13.320000 ( 13.314466) # just_chars_and_trivial_block
5.660000 0.000000 5.660000 ( 5.661516) # mainly_execute_a_trivial_block
1.930000 0.010000 1.940000 ( 1.934861) # execute_an_empty_loop_instead
Answer 1:
It's nothing to do with the Ruby internals; you are comparing apples with oranges.
In your first example, you group the 700-character string 100,000 times and then count, so the cost is in your grouping logic, not in the counting. In the second approach you only count. And in both approaches you are ultimately just using count.
Just change the first example like this:
def count_frequency_using_chars(sequence)
grouped = sequence.chars.group_by{|x| x}
100000.times do
grouped.map{|letter, array| [letter, array.count]}
end
end
And it's as fast as your second approach.
Edit:
This approach is about 3x faster than count_frequency_using_count
; check the benchmarks:
def count_frequency_using_chars_with_single_group(sequence)
grouped = sequence.chars.group_by{|x| x}
100000.times do
grouped.map{|letter, array| [letter, array.count]}
end
end
def count_frequency_using_count(sequence)
100000.times do
["a", "c", "g", "t"].map{|letter| sequence.count(letter)}
end
end
Benchmark.bm do |benchmark|
benchmark.report do
pp count_frequency_using_chars_with_single_group(sequence)
end
benchmark.report do
pp count_frequency_using_count(sequence)
end
end
user system total real
0.410000 0.000000 0.410000 ( 0.419100)
1.330000 0.000000 1.330000 ( 1.324431)
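For context, hoisting the group_by out of the loop changes what is being measured: only the map over four small arrays is timed, while count still scans the whole string on every iteration. A quick check (a sketch, with the iteration count cut to 1,000 for speed) that keeps group_by inside the loop shows the original ordering again:

```ruby
require 'benchmark'

sequence = "gattaca" * 100
N = 1_000

# group_by re-done on every iteration, as in the question's benchmark
regrouped = Benchmark.realtime do
  N.times { sequence.chars.group_by { |x| x }.map { |l, a| [l, a.count] } }
end
# count re-done on every iteration
counted = Benchmark.realtime do
  N.times { ["a", "c", "g", "t"].map { |l| sequence.count(l) } }
end

puts format("group_by in loop: %.3fs, count in loop: %.3fs", regrouped, counted)
```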
Andrew, regarding your comment that this should be
measuring the character composition of 100000 sequences once each, not the character composition of one sequence 100000 times
: your count approach is still far slower than the group_by approach. I benchmarked large strings as you suggested:
seq = "gattaca" * 10000
#seq length is 70000
arr_seq = (1..10).map {|x| seq}
#10 seq items
and changed the methods to handle multiple sequences:
def count_frequency_using_chars_with_single_group(sequences)
sequences.each do |sequence|
grouped = sequence.chars.group_by{|x| x}
100000.times do
grouped.map{|letter, array| [letter, array.count]}
end
end
end
def count_frequency_using_count(sequences)
sequences.each do |sequence|
100000.times do
["a", "c", "g", "t"].map{|letter| sequence.count(letter)}
end
end
end
Benchmark.bm do |benchmark|
benchmark.report do
pp count_frequency_using_chars_with_single_group(arr_seq)
end
benchmark.report do
pp count_frequency_using_count(arr_seq)
end
end
Processing 10 sequences, each 70,000 characters long, 100,000 times each:
user system total real
3.710000 0.040000 3.750000 ( 23.452889) #modified group_by approach
1039.180000 6.920000 1046.100000 (1071.374730) #simple char count approach
Your simple char-count approach is roughly 46x slower (by real time) than the modified group_by approach for high-volume strings. I ran the above benchmark for just 10 sequences, each 70,000 characters long. Assume this for 100 or 1000 sequences; the simple count would never be an option, right?
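The cache-once idea can be reduced to its essence: compute the frequency hash a single time per sequence and reuse it (a minimal sketch; each_char avoids materialising the intermediate array that chars builds):

```ruby
# Single pass over the string, one hash update per character.
def frequencies(sequence)
  counts = Hash.new(0)
  sequence.each_char { |c| counts[c] += 1 }
  counts
end

freq = frequencies("gattaca" * 100)
p freq  # => {"g"=>100, "a"=>300, "t"=>200, "c"=>100}
```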
Answer 2:
The thing that's slow is the group_by. Even though the count method needs four passes over the string, the group_by method is far slower because group_by does much more work per pass.
Breaking the code out a bit to have a method which does only the group by:
def create_genome
"gattaca" * 100
end
def count_frequency_using_chars(sequence)
100000.times do
sequence.chars.group_by{|x| x}.map{|letter, array| [letter, array.count]}
end
end
def count_frequency_using_count(sequence)
100000.times do
["a", "c", "g", "t"].map{|letter| sequence.count(letter)}
end
end
def just_group_by(sequence)
100000.times do
sequence.chars.group_by{|x| x}
end
end
sequence = create_genome
...
ruby-1.9.2-p180 :068 > puts Time.now()
2011-06-17 11:17:36 -0400
ruby-1.9.2-p180 :069 > count_frequency_using_chars(sequence)
=> 100000
ruby-1.9.2-p180 :070 > puts Time.now()
2011-06-17 11:18:07 -0400
ruby-1.9.2-p180 :071 > count_frequency_using_count(sequence)
=> 100000
ruby-1.9.2-p180 :072 > puts Time.now()
2011-06-17 11:18:08 -0400
ruby-1.9.2-p180 :073 > just_group_by(sequence)
=> 100000
ruby-1.9.2-p180 :074 > puts Time.now()
2011-06-17 11:18:37 -0400
You can see that:
- using group_by took 31 seconds
- using count took 1 second
- just doing the group_by took 29 seconds
While using the group_by to get the information you need is good conceptually, it's doing extra work that you don't need done.
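On Rubies newer than the 1.9.2 used here (2.7 and later), Enumerable#tally does exactly this single-pass count without the extra grouping work:

```ruby
# tally builds the frequency hash in one pass, no intermediate groups.
counts = ("gattaca" * 100).chars.tally
p counts  # => {"g"=>100, "a"=>300, "t"=>200, "c"=>100}
```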
Answer 3:
A seemingly faster way would be to pass over the characters only once:
hash = sequence.chars.inject(Hash.new(0)) { |h, e| h[e] += 1 ; h }
=> {"g"=>100, "a"=>300, "t"=>200, "c"=>100}
but actually it's NOT faster. This was a total copy & paste fail. I'll leave the answer anyway, since it shows how to use Benchmark
from the standard library.
require 'pp'
require 'benchmark'
def count_frequency_using_inject_hash(sequence)
100000.times do
sequence.chars.inject(Hash.new(0)) { |h, e| h[e] += 1 ; h }
end
end
# create_genome, count_frequency_using_chars and count_frequency_using_count
# are as defined in the question above.
sequence = create_genome
Benchmark.bm do |benchmark|
benchmark.report do
pp count_frequency_using_chars(sequence)
end
benchmark.report do
pp count_frequency_using_count(sequence)
end
benchmark.report do
pp count_frequency_using_inject_hash(sequence)
end
end
user system total real
41.500000 0.000000 41.500000 ( 42.484375)
1.312000 0.000000 1.312000 ( 1.406250)
49.265000 0.000000 49.265000 ( 49.348633)
Answer 4:
You can see what's going on better by profiling the VM itself.
A block being yielded an excessive number of times is the culprit here. If you use perftools' profiler for the VM, following the instructions listed under "Profiling the Ruby VM and C extensions" at https://github.com/tmm1/perftools.rb (note: this is more or less vanilla perftools, not perftools.rb), you get:
Removing _init from all stack traces.
Total: 3883 samples
1321 34.0% 34.0% 3883 100.0% rb_yield_0
273 7.0% 41.1% 274 7.1% _IO_str_pbackfail
191 4.9% 46.0% 191 4.9% __i686.get_pc_thunk.bx
171 4.4% 50.4% 171 4.4% _init
131 3.4% 53.7% 3880 99.9% rb_eval
122 3.1% 56.9% 347 8.9% st_lookup
105 2.7% 59.6% 423 10.9% new_dvar
93 2.4% 62.0% 326 8.4% rb_newobj
89 2.3% 64.3% 89 2.3% _setjmp
77 2.0% 66.3% 400 10.3% str_new
67 1.7% 68.0% 357 9.2% dvar_asgn_internal
63 1.6% 69.6% 204 5.3% malloc
62 1.6% 71.2% 3820 98.4% rb_str_each_char
58 1.5% 72.7% 187 4.8% rb_ary_store
55 1.4% 74.1% 55 1.4% rb_memcmp
55 1.4% 75.5% 3883 100.0% rb_yield
# rest snipped for brevity
As you can see, rb_yield_0
is accounting for over a third of the activity, so even if you could optimize everything else, you'd still be slower than if you were using String#count
.
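You can count the yields directly to see the scale: group_by calls its block once per character, so each of the 100,000 iterations performs 700 Ruby-level block calls (70 million in total), while String#count performs none:

```ruby
sequence = "gattaca" * 100

# Count how many times group_by yields to its block in one pass.
yields = 0
sequence.chars.group_by { |c| yields += 1; c }
puts yields  # => 700 block calls for a single grouping pass
```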
You can also confirm this by doing a benchmark where you're just creating a block that doesn't do anything.
require 'pp'
require 'benchmark'
def create_genome
"gattaca" * 100
end
ZILLION = 100000
def mainly_execute_a_trivial_block(sequence)
ZILLION.times do
sequence.length.times() {}
end
end
def execute_an_empty_loop_instead(sequence)
ZILLION.times do
i = 0
max = sequence.length
until i == max
i += 1
end
end
end
sequence = create_genome
puts RUBY_VERSION
Benchmark.bm do |benchmark|
benchmark.report do
pp mainly_execute_a_trivial_block(sequence)
end
benchmark.report do
pp execute_an_empty_loop_instead(sequence)
end
end
gives
user system total real
5.700000 0.000000 5.700000 ( 5.727715) # mainly_execute_a_trivial_block
1.920000 0.000000 1.920000 ( 1.942096) # execute_an_empty_loop_instead