I wanted to create this array
["studies", "theory", "form", "animal", "basic", "processes"]
from the following nested data structure (stored as sorted_hash):
[["studies", {:freq=>11, :cap_freq=>0, :value=>11}],
["theory", {:freq=>9, :cap_freq=>1, :value=>11}],
["form", {:freq=>9, :cap_freq=>1, :value=>11}],
["animal", {:freq=>12, :cap_freq=>0, :value=>12}],
["basic", {:freq=>10, :cap_freq=>1, :value=>12}],
["processes", {:freq=>13, :cap_freq=>0, :value=>13}]]
I mistook this for a hash and wrote the following code to achieve my task:
array = []
sorted_hash.each do |key, value|
  array.push key
end
And I really got what I wanted. But after some thinking and playing around in Pry, I wonder why. The Ruby documentation for the Array each method only shows examples with one item variable, as in
each { |item| block } → ary
but I use two variables as I would do for Hashes. Will Ruby try to match the given item variables, which in this case succeeds as the 2nd level array has a length of 2?
Is it advisable to do it like that?
Are there more idiomatic ways to do it?
The answer follows from the way "parallel assignment" is implemented in Ruby.
As you probably know:
a,b,c = 1,2,3
a #=> 1
b #=> 2
c #=> 3
a,b,c = [1,2,3]
a #=> 1
b #=> 2
c #=> 3
a,b = [1,2,3]
a #=> 1
b #=> 2
a,*b = [1,2,3]
a #=> 1
b #=> [2, 3]
*a,b = [1,2,3]
a #=> [1, 2]
b #=> 3
a,(b,c) = [1,[2,3]]
a #=> 1
b #=> 2
c #=> 3
a,(b,(c,d)) = [1,[2,[3,4]]]
a #=> 1
b #=> 2
c #=> 3
d #=> 4
The last two examples employ "disambiguation", which some people prefer to call "decomposition".
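A few edge cases of the same rules are worth keeping in mind, since they also decide what happens when block variables and array lengths do not match. The following is only a small sketch of standard parallel-assignment behaviour; the variable names are chosen for illustration.

```ruby
# Fewer values than variables: the leftover variable is set to nil.
a, b, c = [1, 2]
a #=> 1
b #=> 2
c #=> nil

# A splat in the middle collects whatever the other variables leave over.
first, *middle, last = [1, 2, 3, 4]
first  #=> 1
middle #=> [2, 3]
last   #=> 4

# Nested decomposition also pads with nil when the inner array is short.
x, (y, z) = [1, [2]]
x #=> 1
y #=> 2
z #=> nil
```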
Now let's see how that applies to the assignment of values to block variables.
Suppose:
arr = [["studies", {:freq=>11, :cap_freq=>0, :value=>11}],
["theory", {:freq=>9, :cap_freq=>1, :value=>11}]]
and we execute:
arr.each { |a| p a }
["studies", {:freq=>11, :cap_freq=>0, :value=>11}]
["theory", {:freq=>9, :cap_freq=>1, :value=>11}]
Let's look at this more carefully. Define:
enum = arr.each
#=> #<Enumerator: [["studies", {:freq=>11, :cap_freq=>0, :value=>11}],
# ["theory", {:freq=>9, :cap_freq=>1, :value=>11}]]:each>
The first element is passed to the block and assigned to the block variable, which we will call v here:
v = enum.next
#=> ["studies", {:freq=>11, :cap_freq=>0, :value=>11}]
We may prefer to use parallel assignment with two block variables (after enum.rewind to reset the enumerator):
a,h = enum.next
a #=> "studies"
h #=> {:freq=>11, :cap_freq=>0, :value=>11}
That allows us to write (for example):
arr.each { |a,h| p h }
{:freq=>11, :cap_freq=>0, :value=>11}
{:freq=>9, :cap_freq=>1, :value=>11}
Here we do not use the block variable a. When that is the case, we might replace it with the local variable _ or possibly _a:
arr.each { |_,h| p h }
arr.each { |_a,h| p h }
This draws attention to the fact that a is not used and may help to avoid errors. Regarding errors, suppose we want:
[[1,2],[3,4]].map { |a,b| 1+b }
#=> [3,5]
but inadvertently write:
[[1,2],[3,4]].map { |a,b| a+b }
#=> [3,7]
which executes just fine (but produces an incorrect result). By contrast,
[[1,2],[3,4]].map { |_,b| a+b }
# NameError: undefined local variable or method 'a'
tells us there's a problem.
Here's a more elaborate example of what you can do in blocks with parallel assignment and disambiguation. Given:
h = { :a=>[1,2], :b=>[3,4] }
suppose we wish to obtain:
{ :a=>3, :b=>7 }
One way is the following:
h.each_with_object({}) { |(a,(b,c)),g| g[a] = b+c }
#=> {:a=>3, :b=>7}
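To see why the block variables |(a,(b,c)),g| line up, one can pull the first element out of the enumerator by hand, in the same way enum.next was used earlier. This is only an illustrative sketch; the name first is introduced here and is not part of the original example.

```ruby
h = { :a=>[1,2], :b=>[3,4] }

# Without a block, each_with_object returns an enumerator.
enum = h.each_with_object({})

# Each element it yields is a two-element array:
# a [key, value] pair plus the object being built up.
first = enum.next
#=> [[:a, [1, 2]], {}]

# The parentheses mirror that nesting.
(a,(b,c)),g = first
a #=> :a
b #=> 1
c #=> 2
g #=> {}

# The same decomposition, written as block variables.
result = h.each_with_object({}) { |(a,(b,c)),g| g[a] = b+c }
#=> {:a=>3, :b=>7}
```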
That's because Ruby conveniently lets you do this:
[[1,2,3], [4,5,6]].each {|x,y,z| puts "#{x}#{y}#{z}"}
# 123
# 456
So basically, each yields an array element to the block, and because Ruby block syntax allows "expanding" array elements to their components by providing a list of arguments, it works.
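As for whether this only works because the inner arrays have exactly two elements: it does not depend on that. A short sketch of the behaviour follows, with names (pairs, pr, la, strict) made up for illustration. Note that this expansion is a property of blocks and procs; lambdas check their argument count strictly.

```ruby
# Inner arrays of other lengths still work with |a, b|:
# extra elements are dropped and missing ones become nil.
pairs = []
[[1, 2, 3], [4]].each { |a, b| pairs << [a, b] }
pairs #=> [[1, 2], [4, nil]]

# A proc behaves like a block and expands a single array argument.
pr = proc { |a, b| [a, b] }
pr.call([1, 2]) #=> [1, 2]

# A lambda does not: one array is one argument, so this raises ArgumentError.
la = lambda { |a, b| [a, b] }
strict = begin
  la.call([1, 2])
  false
rescue ArgumentError
  true
end
strict #=> true
```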
You can find more tricks with block arguments here.
And by the way, instead of creating an array yourself and calling push, you can simply do the following, since map returns an array:
sorted_hash.map(&:first)
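For comparison, here are a few equivalent spellings, run against a shortened copy of the data from the question. This is a sketch rather than a recommendation of one over the others; the to_h variant assumes the words are unique, since duplicate keys would collapse.

```ruby
sorted_hash = [["studies", {:freq=>11, :cap_freq=>0, :value=>11}],
               ["theory",  {:freq=>9,  :cap_freq=>1, :value=>11}],
               ["form",    {:freq=>9,  :cap_freq=>1, :value=>11}]]

# Take the first element of each inner array.
words_a = sorted_hash.map(&:first)

# Same result with explicit block variables; _ marks the unused hash.
words_b = sorted_hash.map { |word, _| word }

# Convert the pairs to a real Hash and ask for its keys.
words_c = sorted_hash.to_h.keys

words_a #=> ["studies", "theory", "form"]
```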