Is there any way to pass the last match (practically Regexp.last_match) to a block (iterator) in Ruby?
Here is a sample method, a kind of wrapper of String#sub, to demonstrate the problem. It accepts both the standard arguments and a block:
def newsub(str, *rest, &bloc)
  str.sub(*rest, &bloc)
end
It works in the standard arguments-only case, and it can take a block; however, the positional special variables like $1, $2, etc. are not usable inside the block. Here are some examples:
newsub("abcd", /ab(c)/, '\1') # => "cd"
newsub("abcd", /ab(c)/){|m| $1} # => "d" ($1 == nil)
newsub("abcd", /ab(c)/){$1.upcase} # => NoMethodError
The reason the block does not work in the same way as String#sub(/..(.)/){$1} is, I suppose, something to do with the scope; the special variables $1, $2, etc. are local variables (and so is Regexp.last_match).
Is there any way to solve this? I would like to make the method newsub work just as String#sub does, in the sense that $1, $2, etc. are usable in the supplied block.
EDIT: According to some past answers, there may not be a way to achieve this…
Here is a way as per the question (Ruby 2). It is not pretty, and is not quite 100% perfect in all aspects, but does the job.
With this, the result is as follows:
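A minimal sketch of such a wrapper and the results it gives, written to be consistent with the analysis below rather than as a definitive implementation. It assumes the first argument is a Regexp, and it goes through the block's binding, using the caller's local variable _ as a temporary carrier. newsub_demo is a hypothetical helper added only so that the example starts from a scope where $1 is not yet set:

```ruby
# Sketch (my assumption of one workable approach): a String#sub wrapper
# whose block can see $1, $2, ... Assumes rest[0] is a Regexp.
def newsub(str, *rest, &bloc)
  if bloc
    str =~ rest[0]                 # match once; this sets $~ in newsub's scope only
    b = bloc.binding               # binding of the scope where the block was written
    b.local_variable_set(:_, $~)   # hand the MatchData over via the temporary local _
    b.eval("$~ = _")               # assign the caller's own $~ (hence $1, $2, ...)
  end
  str.sub(*rest, &bloc)
end

# Hypothetical helper: runs the three calls from the question in a fresh scope.
def newsub_demo
  [
    newsub("abcd", /ab(c)/, '\1'),         # no block: plain String#sub behaviour
    newsub("abcd", /ab(c)/) { |m| $1 },    # $1 is now visible inside the block
    newsub("abcd", /ab(c)/) { $1.upcase },
    $1                                     # still set in the caller afterwards
  ]
end

p newsub_demo  # => ["cd", "cd", "Cd", "c"]
```

Only the block form touches the caller's $1 here; the arguments-only call leaves it alone.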
In-depth analysis
In the above-defined method newsub, when a block is given, the local variables $1 etc. in the caller's thread are (re)set after the block is executed, which is consistent with String#sub. However, when a block is not given, the local variables $1 etc. are not reset, whereas in String#sub, $1 etc. are always reset regardless of whether a block is given or not.
Also, the caller's local variable _ is reset in this algorithm. In Ruby's convention, the local variable _ is used as a dummy variable and its value should not be read or referred to. Therefore, this should not cause any practical problems. If the statement local_variable_set(:$~, $~) were valid, no temporary local variables would be needed. However, it is not, in Ruby (as of Version 2.5.1 at least). See a comment (in Japanese) by Kazuhiro NISHIYAMA in [ruby-list:50708].
General background (Ruby's specification) explained
Here is a simple example to highlight Ruby's specification related to this issue:
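A minimal sketch of such an example. The method name scope_demo and the collected array are my additions, there only so that the values seen inside the block can be inspected:

```ruby
# Hypothetical wrapper method, so the example starts from a clean scope.
def scope_demo
  s = "abcd"
  /b(c)/ =~ s            # sets $~ (and hence $1) in this local scope
  seen = nil
  1.times do |i|         # the block is yield-ed by 1.times
    seen = [i, s, $1]    # s and $1 both come from the enclosing scope
  end
  seen
end

p scope_demo  # => [0, "abcd", "c"]
```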
The special variables $&, $1, $2, etc. (and the related $~ (Regexp.last_match), $' and the like) work in the local scope. In Ruby, a local scope inherits the variables of the same names in the parent scope. In the example above, the variable s is inherited, and so is $1. The do block is yield-ed by 1.times, and the method 1.times has no control over the variables inside the block except for the block parameters (i in the example above; n.b., Integer#times passes only the iteration index, and any extra parameters that a block declares are silently set to nil).
This means a method that yield-s a block has no control over $1, $2, etc. in the block, which are local variables (even though they may look like global variables).
Case of String#sub
Now, let us analyse how String#sub with a block works:
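A minimal sketch of the block form, assuming a hypothetical helper method sub_demo so that the scope is well defined. The block parameter m receives the matched string, while $1 is simply read from the scope in which the block is written:

```ruby
# Hypothetical helper: calls String#sub with a block and reports $1 afterwards.
def sub_demo
  # sub is called in this scope, so it sets $~ (and $1) right here,
  # and the block, written in the same scope, can read $1.
  result = "abcd".sub(/b(c)/) { |m| "<#{m}:#{$1}>" }
  [result, $1]
end

p sub_demo  # => ["a<bc:c>d", "c"]
```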
sub first performs a Regexp match, and hence the local variables like $1 are automatically set. Then, they (the variables like $1) are inherited in the block, because this block is in the same scope as the call to the method sub. They are not passed from sub to the block, unlike the block parameter m (which is a matched String, or equivalent to $&).
For that reason, if the method sub is defined in a different scope from the block, the sub method has no control over local variables inside the block, including $1. A different scope means the case where the sub method is written and defined in Ruby code, or in practice, all the Ruby methods except some of those written not in Ruby but in the same language as used to write the Ruby interpreter.
Ruby's official document (Ver.2.5.1) explains in the section of String#sub that, in the block form, the current match string is passed in as a parameter, and variables such as $1, $2, $`, $& and $' are set appropriately.
Correct. In practice, the methods that can and do set the Regexp-match-related special variables such as $1, $2, etc. are limited to some built-in methods, including Regexp#match, Regexp#=~, Regexp#===, String#=~, String#sub, String#gsub, String#scan, Enumerable#all?, and Enumerable#grep.
Tip 1: String#split seems to always reset $~ to nil.
Tip 2: Regexp#match? and String#match? do not update $~ and hence are much faster.
Here is a little code snippet to highlight how the scope works:
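A small sketch along these lines, assuming a hypothetical method sample() that wraps str.sub and returns its own $1 (caller_demo is likewise a name made up for illustration):

```ruby
# sample() calls str.sub in its own scope, so $1 is set there.
def sample(str, pattern, &bloc)
  str.sub(pattern, &bloc)
  $1                      # set by str.sub, in the scope of sample()
end

# Hypothetical caller: the block lives in this scope, not in sample()'s.
def caller_demo
  seen_in_block = :unset
  inside = sample("abc", /(c)/) { seen_in_block = $1; "" }
  [inside, seen_in_block, $1]
end

p caller_demo  # => ["c", nil, nil]
```

The first element is the $1 seen inside sample(); the other two show that neither the block nor the caller sees it.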
Here, $1 in the method sample() is set by str.sub in the same scope. That implies the method sample() would not be able to (simply) refer to $1 in the block given to it.
I would point out that the statement in the Regular expression section of Ruby's official document (Ver.2.5.1) is rather misleading, because $~ is a pre-defined local-scope variable (not a global variable), and $~ is set (maybe to nil) regardless of whether the last attempted match is successful or not.
The fact that variables like $~ and $1 are not global variables may be slightly confusing. But hey, they are useful notations, aren't they?