
Question:

In the following code, k2 is minimally different from k1: it is exactly the same except that it's defined using an interpolation. (At least, I expected it to be exactly the same; obviously, from the result of p k2, it is not.)

v  = /[aeiouAEIOUäöüÄÖÜ]/                 # vowels
k1 = /[[ßb-zB-Z]&&[^[aeiouAEIOUäöüÄÖÜ]]]/ # consonants defined without interpolation
k2 = /[[ßb-zB-Z]&&[^#{v}]]/               # consonants defined same way, but with interpolation

But as below, using gsub with k1 works, whereas using it with k2 fails in a way I don't understand.

all_chars = "äöüÄÖÜß"<<('a'..'z').to_a.join<<('A'..'Z').to_a.join

p all_chars                  # "äöüÄÖÜßabcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
p all_chars.gsub( k1 , '_' ) # "äöüÄÖÜ_a___e___i_____o_____u_____A___E___I_____O_____U_____"
p all_chars.gsub( k2 , '_' ) # "äöüÄÖÜ_abcdefghijklm_o_____u__x__ABCDEFGHIJKLMNOPQRSTUVWXYZ"
p k1                         # /[[ßb-zB-Z]&&[^[aeiouAEIOUäöüÄÖÜ]]]/
p k2                         # /[[ßb-zB-Z]&&[^(?-mix:[aeiouAEIOUäöüÄÖÜ])]]/

Why doesn't it work? What is (?-mix:...)? Is there a way to make this work the way I was expecting it to?

Answer 1:

I do things like:

keywords = %w[foo bar]
regex = /\b(?:#{ Regexp.union(keywords).source })\b/i
# => /\b(?:foo|bar)\b/i

That's useful when you want to test for the occurrence of multiple sub-strings inside a single string at once.
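As a rough usage sketch of that pattern (the sample sentences are made up for illustration), the combined regex can be scanned against a string, and the word boundaries keep it from matching inside longer words:

```ruby
keywords = %w[foo bar]
regex = /\b(?:#{Regexp.union(keywords).source})\b/i

# Whole-word, case-insensitive hits are collected.
hits = "A Foo walks into a bar".scan(regex)   # => ["Foo", "bar"]

# "food" and "barley" only contain the keywords as prefixes, so \b rejects them.
miss = "food and barley" =~ regex             # => nil
```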

Interpolating a regex into a string won't necessarily work right. By default, when you do that, Ruby converts the pattern using to_s, which is not what I want, because I don't want the full string representation of the pattern, flags and all. Using source returns what I want:

regex = Regexp.union(keywords)
regex         # => /foo|bar/
regex.inspect # => "/foo|bar/"
regex.to_s    # => "(?-mix:foo|bar)"
regex.source  # => "foo|bar"
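Applied to the character classes from the question, the same idea might look like the sketch below. The variable names vowels and consonants and the sample word are only illustrative.

```ruby
vowels     = /[aeiouAEIOUäöüÄÖÜ]/
# Interpolate only the bare pattern, without the (?-mix:...) wrapper.
consonants = /[[ßb-zB-Z]&&[^#{vowels.source}]]/

pattern  = consonants.source            # => "[[ßb-zB-Z]&&[^[aeiouAEIOUäöüÄÖÜ]]]"
replaced = "Straße".gsub(consonants, '_') # => "___a_e"
```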

Answer 2:

Use a string to hold those characters and interpolate that into regexes as needed. Ruby is trying to cover some bases with (?-mix:) but it isn't anticipating that the regex is going into a character set inside the other regex.

Background Info

Here's what's really happening:

In many cases, interpolating a regex into a regex makes sense. Like this:

a = /abc/       #/abc/
b = /#{a}#{a}/  #/(?-mix:abc)(?-mix:abc)/

'hhhhabcabchthth'.gsub(/abcabc/, '_')   # "hhhh_hthth"
'hhhhabcabchthth'.gsub(b, '_')          # "hhhh_hthth"

It works as expected. The (?-mix: ...) wrapper encapsulates the flags of a, in case b has different ones. The letters stand for the m (multiline), i (ignore case), and x (extended) flags; any letter after the - is switched off inside the group. a is case sensitive, because that is the default. If b were case insensitive, the only way for a to keep matching what it matched before is to force case sensitivity with -i: anything inside (?-i:...) after the colon is matched case-sensitively. The following makes this clearer:

e = /a/i # e is made to be case insensitive with the /i
/#{e}/   # /(?i-mx:a)/

You can see above that when interpolating e into something, you now have (?i-mx:). Now the i is to the left of the -, which means it turns case insensitivity on instead of off (temporarily), in order for e to match as it normally would.

Also, to avoid disturbing the numbering of capture groups, the (?...:) wrapper is a non-capturing group. All of that is a rough attempt to make the a and e variables match what you expect them to match when you put them into a larger regex.
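A small sketch of both effects, using made-up names (wrapped, unwrapped) for illustration: the wrapper keeps a case sensitive even inside a case-insensitive regex, and it does not add a capture group.

```ruby
a = /abc/                       # case sensitive (the default)

wrapped   = /#{a}/i             # a.to_s is interpolated, source is "(?-mix:abc)"
unwrapped = /#{a.source}/i      # only the bare pattern, source is "abc"

r1 = wrapped   =~ 'ABC'         # => nil, the -i inside the group wins
r2 = unwrapped =~ 'ABC'         # => 0, the outer /i applies

# The wrapper is non-capturing, so only the explicit groups are numbered.
m = /(x)#{a}(y)/.match('xabcy')
caps = m.captures               # => ["x", "y"]
```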

Unfortunately, if you put it inside a character set, meaning [], this strategy completely fails. [^(?-mix:...)] is now interpreted completely differently: inside the brackets, (, i, x, :, and ) are literal characters, and ?-m is a range. So [^?-m] excludes everything between "?" and "m" (inclusive), which means, for example, that the letter "c" is no longer in your character set, and so "c" doesn't get replaced with an underscore, as you see in your example. The same range covers every uppercase ASCII letter, which is why none of those are replaced either. You can see the same thing happening with the letter "x": it is listed literally inside the negated set, so it is not among the characters being matched and doesn't get replaced with an underscore.
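To make the range effect concrete, here is a short check against the question's own k2. The probe strings are arbitrary, picked to fall on either side of the ?-m range.

```ruby
v  = /[aeiouAEIOUäöüÄÖÜ]/
k2 = /[[ßb-zB-Z]&&[^#{v}]]/

src = k2.source   # => "[[ßb-zB-Z]&&[^(?-mix:[aeiouAEIOUäöüÄÖÜ])]]"

# Code points that bound the accidental range ?-m.
bounds = ['?'.ord, 'm'.ord]        # => [63, 109]

inside  = "cx".gsub(k2, '_')       # => "cx"  (c is in ?-m, x is listed literally)
upper   = "BCD".gsub(k2, '_')      # => "BCD" (uppercase letters sit inside ?-m)
outside = "nz".gsub(k2, '_')       # => "__"  (n and z are past m and not listed)
```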

Ruby doesn't bother to parse the regular expression to figure out that you're interpolating your regular expression into a character set, and even if it did, it would still have to parse out the v variable to figure out that it is also a character set, and that therefore all you really want is to take the characters from the character set in v and put them with all the other characters there.

My advice is that since aeiouAEIOUäöüÄÖÜ is just a bunch of characters anyway, you can store it in a string and interpolate that into any character set in a regular expression. And be careful about interpolating a regex into a regex in the future. Avoid it unless you are really certain about what it's going to do.
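A minimal sketch of that advice, assuming a string named vowel_chars (a name chosen here, not taken from the question):

```ruby
vowel_chars = "aeiouAEIOUäöüÄÖÜ"              # plain characters, no regex syntax

v = /[#{vowel_chars}]/                        # vowels
k = /[[ßb-zB-Z]&&[^#{vowel_chars}]]/          # consonants

all_chars = "äöüÄÖÜß" + ('a'..'z').to_a.join + ('A'..'Z').to_a.join
result = all_chars.gsub(k, '_')
# => "äöüÄÖÜ_a___e___i_____o_____u_____A___E___I_____O_____U_____"
```

If the character list ever contained characters that are special inside brackets (such as ], ^, or -), passing it through Regexp.escape before interpolating would probably be the safer choice.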

Answer 3:

Answer I'm using:

If you want to interpolate some_regex into another one, use some_regex.inspect[1...-1] inside the #{}.

Eg, taking my original example, this way of defining consonants using an interpolation works.

v  = /[aeiouAEIOUäöüÄÖÜ]/                   # vowels
k3 = /[[ßb-zB-Z]&&[^#{v.inspect[1...-1]}]]/ # consonants

(I don't know if there's some sort of built-in way to accomplish the same function as .inspect[1...-1] for regexes.

I was surprised that that's not already how .to_s works for regexes.

I'm still not sure what "(?-mix:some_regex)" is for.)
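One caveat worth checking before relying on the slice: it assumes the regex has no flags. With a flag, inspect ends in the flag letter rather than the closing slash, so the slice leaves a stray slash behind. Regexp#source returns the bare pattern in both cases. The names plain and flagged below are only for illustration.

```ruby
plain   = /[aeiou]/
flagged = /[aeiou]/i

s1 = plain.inspect[1...-1]     # => "[aeiou]"
s2 = flagged.inspect[1...-1]   # => "[aeiou]/"  (the trailing slash is left behind)
s3 = flagged.source            # => "[aeiou]"
```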

Answer 4:

Your statement "k2 is exactly the same except that it's defined using an interpolation" is wrong.

When you interpolate something that is not a string, such as the regex v, it is converted to a string with to_s.

v = /[aeiouAEIOUäöüÄÖÜ]/
v.to_s # => "(?-mix:[aeiouAEIOUäöüÄÖÜ])"

This is interpolated into k2, resulting in a different regex from k1. If you want k2 to be the same as k1, you need to interpolate a string:

v  = "[aeiouAEIOUäöüÄÖÜ]"
k2 = /[[ßb-zB-Z]&&[^#{v}]]/ # now has the same source as k1

Interpolating regexes into another regex
