Backslashes in gsub (escaping and backreferencing)

Posted 2019-05-07 10:34

Question:

Consider the following snippet:

puts 'hello'.gsub(/.+/, '\0 \\0 \\\0 \\\\0')

This prints (as seen on ideone.com):

hello hello \0 \0

This was very surprising, because I'd expect to see something like this instead:

hello \0 \hello \\0

My argument is that \ is an escape character, so you write \\ to get a literal backslash, thus \\0 is a literal backslash \ followed by 0, etc. Obviously this is not how gsub is interpreting it, so can someone explain what's going on?

And what do I have to do to get the replacement I want above?

Answer 1:

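Escaping is limited when using single quotes rather than double quotes. In a single-quoted string only \\ and \' are escape sequences; every other backslash is kept literally: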

puts 'single\nquote'   # prints single\nquote (the \n stays literal)
puts "double\nquote"   # prints double and quote on separate lines

"\0" is the null-character (used i.e. in C to determine the end of a string), where as '\0' is "\\0", therefore both 'hello'.gsub(/.+/, '\0') and 'hello'.gsub(/.+/, "\\0") return "hello", but 'hello'.gsub(/.+/, "\0") returns "\000". Now 'hello'.gsub(/.+/, '\\0') returning 'hello' is ruby trying to deal with programmers not keeping the difference between single and double quotes in mind. In fact, this has nothing to do with gsub: '\0' == "\\0" and '\\0' == "\\0". Following this logic, whatever you might think of it, this is how ruby sees the other strings: both '\\\0' and '\\\\0' equal "\\\\0", which (when printed) gives you \\0. As gsub uses \x for inserting match number x, you need a way to escape \x, which is \\x, or in its string representation: "\\\\x".

Therefore the line

puts 'hello'.gsub(/.+/, "\\0 \\\\0 \\\\\\0 \\\\\\\\0")

indeed results in

hello \0 \hello \\0
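If the doubled backslashes become hard to read, one alternative (not part of the original answer) is the block form of gsub. The block's return value is inserted as is, without backreference processing, and the match is passed in as the block argument. A minimal sketch, with `result` and `same` as illustrative names:

```ruby
# Block form: the block's return value is inserted verbatim,
# so backslashes need only the usual string-literal escaping.
result = 'hello'.gsub(/.+/) { |m| "#{m} \\0 \\#{m} \\\\0" }
puts result   # hello \0 \hello \\0

# Same output as the string-replacement form from the answer.
same = 'hello'.gsub(/.+/, "\\0 \\\\0 \\\\\\0 \\\\\\\\0")
puts result == same   # true
```

The trade-off is that the match has to be interpolated explicitly instead of being referenced with \0.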