The following example in JSON format contains one backslash, and if I run JSON.load, the backslash disappears:
JSON.load('{ "88694": { "regex": ".*?\. (CVE-2015-46055)" } }')
# => {"88694"=>{"regex"=>".*?. (CVE-2015-46055)"}}
How can I keep the backslash?
My goal is to keep this structure in a file and, whenever I need to, read the file, load the JSON into a Hash, and search with those regular expressions.
UPDATE 1
Here is an example of what I want:
irb> "stack.overflow"[/.*?\./]
=> "stack."
I can't apply the regex from the JSON to my string to match that ".", because the backslash in "\." disappears.
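To see why the missing backslash matters, here is a minimal sketch comparing the pattern with and without it when built from a string with Regexp.new (the variable names are illustrative only):

```ruby
# "\\." in a Ruby double-quoted string is a backslash followed by a period,
# so this Regexp is /.*?\./ and \. matches a literal period.
with_backslash = Regexp.new(".*?\\.")
# Without the backslash the pattern is /.*?./: the lazy .*? matches nothing
# and . consumes the first character, whatever it is.
without_backslash = Regexp.new(".*?.")

p "stack.overflow"[with_backslash]    # prints "stack."
p "stack.overflow"[without_backslash] # prints "s"
```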
str = '{ "88694": { "regex": ".*?\. (CVE-2015-46055)" } }'
#=> "{ \"88694\": { \"regex\": \".*?\\. (CVE-2015-46055)\" } }"
str.chars
#=> ["{", " ", "\"", "8", "8", "6", "9", "4", "\"", ":", " ", "{", " ",
#    "\"", "r", "e", "g", "e", "x", "\"", ":", " ", "\"", ".", "*", "?",
#    "\\", ".",
#    ~~~~  ~~~
#    " ", "(",..., "}", " ", "}"]
This shows us that str does indeed contain a backslash character followed by a period. The reason is that str is enclosed in single quotes; \. would only be treated as an escaped period if str were enclosed in double quotes:
"{ '88694': { 'regex': '.*?\. (CVE-2015-46055)' } }".chars[25,3]
#=> ["?", ".", " "]
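The same point on a much shorter string, as a minimal sketch: in single quotes \. stays two characters, while in double quotes Ruby drops the backslash of an escape it does not recognize and keeps only the period.

```ruby
single = '\.'   # single quotes: a backslash followed by a period
double = "\."   # double quotes: unrecognized escape, only the period remains

p single.chars  # prints ["\\", "."]
p double.chars  # prints ["."]
```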
The value irb returns for str is displayed using inspect, which writes the single-quoted string as a double-quoted one:
"{ \"88694\": { \"regex\": \".*?\\. (CVE-2015-46055)\" } }"
In that form \\ denotes one backslash character, so \\. is a backslash followed by a period. In a double-quoted string a period could be escaped, but this one is not: it is preceded by an escaped backslash, which stands for a literal backslash character, not by an escaping one.
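The difference between a string's actual contents and its inspect output can be checked directly. A minimal sketch:

```ruby
s = '\.'   # two characters: a backslash and a period
puts s     # prints \.     (the actual contents)
p s        # prints "\\."  (inspect writes the backslash as \\)
p s.bytes  # prints [92, 46]: 92 is the backslash, 46 the period
```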
Now let's add another backslash and see what happens:
str1 = '{ "88694": { "regex": ".*?\\. (CVE-2015-46055)" } }'
str1.chars == str.chars
#=> true
The result is the same. That is because single-quoted strings support only two escape sequences: \\ (a single backslash) and \' (a single quote).
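A minimal sketch of those two escapes, and of how every other backslash in single quotes is kept as-is:

```ruby
p '\\'.length    # prints 1: \\ is an escaped backslash
p '\''.length    # prints 1: \' is an escaped single quote
p '\n'.length    # prints 2: not an escape here, so a backslash and an n
p '\\.' == '\.'  # prints true: both are a backslash followed by a period
```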
Now let's add a third backslash:
str2 = '{ "88694": { "regex": ".*?\\\. (CVE-2015-46055)" } }'
str2.chars
#=> ["{", " ", "\"", "8", "8", "6", "9", "4", "\"", ":", " ", "{", " ",
#    "\"", "r", "e", "g", "e", "x", "\"", ":", " ", "\"", ".", "*", "?",
#    "\\", "\\", ".",
#    ~~~~  ~~~~  ~~~
#    " ", "(",..., "}", " ", "}"]
Surprised? \\ produces one backslash character (an escaped backslash in single quotes), \ produces a second backslash character (a backslash that does not start one of the two escapes is kept as-is in single quotes) and . is a period.
We obtain:
JSON.parse(str)
#=> {"88694"=>{"regex"=>".*?. (CVE-2015-46055)"}}
JSON.parse(str1)
#=> {"88694"=>{"regex"=>".*?. (CVE-2015-46055)"}}
JSON.parse(str2)
#=> {"88694"=>{"regex"=>".*?\\. (CVE-2015-46055)"}}
str2 is what we want, as
JSON.parse(str2)["88694"]["regex"].chars[2,4]
#=> ["?", "\\", ".", " "]
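Once the backslash survives parsing, the string can be turned into a Regexp for searching, which is the goal stated in the question. A minimal sketch; the sample text is made up for illustration:

```ruby
require "json"

# str2 from above: its JSON text contains \\. , an escaped backslash.
str2 = '{ "88694": { "regex": ".*?\\\. (CVE-2015-46055)" } }'

pattern = JSON.parse(str2)["88694"]["regex"]  # .*?\. (CVE-2015-46055)
regex   = Regexp.new(pattern)

# Hypothetical sample text. The unescaped parentheses in the pattern form a
# capture group, so it matches CVE-2015-46055 without literal parentheses.
m = regex.match("Patched. CVE-2015-46055 applied")
p m[0]  # prints "Patched. CVE-2015-46055"
p m[1]  # prints "CVE-2015-46055"
```

If the parentheses are meant to match literally, they need escaping too: \( and \) in the regex, which become \\( and \\) in the JSON text.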
We could alternatively work backwards:
js = {"88694"=>{"regex"=>".*?\\. (CVE-2015-46055)"}}.to_json
#=> "{\"88694\":{\"regex\":\".*?\\\\. (CVE-2015-46055)\"}}"
'{"88694":{"regex":".*?\\\. (CVE-2015-46055)"}}' == js
#=> true
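Working backwards also suggests a practical approach for the goal in the question: build the Hash in Ruby, let JSON.generate do the escaping when writing the file, and let JSON.parse undo it when reading. A minimal sketch, assuming a hypothetical file name patterns.json in a temporary directory:

```ruby
require "json"
require "tmpdir"

# Ruby double-quoted string: "\\." is one backslash followed by a period.
patterns = { "88694" => { "regex" => ".*?\\. (CVE-2015-46055)" } }

# Dir.mktmpdir removes the directory afterwards and returns the block's value.
loaded = Dir.mktmpdir do |dir|
  path = File.join(dir, "patterns.json")     # hypothetical file name
  File.write(path, JSON.generate(patterns))  # the backslash is written as \\
  JSON.parse(File.read(path))
end

p loaded == patterns  # prints true: the backslash survives the round trip
```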
This string is the same as str2 after all spaces outside of quoted substrings have been removed.
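JSON treats two successive backslash characters as one backslash character (see @Jordan's comment): inside a JSON string, \\ is the escape sequence for a literal backslash, just as it is in a Ruby double-quoted string. \. is not one of the escapes JSON defines (\", \\, \/, \b, \f, \n, \r, \t and \uXXXX), so str and str1 are not valid JSON, and the parser used above silently dropped the backslash when parsing them.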
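JSON's escape rules can be checked with a few one-element arrays. A minimal sketch; each single-quoted Ruby string holds raw JSON text, so '\\\\' in the Ruby source is \\ in the JSON:

```ruby
require "json"

# JSON \\ is an escaped backslash: it parses to a single backslash.
p JSON.parse('["\\\\."]')  # prints ["\\."] (one backslash, one period)
# Other JSON escapes behave the same way, e.g. \n and \/.
p JSON.parse('["a\\nb"]')  # prints ["a\nb"] (a newline between a and b)
p JSON.parse('["\\/"]')    # prints ["/"]
# Generating goes the other way: one backslash is written as \\.
puts ["\\."].to_json       # prints ["\\."]
```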