I'm writing a regular expression in Objective-C. The escape sequence \w is illegal and emits a warning, so the regular expression /\w/ must be written as @"\\w"; the escape sequence \? is valid, apparently, and doesn't emit a warning, so the regular expression /\?/ must be written as @"\\?" (i.e., the backslash must be escaped).
Question marks aren't invisible like \t or \n, so why is \? a valid escape sequence?
Edit: To clarify, I'm not asking about the quantifier; I'm asking about a string escape sequence. That is, this doesn't emit a warning:
NSString *valid = @"\?";
By contrast, this does emit a warning ("Unknown escape sequence '\w'"):
NSString *invalid = @"\w";
It specifies a literal question mark. It is needed because of a little-known feature called trigraphs, where a three-character sequence starting with two question marks is replaced by another character. If you have trigraphs enabled, then to write "??" in a string you need to write it as "?\?" to prevent the preprocessor from reading it as the beginning of a trigraph.
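The point above can be checked with a small sketch. It is written in plain C, on the assumption that Objective-C string literals follow the same escape rules; the names escaped_mark, two_marks, and the two helper functions are made up for illustration.

```c
#include <string.h>

/* "\?" is a standard C escape sequence for a plain question mark,
   so this literal holds exactly one character. */
static const char escaped_mark[] = "\?";

/* Escaping the second mark keeps two adjacent question marks from
   ever being read as the start of a trigraph. */
static const char two_marks[] = "?\?";

/* Returns 1 if the escaped literal is identical to a bare "?". */
int escape_is_plain_question_mark(void) {
    return strcmp(escaped_mark, "?") == 0;
}

/* Returns the length of the two-mark literal (excluding the NUL). */
size_t two_marks_length(void) {
    return strlen(two_marks);
}
```

Here escape_is_plain_question_mark() returns 1 and two_marks_length() returns 2: the backslash never reaches the string, which is why a regex that needs a literal backslash has to be written with "\\?".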
(If you're wondering "Why would anybody introduce a feature like this?": Some keyboards or character sets didn't include commonly used symbols like {, so they introduced trigraphs so you could write ??< instead.)
? in regex is a quantifier; it means 0 or 1 occurrences. When appended to the + or * quantifiers, it makes them "lazy".
For example, applying the regex o? to the string foo? would match a single o (or the empty string wherever there is no o).
However, the regex o\? in foo? would match o?, because it is searching for a literal question mark in the string instead of treating ? as a quantifier.
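A rough way to try the quantifier and the literal side by side is sketched below. It assumes POSIX extended regular expressions from <regex.h> rather than NSRegularExpression, and first_match is a hypothetical helper, not part of any library. POSIX extended syntax has no lazy quantifiers, so only the plain and escaped forms are shown.

```c
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Copies the first match of `pattern` in `text` into `out`.
   Returns 0 on success, -1 if the pattern fails to compile,
   there is no match, or the match does not fit in `out`. */
int first_match(const char *pattern, const char *text,
                char *out, size_t out_size) {
    regex_t re;
    regmatch_t m;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;

    int rc = regexec(&re, text, 1, &m, 0);
    regfree(&re);
    if (rc != 0)
        return -1;

    size_t len = (size_t)(m.rm_eo - m.rm_so);
    if (len + 1 > out_size)
        return -1;

    memcpy(out, text + m.rm_so, len);
    out[len] = '\0';
    return 0;
}
```

With the text foo?, the pattern string "fo?" (quantifier, anchored on f so the match is not empty) yields fo, while "o\\?" (literal question mark) yields o?. The doubled backslash in the C literal is the same escaping the question describes: the compiler turns "\\?" into the two characters \? before the regex engine sees them.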
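Applying the regex o*? to foo? would match the empty string rather than oo, because a lazy quantifier takes as few characters as possible. The laziness is easier to see with a prefix: fo* matches foo, while fo*? matches only f.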
More info on quantifiers here.