grep can't be fed "raw" strings when used from the command line, since some characters need to be escaped to not be treated as literals. For example:
$ grep '(hello|bye)' # WON'T MATCH 'hello'
$ grep '\(hello\|bye\)' # GOOD, BUT QUICKLY BECOMES UNREADABLE
I was using printf to auto-escape strings:
$ printf '%q' '(some|group)\n'
\(some\|group\)\\n
This produces a bash-escaped version of the string, and using backticks, this can easily be passed to a grep call:
$ grep `printf '%q' '(a|b|c)'`
However, it's clearly not meant for this: some characters in the output are not escaped, and some are unnecessarily so. For example:
$ printf '%q' '(^#)'
\(\^#\)
The ^ character should not be escaped when passed to grep.
Is there a CLI tool that takes a raw string and returns a bash-escaped version of the string that can be used directly as a pattern with grep? If not, how can I achieve this in pure bash?
If you want to search for an exact string, -F tells grep to treat the pattern as is, with no interpretation as a regex. (This is often available as fgrep as well.)
If you are attempting to get grep to use Extended Regular Expression syntax, the way to do that is to use grep -E (aka egrep). You should also know about grep -F (aka fgrep) and, in newer versions of GNU grep, grep -P.
Background: The original grep had a fairly small set of regex operators; it was Ken Thompson's original regular expression implementation. A new version with an extended repertoire was developed later and, for compatibility reasons, got a different name. With GNU grep, there is only one binary, which understands the traditional, basic RE syntax if invoked as grep, and ERE if invoked as egrep. Some constructs from egrep are available in grep by using a backslash escape to introduce special meaning.
Subsequently, the Perl programming language extended the formalism even further; this regex dialect seems to be what most newcomers erroneously expect grep to support, too. With grep -P, it does; but this is not yet supported on all platforms.
So, in grep, the following characters have a special meaning: ^$[]*.\
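The difference between the three modes mentioned above can be seen on a tiny sample. This is only a sketch: the variable name input and the sample lines are made up for illustration, and the comments describe GNU grep behavior.

```shell
# Sample input: one candidate per line.
input='hello
bye
(hello|bye)'

# Basic RE (plain grep): ( | ) are literal characters here,
# so only the line containing the text "(hello|bye)" matches.
printf '%s\n' "$input" | grep '(hello|bye)'

# Extended RE (grep -E): ( | ) are operators, so any line
# containing "hello" or "bye" matches (all three lines here).
printf '%s\n' "$input" | grep -E '(hello|bye)'

# Fixed string (grep -F): no regex interpretation at all,
# so again only the literal "(hello|bye)" line matches.
printf '%s\n' "$input" | grep -F '(hello|bye)'
```

If the goal is simply to avoid backslashes in an alternation, grep -E is usually enough; if the goal is to match user text verbatim, grep -F avoids escaping altogether.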
In egrep, the following characters also have a special meaning: ()|+?{}. (The braces for repetition were not in the original egrep.) The grouping parentheses also enable backreferences with \1, \2, etc.
In many versions of grep, you can get the egrep behavior by putting a backslash before the egrep specials. There are also special sequences like \< and \>.
In Perl, a huge number of additional escapes like \w, \s and \d were introduced. In Perl 5, the regex facility was substantially extended, with non-greedy matching (*?, +?, etc.), non-grouping parentheses (?:...), lookaheads, lookbehinds, etc.
... Having said that, if you really do want to convert egrep regular expressions to grep regular expressions without invoking any external process, try ${regex/pattern/substitution} for each of the egrep special characters; but recognize that this does not handle character classes, negated character classes, or backslash escapes correctly.
When I use grep -E with user-provided strings, I escape them with this
example run
This way you can safely insert the quoted string into your regular expression, e.g. if you wanted to find each line starting with the user content, even when the user provides funny strings such as .*
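One way this kind of escaping could look is sketched below. It is an illustration, not necessarily the command the answer used: the function name ere_quote and the sample strings are assumptions, and the escaped set is limited to the characters that are special in an ERE outside bracket expressions.

```shell
# Hypothetical helper: put a backslash in front of every character that
# is special in an extended regular expression: [ \ . | $ ( ) { ? + * ^
ere_quote() {
  printf '%s\n' "$1" | sed 's/[[\.|$(){?+*^]/\\&/g'
}

# A user-provided string that would otherwise match everything.
user='.*'

# Find lines that START with the user's text, taken literally.
printf '%s\n' '.* literal' 'anything else' |
  grep -E "^$(ere_quote "$user")"
```

Only the line beginning with the literal characters .* is printed. The closing ] and } are left alone because they are ordinary characters once their opening counterparts have been escaped.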
I think the previous answers are not complete because they miss one important thing, namely strings that begin with a dash (-). So while this won't work:
This one will:
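The usual ways to pass a pattern that starts with a dash can be sketched as follows. The variable name pattern and the sample lines are made up for illustration; -e and -- are standard grep and option-parsing conventions.

```shell
# A user-provided string that happens to look like an option.
pattern='-v'

# Passed as a bare argument, grep would parse "-v" as its invert-match
# option instead of as a pattern. Both forms below avoid that.

# -e marks the next argument explicitly as the pattern.
printf '%s\n' 'use -v here' 'plain line' | grep -e "$pattern"

# -- ends option parsing, so what follows is taken as the pattern.
printf '%s\n' 'use -v here' 'plain line' | grep -- "$pattern"
```

Both commands print only the line containing -v. Either form combines with -F or -E as needed.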