What are the differences between regular expressio

Posted 2020-05-19 08:22

Question:

Different tools implement regular expressions differently. For example, to match "foo" or "bar":

printf "%s\n" foo bar baz food | grep -o '\<\(fo\+\|bar\)\>'
printf "%s\n" foo bar baz food | awk '/\<(fo+|bar)\>/'
printf "%s\n" foo bar baz food | sed -n '/\<\(fo\+\|bar\)\>/p'
printf "%s\n" foo bar baz food | sed -nr '/\<(fo+|bar)\>/p'

Where are these differences documented?

Answer 1:

Score! I'm so happy to have found this page:
https://www.gnu.org/software/gnulib/manual/html_node/Regular-expression-syntaxes.html

14.8 Regular expression syntaxes

Gnulib supports many different types of regular expressions; although the underlying features are the same or identical, the syntax used varies. The descriptions given here for the different types are generated automatically.

  • awk regular expression syntax
  • egrep regular expression syntax
  • ed regular expression syntax
  • emacs regular expression syntax
  • gnu-awk regular expression syntax
  • grep regular expression syntax
  • posix-awk regular expression syntax
  • posix-basic regular expression syntax
  • posix-egrep regular expression syntax
  • posix-extended regular expression syntax
  • posix-minimal-basic regular expression syntax
  • sed regular expression syntax


Answer 2:

It may also help to note that, as far as the regex itself goes, the only difference between these commands is whether they use Basic Regular Expressions (BRE) or Extended Regular Expressions (ERE).

BRE (+GNU)

printf "%s\n" foo bar baz food | grep '\<\(fo\+\|bar\)\>'
printf "%s\n" foo bar baz food | sed -n '/\<\(fo\+\|bar\)\>/p'

ERE (+GNU)

printf "%s\n" foo bar baz food | grep -E '\<(fo+|bar)\>'
printf "%s\n" foo bar baz food | sed -nr '/\<(fo+|bar)\>/p'
printf "%s\n" foo bar baz food | awk '/\<(fo+|bar)\>/'

I left out the -o with grep above.
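
To see the BRE/ERE split in isolation, here is a minimal sketch. The two-line sample input and the names sample, bre_plus and ere_plus are made up for illustration. An unescaped + is an ordinary character in a BRE, but a "one or more" quantifier in an ERE:

```shell
# Hypothetical sample input: one line with a literal "+", one without.
sample() { printf '%s\n' 'fo+' 'foo'; }

# BRE: an unescaped + is an ordinary character,
# so only the line containing the literal text "fo+" matches.
bre_plus=$(sample | grep 'fo+')

# ERE: + means "one or more of the previous item",
# so "fo" is found inside both lines.
ere_plus=$(sample | grep -E 'fo+')

printf 'BRE matched:\n%s\n' "$bre_plus"
printf 'ERE matched:\n%s\n' "$ere_plus"
```

The same pattern text therefore selects different lines depending only on which syntax the tool is told to use.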

It may also be worth noting that all of the examples above use GNU utilities and rely on GNU extensions to POSIX regular expressions.

All of the examples use the GNU extension:

\< ... \>

In addition, the BRE examples use the GNU extensions:

\+
\|

These will probably not work with other implementations of these utilities.
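
For comparison, a sketch of the same match written with only POSIX-specified syntax, so it should not depend on GNU extensions. It assumes, as in the examples above, that every input line is a single word, so the whole-line option -x can stand in for the \< ... \> word boundaries. The names words, ere_out and bre_out are made up for illustration:

```shell
# Same sample input as the examples above.
words() { printf '%s\n' foo bar baz food; }

# POSIX ERE: + and | are standard; -x requires the whole line to match.
ere_out=$(words | grep -Ex 'fo+|bar')

# POSIX BRE: \{1,\} replaces the GNU \+, and since BRE has no
# alternation operator, each alternative gets its own -e pattern.
bre_out=$(words | grep -x -e 'fo\{1,\}' -e 'bar')

printf '%s\n' "$ere_out"
printf '%s\n' "$bre_out"
```

Both commands select foo and bar and reject baz and food. For input with several words per line, -x is no longer equivalent to word boundaries and a different approach would be needed.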