I need to match a regex that uses backreferences (e.g. \1) in my Go code.
That's not so easy, because Go's official regexp package uses the RE2 engine, which deliberately does not support backreferences (and some other lesser-known features) in order to guarantee linear-time execution and thereby avoid regex denial-of-service attacks. Enabling backreference support is not an option with RE2.
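A minimal sketch of the limitation, for illustration only: the helper name compiles is hypothetical, and the only library call used is regexp.Compile, which returns an error for a pattern containing \1.

```go
package main

import (
	"fmt"
	"regexp"
)

// compiles is a hypothetical helper: it reports whether Go's regexp
// package accepts the given pattern.
func compiles(pattern string) bool {
	_, err := regexp.Compile(pattern)
	return err == nil
}

func main() {
	fmt.Println(compiles(`(a)b`))  // a plain capture group is accepted
	fmt.Println(compiles(`(a)\1`)) // \1 is rejected as an invalid escape
}
```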
In my code, there is no risk of malicious exploitation by attackers, and I need backreferences.
What should I do?
Regular expressions are great for working with regular grammars, but if your grammar isn't regular (i.e. it requires backreferences and the like), you should probably switch to a better tool. There are a lot of good tools available for parsing context-free grammars, including yacc, which was shipped with the Go distribution as go tool yacc until Go 1.8 and is now available as goyacc from golang.org/x/tools. Alternatively, you can write your own parser; recursive descent parsers, for example, are easy to write by hand.
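As a rough illustration of the hand-written route, here is a sketch that does what the backreference pattern (["'])(.*?)\1 would do at the start of a string: it reads an opening quote and then looks for the same quote character again. The function name quoted and the choice of this pattern are assumptions made for the example, not something taken from the question.

```go
package main

import (
	"fmt"
	"strings"
)

// quoted returns the body of a quoted string at the start of s.
// The closing quote must be the same character as the opening one,
// which is the part a regex would express with \1.
func quoted(s string) (body string, ok bool) {
	if s == "" || (s[0] != '"' && s[0] != '\'') {
		return "", false
	}
	// Search for the opening quote character again, after position 0.
	end := strings.IndexByte(s[1:], s[0])
	if end < 0 {
		return "", false
	}
	return s[1 : 1+end], true
}

func main() {
	fmt.Println(quoted(`'a"b' rest`))
	fmt.Println(quoted(`"unterminated'`))
}
```

The point is only that "remember what was matched earlier and require it again" is an ordinary variable in hand-written code, so no special engine support is needed.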
I think regular expressions are overused in scripting languages (like Perl, Python, Ruby, ...) because their C/ASM-powered implementations are usually better optimized than code written in those languages themselves, but Go isn't such a language. Regular expressions are usually quite slow and are often not suited to the problem at all.
Answering my own question here: I solved this using golang-pkg-pcre, which uses libpcre++ and provides Perl-compatible regexes that do support backreferences. The API is not the same as that of the standard regexp package.
When I had the same problem, I solved it using a two-step regular expression match. The original code is:
The code is supposed to parse a string of the form ${DISTNAME:S|from|to|g}, which itself is a little pattern language using the familiar substitution syntax S|replace|with|. The two-stage code looks like this:
The match, match4 and match5 functions are my own wrappers around the regexp package, and they cache the compiled regular expressions so that at least the compilation time is not wasted.
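Since the code itself is missing here, the following is only a sketch of what a two-step match of this kind could look like; it is not the original code. The names reHead and parseSubst and the exact patterns are assumptions. Step 1 captures the separator character, and step 2 builds a second pattern in which the quoted separator stands where a backreference would otherwise go.

```go
package main

import (
	"fmt"
	"regexp"
)

// Step 1 (assumed pattern): match the variable name and capture the
// separator character that follows ":S".
var reHead = regexp.MustCompile(`^\$\{(\w+):S(.)`)

// parseSubst is a hypothetical helper that splits a string such as
// ${DISTNAME:S|from|to|g} into its parts.
func parseSubst(s string) (name, from, to, flags string, ok bool) {
	head := reHead.FindStringSubmatch(s)
	if head == nil {
		return "", "", "", "", false
	}
	// Step 2: insert the literal separator where a backreference
	// would have been used, then match the rest of the string.
	sep := regexp.QuoteMeta(head[2])
	re, err := regexp.Compile(`^(.*?)` + sep + `(.*?)` + sep + `(\w*)\}$`)
	if err != nil {
		return "", "", "", "", false
	}
	rest := s[len(head[0]):]
	m := re.FindStringSubmatch(rest)
	if m == nil {
		return "", "", "", "", false
	}
	return head[1], m[1], m[2], m[3], true
}

func main() {
	fmt.Println(parseSubst("${DISTNAME:S|from|to|g}"))
}
```

This sketch compiles the second pattern on every call; a cache keyed by the pattern string, as described above, would avoid that cost.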