In a regex replacement pattern, a backreference looks like \1. If you want to include a digit right after that backreference, it fails because the digit is treated as part of the backreference number:
# replace all twin digits by zeroes, but retain white space in between
re.sub(r"\d(\s*)\d", r"0\10", "0 1")
>>> sre_constants.error: invalid group reference
The substitution pattern r"0\1 0" would work fine, but in the failing example the backreference \1 followed by 0 is interpreted as \10, a reference to a group 10 that does not exist.
How can the digit '0' be separated from the backreference \1 that precedes it?
You can use \g<1>, as mentioned in the docs: \g<1>0 is read as group 1 followed by the literal digit 0.

Instead of using a backreference with a sequence number (\1), you can use named groups, referenced as \g<name> in the replacement, and the problem is solved.

It turns out this trick is in fact described in the documentation of re.sub.
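For reference, here is a minimal sketch of both suggestions applied to the example from the question. The group name ws and the variable names are made up for illustration; only re.sub, re.error, and the \g<...> replacement syntax come from the re module itself.

```python
import re

# The ambiguous form: \1 followed by 0 is parsed as group 10, which does not exist.
ambiguous_error = None
try:
    re.sub(r"\d(\s*)\d", r"0\10", "0 1")
except re.error as exc:
    ambiguous_error = exc

# Numbered group written as \g<1>, so the trailing 0 is a literal digit.
numbered = re.sub(r"\d(\s*)\d", r"0\g<1>0", "0 1")

# Named group: (?P<ws>...) in the pattern, \g<ws> in the replacement.
# "ws" is a hypothetical name chosen for this example.
named = re.sub(r"\d(?P<ws>\s*)\d", r"0\g<ws>0", "0 1")

print(repr(numbered))  # '0 0'
print(repr(named))     # '0 0'
```

Both forms keep the whitespace captured between the digits and avoid the invalid group reference, since the group number or name is closed off by the angle brackets.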