NOTE: I'm not parsing lots of HTML, or generic HTML, with regex. I know that's bad.
TL;DR:
I have strings like
A sentence with an exclamation\! Next is a \* character
where some characters are "escaped" in the original markup. I wish to replace them with their "originals" and get:
A sentence with an exclamation! Next is a * character
I have a small bit of data that I need to extract from some wiki markup.
I'm only dealing with paragraphs/snippets here, so I don't need a big robust solution. In Python, I tried a test:
import re
s = "test \\* \\! test * !! **"
r = re.compile("""\\.""") # Slash followed by anything
r.sub("-", s)
This SHOULD yield:
test - - test * !! **
But it doesn't do anything. Am I missing something here?
Furthermore, I'm not sure how to go about replacing any given escaped character with its original, so I would probably just make a list and sub with specific regexes like:
\\\*
and
\\!
There's probably a much cleaner way to do this, so any help is greatly appreciated.
You are missing something, namely the r prefix on the pattern string.
Both Python and re attach meaning to \; your doubled backslash becomes just one backslash by the time the string value is passed to re.compile(), so re sees \. which means a literal full stop.
By using r'' you tell Python not to interpret escape codes, so now re is given a string containing \\. which means a literal backslash followed by any character.
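A minimal sketch of the difference, reusing the sample string s from the question (the names non_raw and raw are only for illustration):

```python
import re

s = "test \\* \\! test * !! **"   # the actual text is: test \* \! test * !! **

non_raw = "\\."    # Python collapses this to the 2-character pattern \.
raw = r"\\."       # stays as the 3-character pattern \\.

print(len(non_raw), len(raw))      # 2 3

# \. matches only a literal full stop, so nothing is replaced
print(re.sub(non_raw, "-", s))     # test \* \! test * !! **

# \\. matches a backslash followed by any character
print(re.sub(raw, "-", s))         # test - - test * !! **
```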
The rule of thumb is: when defining regular expressions, use r'' raw string literals, which saves you from having to double-escape everything that has meaning to both Python and regular expression syntax.
Next, you want to replace each 'escaped' character with the character itself; use groups for that, since re.sub() lets you reference groups in the replacement value. Capturing the character after the backslash as (.) and substituting \1 turns the sample string into test * ! test * !! **
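The group-based replacement can be sketched as follows; unescape and ESCAPED are hypothetical names chosen for this example, not something from the question:

```python
import re

# Backslash followed by any single character; that character is captured as group 1
ESCAPED = re.compile(r"\\(.)")

def unescape(text):
    """Replace every backslash-escaped character with the character itself."""
    return ESCAPED.sub(r"\1", text)

s = "test \\* \\! test * !! **"
print(unescape(s))   # test * ! test * !! **
```

Because matches are found left to right without overlapping, an escaped backslash (two backslashes in a row) collapses to a single backslash. Note that . does not match a newline by default, so a backslash directly before a line break is left untouched unless the pattern is compiled with re.DOTALL.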