When I have a string like this:
s1 = 'stuff(remove_me)'
I can easily remove the parentheses and the text within using
# returns 'stuff'
res1 = re.sub(r'\([^)]*\)', '', s1)
as explained here.
But I sometimes encounter nested expressions like this:
s2 = 'stuff(remove(me))'
When I run the command from above, I end up with
'stuff)'
I also tried:
re.sub(r'\(.*?\)', '', s2)
which gives me the same output.
How can I remove everything within the outer parentheses - including the parentheses themselves - so that I also end up with 'stuff'
(which should work for arbitrarily complex expressions)?
https://regex101.com/r/kQ2jS3/1
This matches from the first opening parenthesis to the furthest closing parenthesis, and everything in between. Your old regex matches from the first opening parenthesis only up to the next closing parenthesis.
If you are sure that the parentheses are initially balanced, just use the greedy version:
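The code for the greedy version did not survive here, so the following is only a minimal sketch of what it presumably looks like, assuming the greedy pattern `\(.*\)` and the question's `s2`. The second input (`s3`) is a made-up example added to show the limitation of this approach.

```python
import re

s2 = 'stuff(remove(me))'

# Greedy .* runs from the first '(' to the last ')' in the string.
res2 = re.sub(r'\(.*\)', '', s2)
print(res2)  # stuff

# Caveat: with two separate groups, the text between them is removed too.
s3 = 'keep(a)this(b)'  # hypothetical input, not from the question
res3 = re.sub(r'\(.*\)', '', s3)
print(res3)  # keep
```

This is fine when the string contains a single balanced group, and wrong as soon as text that should be kept sits between two groups.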
re matches are greedy, so they try to match as much text as possible; for the simple test case you mention, just let the regex run.
NOTE: \(.*\) matches the first ( from the left, then matches any 0+ characters (other than a newline if the DOTALL modifier is not enabled) up to the last ), and does not account for properly nested parentheses.
To remove nested parentheses correctly with a regular expression in Python, you may use a simple \([^()]*\) (matching a (, then 0+ chars other than ( and ), and then a )) in a while block using re.subn:
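A minimal sketch of that loop, assuming the pattern `\([^()]*\)` given above. The function name `remove_text_between_parens` is illustrative and may not match the original answer's code.

```python
import re

def remove_text_between_parens(text):
    """Repeatedly strip innermost (...) groups until none are left."""
    n = 1  # force at least one pass
    while n:
        # re.subn returns (new_string, number_of_substitutions_made)
        text, n = re.subn(r'\([^()]*\)', '', text)
    return text

print(remove_text_between_parens('stuff(remove(me))'))  # stuff
print(remove_text_between_parens('a(b)c(d(e))f'))       # acf
```

Each pass removes only groups that contain no parentheses themselves, so nested groups are peeled from the inside out and the loop stops once a pass makes no substitution.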
Basically: remove the (...) with no ( and ) inside until no match is found.
A non-regex way is also possible:
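One way to do this without a regex is to track the nesting depth while scanning the string once. This is a sketch under that assumption; the function name `remove_nested_parens` is hypothetical.

```python
def remove_nested_parens(s):
    """Drop every character that sits inside parentheses, at any depth."""
    kept = []
    depth = 0  # number of currently open '('
    for ch in s:
        if ch == '(':
            depth += 1
        elif ch == ')' and depth > 0:
            depth -= 1
        elif depth == 0:
            # Outside all groups: keep the character.
            # An unmatched ')' also ends up here and is kept.
            kept.append(ch)
    return ''.join(kept)

print(remove_nested_parens('stuff(remove(me))'))  # stuff
```

This handles arbitrary nesting depth in a single pass. With an unclosed `(`, everything after it is dropped, so it also relies on the input being balanced.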
See another Python demo
As mentioned before, you'd need a recursive regex to match arbitrary levels of nesting, but if you know there can be at most one level of nesting, try a pattern built from these parts:
[^)(] matches a character that is not a parenthesis (negated class).
|\([^)(]*\) or it matches another ( ) pair with any amount of non-)( characters inside.
(?:...)* all this any amount of times, inside an outer ( ).
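Putting the parts described above together gives the pattern below. It is reconstructed from the description, so it may not be character-for-character what was originally posted; the variable name `one_level` is illustrative.

```python
import re

# Outer ( ) pair whose content is any mix of non-parenthesis
# characters and flat ( ) pairs, i.e. one level of nesting.
one_level = r'\((?:[^)(]|\([^)(]*\))*\)'

s2 = 'stuff(remove(me))'
print(re.sub(one_level, '', s2))              # stuff
print(re.sub(one_level, '', 'a(b)c(d(e))f'))  # acf
```

Both alternatives inside the group start with different characters, so the engine never has two ways to consume the same text.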
Here is a demo at regex101
[^)(] is used without a + quantifier before the alternation so the match fails faster if the parentheses are unbalanced. You need to add more levels of nesting as they might occur, e.g. for a maximum of 2 levels:
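Each extra level wraps the previous pattern in the same outer shell, so the deeper patterns can be generated instead of typed by hand. The helper `nested_paren_pattern` below is a hypothetical sketch of that idea, not code from the original answer.

```python
import re

def nested_paren_pattern(levels):
    """Pattern for a ( ) group containing at most `levels` levels of nesting."""
    pattern = r'\([^)(]*\)'  # flat pair, no nesting
    for _ in range(levels):
        # Wrap: non-parenthesis chars or the previous pattern, repeated.
        pattern = r'\((?:[^)(]|' + pattern + r')*\)'
    return pattern

two_levels = nested_paren_pattern(2)
print(two_levels)  # \((?:[^)(]|\((?:[^)(]|\([^)(]*\))*\))*\)
print(re.sub(two_levels, '', 'stuff(remove(me(too)))'))  # stuff
```

The pattern grows with every level, so for truly arbitrary depth the looped `re.subn` approach or a recursive regex engine is the more practical choice.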
Another demo at regex101