Python: How to match nested parentheses with regex

2019-01-01 07:40发布

I'm trying to match a mathematical-expression-like string, that have nested parentheses.

import re

p = re.compile('\(.+\)')
str = '(((1+0)+1)+1)'
print p.findall(s)

['(((1+0)+1)+1)']

I wanted it to match all the enclosed expressions, such as (1+0), ((1+0)+1)...
I don't even care if it matches unwanted ones like (((1+0), I can take care of those.

Why it's not doing that already, and how can I do it?

13条回答
查无此人
2楼-- · 2019-01-01 07:57

Stack is the best tool for the job: -

import re
def matches(line, opendelim='(', closedelim=')'):
    stack = []

    for m in re.finditer(r'[{}{}]'.format(opendelim, closedelim), line):
        pos = m.start()

        if line[pos-1] == '\\':
            # skip escape sequence
            continue

        c = line[pos]

        if c == opendelim:
            stack.append(pos+1)

        elif c == closedelim:
            if len(stack) > 0:
                prevpos = stack.pop()
                # print("matched", prevpos, pos, line[prevpos:pos])
                yield (prevpos, pos, len(stack))
            else:
                # error
                print("encountered extraneous closing quote at pos {}: '{}'".format(pos, line[pos:] ))
                pass

    if len(stack) > 0:
        for pos in stack:
            print("expecting closing quote to match open quote starting at: '{}'"
                  .format(line[pos-1:]))

In the client code, since the function is written as a generator function simply use the for loop pattern to unroll the matches: -

line = '(((1+0)+1)+1)'
for openpos, closepos, level in matches(line):
    print(line[openpos:closepos], level)

This test code produces following on my screen, noticed the second param in the printout indicates the depth of the parenthesis.

1+0 2
(1+0)+1 1
((1+0)+1)+1 0
查看更多
何处买醉
3楼-- · 2019-01-01 08:03

I believe this function may suit your need, I threw this together fast so feel free to clean it up a bit. When doing nests its easy to think of it backwards and work from there =]

def fn(string,endparens=False):
    exp = []
    idx = -1
    for char in string:
        if char == "(":
            idx += 1
            exp.append("")
        elif char == ")":
            idx -= 1
            if idx != -1:
                exp[idx] = "(" + exp[idx+1] + ")"
        else:
            exp[idx] += char
    if endparens:
        exp = ["("+val+")" for val in exp]
    return exp
查看更多
姐姐魅力值爆表
4楼-- · 2019-01-01 08:04

From a linked answer:

From the LilyPond convert-ly utility (and written/copyrighted by myself, so I can show it off here):

def paren_matcher (n):
    # poor man's matched paren scanning, gives up
    # after n+1 levels.  Matches any string with balanced
    # parens inside; add the outer parens yourself if needed.
    # Nongreedy.
    return r"[^()]*?(?:\("*n+r"[^()]*?"+r"\)[^()]*?)*?"*n

convert-ly tends to use this as paren_matcher (25) in its regular expressions which is likely overkill for most applications. But then it uses it for matching Scheme expressions.

Yes, it breaks down after the given limit, but the ability to just plug it into regular expressions still beats the "correct" alternatives supporting unlimited depth hands-down in usability.

查看更多
闭嘴吧你
5楼-- · 2019-01-01 08:09

Here is a demo for your question, though it is clumsy, while it works

import re s = '(((1+0)+1)+1)'

def getContectWithinBraces( x , *args , **kwargs):
    ptn = r'[%(left)s]([^%(left)s%(right)s]*)[%(right)s]' %kwargs
    Res = []
    res = re.findall(ptn , x)
    while res != []:
        Res = Res + res
        xx = x.replace('(%s)' %Res[-1] , '%s')
        res = re.findall(ptn, xx)
        print(res)
        if res != []:
            res[0] = res[0] %('(%s)' %Res[-1])
    return Res

getContectWithinBraces(s , left='\(\[\{' , right = '\)\]\}')
查看更多
与风俱净
6楼-- · 2019-01-01 08:10

The regular expression tries to match as much of the text as possible, thereby consuming all of your string. It doesn't look for additional matches of the regular expression on parts of that string. That's why you only get one answer.

The solution is to not use regular expressions. If you are actually trying to parse math expressions, use a real parsing solutions. If you really just want to capture the pieces within parenthesis, just loop over the characters counting when you see ( and ) and increment a decrement a counter.

查看更多
只靠听说
7楼-- · 2019-01-01 08:10

You should write a proper parser for parsing such expression (e.g. using pyparsing). Regular expressions are not an appropriate tool for writing decent parsers.

查看更多
登录 后发表回答