Python: How to match nested parentheses with regex-第2页回答

I'm trying to match a mathematical-expression-like string, that have nested parentheses.

import re

p = re.compile('\(.+\)')
str = '(((1+0)+1)+1)'
print p.findall(s)

['(((1+0)+1)+1)']

I wanted it to match all the enclosed expressions, such as (1+0), ((1+0)+1)...
I don't even care if it matches unwanted ones like (((1+0), I can take care of those.

Why it's not doing that already, and how can I do it?

标签： python regex nested

13条回答

查无此人

2楼-- · 2019-01-01 07:57

Stack is the best tool for the job: -

import re
def matches(line, opendelim='(', closedelim=')'):
    stack = []

    for m in re.finditer(r'[{}{}]'.format(opendelim, closedelim), line):
        pos = m.start()

        if line[pos-1] == '\\':
            # skip escape sequence
            continue

        c = line[pos]

        if c == opendelim:
            stack.append(pos+1)

        elif c == closedelim:
            if len(stack) > 0:
                prevpos = stack.pop()
                # print("matched", prevpos, pos, line[prevpos:pos])
                yield (prevpos, pos, len(stack))
            else:
                # error
                print("encountered extraneous closing quote at pos {}: '{}'".format(pos, line[pos:] ))
                pass

    if len(stack) > 0:
        for pos in stack:
            print("expecting closing quote to match open quote starting at: '{}'"
                  .format(line[pos-1:]))

In the client code, since the function is written as a generator function simply use the for loop pattern to unroll the matches: -

line = '(((1+0)+1)+1)'
for openpos, closepos, level in matches(line):
    print(line[openpos:closepos], level)

This test code produces following on my screen, noticed the second param in the printout indicates the depth of the parenthesis.

1+0 2
(1+0)+1 1
((1+0)+1)+1 0

0人赞添加讨论(0) 举报

何处买醉

3楼-- · 2019-01-01 08:03

I believe this function may suit your need, I threw this together fast so feel free to clean it up a bit. When doing nests its easy to think of it backwards and work from there =]

def fn(string,endparens=False):
    exp = []
    idx = -1
    for char in string:
        if char == "(":
            idx += 1
            exp.append("")
        elif char == ")":
            idx -= 1
            if idx != -1:
                exp[idx] = "(" + exp[idx+1] + ")"
        else:
            exp[idx] += char
    if endparens:
        exp = ["("+val+")" for val in exp]
    return exp

0人赞添加讨论(0) 举报

姐姐魅力值爆表

4楼-- · 2019-01-01 08:04

From a linked answer:

From the LilyPond convert-ly utility (and written/copyrighted by myself, so I can show it off here):

def paren_matcher (n):
    # poor man's matched paren scanning, gives up
    # after n+1 levels.  Matches any string with balanced
    # parens inside; add the outer parens yourself if needed.
    # Nongreedy.
    return r"[^()]*?(?:\("*n+r"[^()]*?"+r"\)[^()]*?)*?"*n

convert-ly tends to use this as paren_matcher (25) in its regular expressions which is likely overkill for most applications. But then it uses it for matching Scheme expressions.

Yes, it breaks down after the given limit, but the ability to just plug it into regular expressions still beats the "correct" alternatives supporting unlimited depth hands-down in usability.

0人赞添加讨论(0) 举报

闭嘴吧你

5楼-- · 2019-01-01 08:09

Here is a demo for your question, though it is clumsy, while it works

import re s = '(((1+0)+1)+1)'

def getContectWithinBraces( x , *args , **kwargs):
    ptn = r'[%(left)s]([^%(left)s%(right)s]*)[%(right)s]' %kwargs
    Res = []
    res = re.findall(ptn , x)
    while res != []:
        Res = Res + res
        xx = x.replace('(%s)' %Res[-1] , '%s')
        res = re.findall(ptn, xx)
        print(res)
        if res != []:
            res[0] = res[0] %('(%s)' %Res[-1])
    return Res

getContectWithinBraces(s , left='\(\[\{' , right = '\)\]\}')

0人赞添加讨论(0) 举报

与风俱净

6楼-- · 2019-01-01 08:10

The regular expression tries to match as much of the text as possible, thereby consuming all of your string. It doesn't look for additional matches of the regular expression on parts of that string. That's why you only get one answer.

The solution is to not use regular expressions. If you are actually trying to parse math expressions, use a real parsing solutions. If you really just want to capture the pieces within parenthesis, just loop over the characters counting when you see ( and ) and increment a decrement a counter.

0人赞添加讨论(0) 举报

只靠听说

7楼-- · 2019-01-01 08:10

You should write a proper parser for parsing such expression (e.g. using pyparsing). Regular expressions are not an appropriate tool for writing decent parsers.

0人赞添加讨论(0) 举报

Python: How to match nested parentheses with regex

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间