Python - Split by comma skipping the content insid

2020-07-29 02:00发布

问题:

I need to split a string by commas, but I have a problem with this case:

TEXT EXAMPLE (THIS IS (A EXAMPLE, BUT NOT WORKS, FOR ME)), SECOND , THIRD

I would like to split and get:

var[0] = "TEXT EXAMPLE (THIS IS (A EXAMPLE, BUT NOT WORKS, FOR ME))"
var[1] = "SECOND"
var[2] = "THIRD"

Thank you

回答1:

You can use this negative lookahead based regex:

,(?!(?:[^(]*\([^)]*\))*[^()]*\))

This regex is finding a comma with an assertion that makes sure comma is not in parentheses. This is done using a negative lookahead that first consumes all matching ( and ) and then a ). This assumes parentheses are balanced and unescaped.

RegEx Demo

Code:

>>> s = 'TEXT EXAMPLE (THIS IS (A EXAMPLE, BUT NOT WORKS, FOR ME)), SECOND , THIRD'
print re.split(r',(?!(?:[^(]*\([^)]*\))*[^()]*\))', s)

['TEXT EXAMPLE (THIS IS (A EXAMPLE, BUT NOT WORKS, FOR ME))', ' SECOND ', ' THIRD']

Or:

>>> s = 'TEXT EXAMPLE (THIS, IS (A EXAMPLE, BUT NOT WORKS, FOR ME)), SECOND , THIRD'
>>> print re.split(r',(?!(?:[^(]*\([^)]*\))*[^()]*\))', s)
['TEXT EXAMPLE (THIS, IS (A EXAMPLE, BUT NOT WORKS, FOR ME))', ' SECOND ', ' THIRD']


回答2:

Here's a very simple parser approach that works for your example:

def top_level_split(s):
    """
    Split `s` by top-level commas only. Commas within parentheses are ignored.
    """

    # Parse the string tracking whether the current character is within
    # parentheses.
    balance = 0
    parts = []
    part = ''

    for c in s:
        part += c
        if c == '(':
            balance += 1
        elif c == ')':
            balance -= 1
        elif c == ',' and balance == 0:
            parts.append(part[:-1].strip())
            part = ''

    # Capture last part
    if len(part):
        parts.append(part.strip())

    return parts

my_list = top_level_split("TEXT EXAMPLE (THIS IS (A EXAMPLE, BUT NOT WORKS, FOR ME)), SECOND , THIRD")
print(my_list)


回答3:

Thanks to jonrsharpe :

text = "TEXT EXAMPLE (THIS IS (A EXAMPLE, BUT NOT WORKS, FOR ME)), SECOND , THIRD"
array = re.split(r',(?!.*\))', text)
for item in array:
    # Print and remove the first space
    print item.strip(" ")

Result:

TEXT EXAMPLE (THIS IS (A EXAMPLE, BUT NOT WORKS, FOR ME))
SECOND
THIRD


回答4:

You can just use rsplit:

l1 = "TEXT EXAMPLE (THIS IS (A EXAMPLE, BUT NOT WORKS, FOR ME)), SECOND , THIRD".rsplit(",", 2)

for line in l1:
   print line

TEXT EXAMPLE (THIS IS (A EXAMPLE, BUT NOT WORKS, FOR ME))
SECOND
THIRD