I need to split a string by commas, but I have a problem with this case:
TEXT EXAMPLE (THIS IS (A EXAMPLE, BUT NOT WORKS, FOR ME)), SECOND , THIRD
I would like to split and get:
var[0] = "TEXT EXAMPLE (THIS IS (A EXAMPLE, BUT NOT WORKS, FOR ME))"
var[1] = "SECOND"
var[2] = "THIRD"
Thank you
You can use this negative lookahead based regex:
,(?!(?:[^(]*\([^)]*\))*[^()]*\))
This regex matches a comma only when that comma is not inside parentheses. The negative lookahead first consumes every matching ( and ) pair that follows the comma and then looks for one more ). If it finds one, the comma sits inside parentheses and is skipped. This assumes the parentheses are balanced and unescaped.
Code:
>>> import re
>>> s = 'TEXT EXAMPLE (THIS IS (A EXAMPLE, BUT NOT WORKS, FOR ME)), SECOND , THIRD'
>>> print(re.split(r',(?!(?:[^(]*\([^)]*\))*[^()]*\))', s))
['TEXT EXAMPLE (THIS IS (A EXAMPLE, BUT NOT WORKS, FOR ME))', ' SECOND ', ' THIRD']
Or:
>>> s = 'TEXT EXAMPLE (THIS, IS (A EXAMPLE, BUT NOT WORKS, FOR ME)), SECOND , THIRD'
>>> print(re.split(r',(?!(?:[^(]*\([^)]*\))*[^()]*\))', s))
['TEXT EXAMPLE (THIS, IS (A EXAMPLE, BUT NOT WORKS, FOR ME))', ' SECOND ', ' THIRD']
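One caveat worth checking before relying on this pattern: it seems to depend on where the nesting occurs, not only on the parentheses being balanced. The sketch below reuses the pattern from this answer unchanged; the helper name `split_outside_parens` and the second test string are made up for illustration. It suggests that a top-level comma followed by a field with nested parentheses is not split.

```python
import re

# Pattern copied from the answer above.
PATTERN = re.compile(r',(?!(?:[^(]*\([^)]*\))*[^()]*\))')


def split_outside_parens(text):
    """Hypothetical helper: split with the regex, then trim each field."""
    return [field.strip() for field in PATTERN.split(text)]


question_input = 'TEXT EXAMPLE (THIS IS (A EXAMPLE, BUT NOT WORKS, FOR ME)), SECOND , THIRD'
print(split_outside_parens(question_input))
# ['TEXT EXAMPLE (THIS IS (A EXAMPLE, BUT NOT WORKS, FOR ME))', 'SECOND', 'THIRD']

# Here the comma is at the top level, but the field after it contains nested
# parentheses. The lookahead pairs the first "(" with the first ")" and then
# sees a leftover ")", so it treats the comma as enclosed and does not split.
print(split_outside_parens('a, b((c))'))
# ['a, b((c))']
```

If later fields can contain nested parentheses, counting depth character by character is probably the safer route.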
Here's a very simple parser approach that works for your example:
def top_level_split(s):
    """
    Split `s` by top-level commas only. Commas within parentheses are ignored.
    """
    # Parse the string tracking whether the current character is within
    # parentheses.
    balance = 0
    parts = []
    part = ''
    for c in s:
        part += c
        if c == '(':
            balance += 1
        elif c == ')':
            balance -= 1
        elif c == ',' and balance == 0:
            parts.append(part[:-1].strip())
            part = ''
    # Capture last part
    if len(part):
        parts.append(part.strip())
    return parts

my_list = top_level_split("TEXT EXAMPLE (THIS IS (A EXAMPLE, BUT NOT WORKS, FOR ME)), SECOND , THIRD")
print(my_list)
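If the input might contain unbalanced parentheses, the same depth-counting idea can be extended to report them instead of silently producing odd fields. This is only a sketch under that assumption: the name `split_top_level_checked` and the choice of `ValueError` are illustrative, not part of the answer above. It also slices the original string rather than building each part character by character.

```python
def split_top_level_checked(s):
    """Split `s` on top-level commas, raising ValueError on unbalanced parentheses."""
    parts = []
    start = 0   # index where the current field begins
    depth = 0   # current parenthesis nesting depth
    for i, c in enumerate(s):
        if c == '(':
            depth += 1
        elif c == ')':
            depth -= 1
            if depth < 0:
                raise ValueError("unmatched ')' at index %d" % i)
        elif c == ',' and depth == 0:
            parts.append(s[start:i].strip())
            start = i + 1
    if depth != 0:
        raise ValueError("unclosed '(' in input")
    # The final field is always appended, so a trailing comma yields ''.
    parts.append(s[start:].strip())
    return parts


print(split_top_level_checked(
    "TEXT EXAMPLE (THIS IS (A EXAMPLE, BUT NOT WORKS, FOR ME)), SECOND , THIRD"))
# ['TEXT EXAMPLE (THIS IS (A EXAMPLE, BUT NOT WORKS, FOR ME))', 'SECOND', 'THIRD']
```

Unlike the version above, this one keeps an empty final field after a trailing comma; drop that last `append` guard-free line if you prefer the original behaviour.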
Thanks to jonrsharpe:
import re

text = "TEXT EXAMPLE (THIS IS (A EXAMPLE, BUT NOT WORKS, FOR ME)), SECOND , THIRD"
# Split on commas that have no ")" anywhere after them
array = re.split(r',(?!.*\))', text)
for item in array:
    # Print each item with leading and trailing spaces removed
    print(item.strip(" "))
Result:
TEXT EXAMPLE (THIS IS (A EXAMPLE, BUT NOT WORKS, FOR ME))
SECOND
THIRD
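Note that this shorter pattern only asks whether any ")" appears later in the string, so it appears to work only when every parenthesised group comes before the commas you want to split on. A small check, using a made-up input string and the hypothetical names `SIMPLE` and `result`:

```python
import re

# Same pattern as above: a comma with no ")" anywhere after it.
SIMPLE = r',(?!.*\))'

# The second field also has parentheses, so the first top-level comma is
# still followed by a ")" and is not treated as a split point.
result = re.split(SIMPLE, 'a (b, c), d (e), f')
print(result)
# ['a (b, c), d (e)', ' f']
```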
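If you know the string always ends with exactly two fields that contain no commas, you can just use rsplit with a maximum of two splits:
l1 = "TEXT EXAMPLE (THIS IS (A EXAMPLE, BUT NOT WORKS, FOR ME)), SECOND , THIRD".rsplit(",", 2)
for line in l1:
    # Strip the spaces left around each field
    print(line.strip())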
TEXT EXAMPLE (THIS IS (A EXAMPLE, BUT NOT WORKS, FOR ME))
SECOND
THIRD