I need to split a string like the one below on semicolons, but I don't want to split on semicolons that are inside a string (' or "). I'm not parsing a file, just a simple string with no line breaks.
part 1;"this is ; part 2;";'this is ; part 3';part 4;this "is ; part" 5
Result should be:
- part 1
- "this is ; part 2;"
- 'this is ; part 3'
- part 4
- this "is ; part" 5
I suppose this can be done with a regex, but if not, I'm open to another approach.
Instead of splitting on a separator pattern, just capture whatever you need:
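A sketch of that idea with `re.findall`. The pattern and the name `FIELD` are illustrative, not necessarily the answer's original code. Each match is a run of characters that are neither semicolons nor quotes, interleaved with complete `'...'` or `"..."` sections, so semicolons inside quotes are consumed as part of the field:

```python
import re

s = """part 1;"this is ; part 2;";'this is ; part 3';part 4;this "is ; part" 5"""

# A field is one or more of: a character that is not a separator or quote,
# a complete '...' section, or a complete "..." section.
FIELD = re.compile(r"""(?:[^;'"]|'[^']*'|"[^"]*")+""")

parts = FIELD.findall(s)
print(parts)
```

Because only non-empty runs are captured, empty fields are dropped: `FIELD.findall("a;;b")` gives `['a', 'b']`.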
Although the topic is old and the previous answers work well, I propose my own implementation of the split function in Python.
It works fine if you don't need to process a large number of strings, and it is easily customizable.
Here's my function:
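A minimal sketch of such a character-by-character splitter; the name `split_outside_quotes` and its `sep`/`quotes` parameters are hypothetical and not necessarily the original function. It remembers which quote character is open, keeps separators that appear inside quotes as part of the current field, and starts a new field at every separator outside quotes:

```python
def split_outside_quotes(s, sep=";", quotes="'\""):
    """Split s on the single character sep, ignoring seps inside quotes."""
    fields = []
    current = []       # characters of the field being built
    open_quote = None  # quote character currently open, or None

    for ch in s:
        if open_quote is not None:
            current.append(ch)
            if ch == open_quote:
                open_quote = None  # the matching quote closes the section
        elif ch in quotes:
            open_quote = ch
            current.append(ch)
        elif ch == sep:
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)

    fields.append("".join(current))  # the last field has no trailing sep
    return fields


s = """part 1;"this is ; part 2;";'this is ; part 3';part 4;this "is ; part" 5"""
print(split_outside_quotes(s))
print(split_outside_quotes("a;;b;"))
```

Since every unquoted separator closes a field, consecutive or trailing separators yield empty strings: the second call returns `['a', '', 'b', '']`.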
So you can run:
result:
The advantage is that this function works with empty fields and with any number of separators in the string.
Hope this helps!
My approach is to replace all non-quoted occurrences of the semicolon with another character that will never appear in the text, then split on that character. The following code uses the `re.sub` function with a function argument to search and replace all occurrences of a `srch` string, not enclosed in single or double quotes or parens, brackets or braces, with a `repl` string:

If you don't care about the bracketed characters, you can simplify this code a lot.
If you wanted to use a pipe (vertical bar) as the substitute character, you would do:
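A minimal sketch of this replace-then-split technique; the name `replace_unquoted` is hypothetical and this is not the answer's original code. The regex matches `srch` plus every quote and bracket character, and the callback passed to `re.sub` uses `nonlocal` to track whether a quote is open and how deeply the text is nested in brackets, replacing `srch` only at depth zero outside quotes:

```python
import re

QUOTES = {"'", '"'}
OPENERS = {"(", "[", "{"}
CLOSERS = {")", "]", "}"}


def replace_unquoted(text, srch, repl):
    """Replace srch with repl wherever it is outside quotes and brackets."""
    depth = 0     # current nesting depth of (), [] and {}
    quote = None  # the quote character that is currently open, if any

    def sub(match):
        nonlocal depth, quote
        tok = match.group(0)
        if quote is not None:
            # Inside a quoted section: only the matching quote ends it.
            if tok == quote:
                quote = None
            return tok
        if tok in QUOTES:
            quote = tok
        elif tok in OPENERS:
            depth += 1
        elif tok in CLOSERS:
            depth = max(depth - 1, 0)
        elif depth == 0:
            # tok is srch and is not nested anywhere: replace it.
            return repl
        return tok

    # Alternatives: srch first, then every quote and bracket character.
    tokens = [srch, *sorted(QUOTES), *sorted(OPENERS), *sorted(CLOSERS)]
    pattern = "|".join(re.escape(t) for t in tokens)
    return re.sub(pattern, sub, text)


s = """part 1;"this is ; part 2;";'this is ; part 3';part 4;this "is ; part" 5"""
parts = replace_unquoted(s, ";", "|").split("|")
print(parts)
```

Brackets are protected too: `replace_unquoted("a;(b;c);d", ";", "|")` returns `'a|(b;c)|d'`.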
BTW, this uses `nonlocal`, which is only available in Python 3; change it to `global` if you need to.

You appear to have a semicolon-separated string. Why not use the `csv` module to do all the hard work?

Off the top of my head, this should work:
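A sketch of that `csv` call, wrapping the string in `io.StringIO` (as the edit below says was intended) and using `;` as the delimiter. With the default `quotechar` of `"`, only double-quoted fields are protected:

```python
import csv
import io

s = """part 1;"this is ; part 2;";'this is ; part 3';part 4;this "is ; part" 5"""

# csv.reader reads from an iterable of lines, so wrap the single string.
reader = csv.reader(io.StringIO(s), delimiter=";")
fields = next(reader)
print(fields)
```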
This should give you something like
("part 1", "this is ; part 2;", 'this is ; part 3', "part 4", "this \"is ; part\" 5")
Edit:
Unfortunately, this doesn't quite work (even if you use StringIO, as I intended) because of the mixed string quotes (both single and double). What you actually get is
['part 1', 'this is ; part 2;', "'this is ", " part 3'", 'part 4', 'this "is ', ' part" 5']
If you can change the data to contain only single or only double quotes at the appropriate places, it should work fine, but that somewhat negates the question.
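To illustrate, here is a sketch in which the question's data has been hypothetically rewritten so every quoted field uses double quotes, `csv`'s default `quotechar`. The fifth field is left out because, as the output above shows, `csv` does not treat a quote in the middle of an unquoted field as special. Note also that `csv` strips the enclosing quotes, whereas the result in the question keeps them:

```python
import csv
import io

# Same data, but each quoted field uses double quotes only.
s = """part 1;"this is ; part 2;";"this is ; part 3";part 4"""
fields = next(csv.reader(io.StringIO(s), delimiter=";"))
print(fields)
```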