I need to split a string like this, on semicolons. But I don't want to split on semicolons that are inside of a string (' or "). I'm not parsing a file; just a simple string with no line breaks.
part 1;"this is ; part 2;";'this is ; part 3';part 4;this "is ; part" 5
Result should be:
- part 1
- "this is ; part 2;"
- 'this is ; part 3'
- part 4
- this "is ; part" 5
I suppose this can be done with a regex, but if not, I'm open to another approach.
Here is an annotated pyparsing approach:
giving
By using pyparsing's provided quotedString, you also get support for escaped quotes. You were also unclear about how to handle whitespace before or after a semicolon delimiter, and none of the fields in your sample text has any. Pyparsing would parse "a; b ; c" as:
Each time it finds a semicolon, the lookahead scans the entire remaining string, making sure there's an even number of single-quotes and an even number of double-quotes. (Single-quotes inside double-quoted fields, or vice-versa, are ignored.) If the lookahead succeeds, the semicolon is a delimiter.
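The lookahead described above can be sketched roughly as follows. This is a reconstruction under assumptions, not necessarily the exact pattern from this answer; the names DELIMITER and split_on_semicolons are hypothetical, and it assumes quotes are balanced and never escaped.

```python
import re

# A semicolon is a delimiter only if the rest of the string, up to the end,
# consists of unquoted characters and complete quoted substrings.
DELIMITER = re.compile(r''';(?=(?:[^'"]|'[^']*'|"[^"]*")*$)''')


def split_on_semicolons(text):
    """Split text on semicolons that are not inside '...' or "..."."""
    return DELIMITER.split(text)


sample = '''part 1;"this is ; part 2;";'this is ; part 3';part 4;this "is ; part" 5'''
for part in split_on_semicolons(sample):
    print(part)

# Empty fields, including a trailing one, are kept:
print(split_on_semicolons("a;;b;"))  # ['a', '', 'b', '']
```

Because the lookahead rescans the remainder of the string at every semicolon, this is probably best suited to short inputs like the one in the question.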
Unlike Duncan's solution, which matches the fields rather than the delimiters, this one has no problem with empty fields. (Not even the last one: unlike many other split implementations, Python's does not automatically discard trailing empty fields.)
We can write a dedicated function for this.
This seemed to me a semi-elegant solution.
New Solution:
Old solution:
I chose to match an opening quote, wait for it to close, and then match an ending semicolon. Each "part" you want to match needs to end in a semicolon, so this matches things like this:
Code:
You may have to do some postprocessing on res, but it contains what you want.
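As an illustration of the "match each part up to its terminating semicolon" idea, here is a minimal sketch. It is not this answer's original code; the names FIELD and match_fields are hypothetical, and appending a semicolon to terminate the last part is an assumption made for the sketch. It also assumes balanced, unescaped quotes: a part containing an unclosed quote is silently skipped.

```python
import re

# One part: any run of unquoted characters or complete quoted substrings,
# followed by the semicolon that terminates it.
FIELD = re.compile(r'''((?:[^;'"]|"[^"]*"|'[^']*')*);''')


def match_fields(text):
    """Return the parts of text separated by semicolons outside quotes."""
    # Append a semicolon so the last part is terminated like the others.
    return FIELD.findall(text + ";")


sample = '''part 1;"this is ; part 2;";'this is ; part 3';part 4;this "is ; part" 5'''
res = match_fields(sample)
for part in res:
    print(part)
```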
Even though I'm certain there is a clean regex solution (so far I like @noiflection's answer), here is a quick-and-dirty non-regex answer.
(I've never put together something of this sort, so feel free to critique my form!)
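A non-regex approach along these lines could look like the sketch below: scan the string once and track whether the current character is inside a quoted section. The function name split_outside_quotes is hypothetical, and the sketch assumes that quotes are not escaped and that a quote of one kind inside a section quoted with the other kind is ordinary text.

```python
def split_outside_quotes(text, delimiter=";"):
    """Split text on delimiter, ignoring delimiters inside '...' or "..."."""
    parts = []
    current = []
    quote = None  # the quote character that opened the current section, if any

    for ch in text:
        if quote is not None:
            # Inside a quoted section: keep everything, close on the same quote.
            current.append(ch)
            if ch == quote:
                quote = None
        elif ch in ("'", '"'):
            quote = ch
            current.append(ch)
        elif ch == delimiter:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)

    # Whatever remains after the last delimiter is the final part.
    parts.append("".join(current))
    return parts


sample = '''part 1;"this is ; part 2;";'this is ; part 3';part 4;this "is ; part" 5'''
for part in split_outside_quotes(sample):
    print(part)
```

This makes a single pass over the input and keeps empty parts, at the cost of being longer than a one-line regex.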