This question already has answers here:
Closed 4 years ago.
Using python, I want to split the following string:
a=foo, b=bar, c="foo, bar", d=false, e="false"
This should result in the following list:
['a=foo', 'b=bar', 'c="foo, bar"', 'd=false', 'e="false'"']
When using shlex in posix-mode and splitting with ", ", the argument for c
gets treated correctly. However, it removes the quotes. I need them because false
is not the same as "false"
, for instance.
My code so far:
import shlex
mystring = 'a=foo, b=bar, c="foo, bar", d=false, e="false"'
splitter = shlex.shlex(mystring, posix=True)
splitter.whitespace += ','
splitter.whitespace_split = True
print list(splitter) # ['a=foo', 'b=bar', 'c=foo, bar', 'd=false', 'e=false']
>>> s = r'a=foo, b=bar, c="foo, bar", d=false, e="false", f="foo\", bar"'
>>> re.findall(r'(?:[^\s,"]|"(?:\\.|[^"])*")+', s)
['a=foo', 'b=bar', 'c="foo, bar"', 'd=false', 'e="false"', 'f="foo\\", bar"']
- The regex pattern
"[^"]*"
matches a simple quoted string.
"(?:\\.|[^"])*"
matches a quoted string and skips over escaped quotes because \\.
consumes two characters: a backslash and any character.
[^\s,"]
matches a non-delimiter.
- Combining patterns 2 and 3 inside
(?: | )+
matches a sequence of non-delimiters and quoted strings, which is the desired result.
Regex can solve this easily enough:
import re
mystring = 'a=foo, b=bar, c="foo, bar", d=false, e="false"'
splitString = re.split(',?\s(?=\w+=)',mystring)
The regex pattern here looks for a whitespace followed by a word character and then an equals sign which splits your string as you desire and maintains any quotes.