I am looking for a simple way to split the parenthesized lists that come out of IMAP responses into Python lists or tuples. I want to go from
'(BODYSTRUCTURE ("text" "plain" ("charset" "ISO-8859-1") NIL NIL "quoted-printable" 1207 50 NIL NIL NIL NIL))'
to
(BODYSTRUCTURE, ("text", "plain", ("charset", "ISO-8859-1"), None, None, "quoted-printable", 1207, 50, None, None, None, None))
pyparsing's nestedExpr parser function parses nested parentheses by default:
prints:
Here is a slightly modified parser, which converts integer strings to integers and "NIL" to None at parse time, and strips the quotes from quoted strings:
Prints:
Because the tuples can be nested to arbitrary depth, a single regular expression (at least with Python's re module) can't do this. You'll have to write a parser that keeps track of whether you're inside a parenthesis or not.
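A minimal sketch of such a hand-written parser, using only the standard library. The name `parse_imap_list` and the token regex are my own, not from any library. It assumes the response contains only parentheses, double-quoted strings, numbers, NIL and bare atoms; IMAP literals (the `{n}` form) are not handled.

```python
import re

# One token is a parenthesis, a double-quoted string (backslash escapes
# allowed), or a bare atom such as BODYSTRUCTURE, NIL or 1207.
TOKEN = re.compile(r'\(|\)|"(?:[^"\\]|\\.)*"|[^\s()"]+')


def parse_imap_list(text):
    """Parse a parenthesized IMAP list into nested tuples (hypothetical helper)."""
    tokens = TOKEN.findall(text)
    pos = 0

    def parse_item():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of input")
        tok = tokens[pos]
        pos += 1
        if tok == '(':
            # Collect items until the matching close parenthesis.
            items = []
            while pos < len(tokens) and tokens[pos] != ')':
                items.append(parse_item())
            if pos >= len(tokens):
                raise ValueError("missing ')'")
            pos += 1  # consume ')'
            return tuple(items)
        if tok == ')':
            raise ValueError("unexpected ')'")
        if tok.startswith('"'):
            # Drop the surrounding quotes and undo backslash escapes.
            return re.sub(r'\\(.)', r'\1', tok[1:-1])
        if tok == 'NIL':
            return None
        if tok.isdecimal():
            return int(tok)
        return tok  # bare atom, kept as a string

    result = parse_item()
    if pos != len(tokens):
        raise ValueError("trailing data after the list")
    return result


data = ('(BODYSTRUCTURE ("text" "plain" ("charset" "ISO-8859-1") '
        'NIL NIL "quoted-printable" 1207 50 NIL NIL NIL NIL))')
print(parse_imap_list(data))
```

Note that BODYSTRUCTURE comes back as the string 'BODYSTRUCTURE', since a bare name has no meaning in Python unless you define it.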
You could try
Edit: Well I got something that works with your example, not sure it's what you want though.
BODYSTRUCTURE needs to be defined somewhere.
Taking out only the inner part of the server response, the part that actually contains the body structure:
Next, replace some tokens to prepare the string for conversion into Python types:
Using the built-in compiler module (Python 2 only; it was removed in Python 3) to parse our structure:
Applying a simple recursive function to transform the expression:
And finally we get the desired result as nested Python tuples:
From there we can identify the individual fields of the body structure:
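On Python 3 the same steps can be sketched with `ast.literal_eval` swapped in for the compiler module; it handles the recursive conversion itself, so no separate transform function is needed. This is only a sketch and it leans on some assumptions: no spaces and no "NIL" inside quoted strings, and no one-element lists (a rewritten `("a")` would evaluate to a plain string, not a tuple).

```python
import ast

response = ('(BODYSTRUCTURE ("text" "plain" ("charset" "ISO-8859-1") '
            'NIL NIL "quoted-printable" 1207 50 NIL NIL NIL NIL))')

# Keep only the inner list that follows the BODYSTRUCTURE keyword:
# start at the second '(' and drop the final ')'.
inner = response[response.index('(', 1):-1]

# Rewrite IMAP syntax as Python literal syntax:
# NIL becomes None, and items get separated by commas.
python_source = inner.replace('NIL', 'None').replace(' ', ', ')

# literal_eval only accepts literals, so nothing in the response is executed.
structure = ast.literal_eval(python_source)
print(structure)
```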
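As a rough sketch of that last step: the field order below follows what RFC 3501 describes for a single-part text body (basic fields, then the optional extension fields). The names in `TEXT_PART_FIELDS` and the helper `label_text_part` are labels I made up for illustration; multipart and message/rfc822 parts are laid out differently and are not covered.

```python
# Positional fields of a single-part text body, in RFC 3501 order.
# The names themselves are hypothetical labels, not an official API.
TEXT_PART_FIELDS = (
    'type', 'subtype', 'params', 'id', 'description',
    'encoding', 'size', 'lines',
    'md5', 'disposition', 'language', 'location',
)


def label_text_part(part):
    """Map a parsed text body part (a tuple) to a dict of named fields."""
    fields = dict(zip(TEXT_PART_FIELDS, part))
    params = fields.get('params')
    if params:
        # Parameters arrive as a flat key/value sequence, e.g. ('charset', 'ISO-8859-1').
        fields['params'] = dict(zip(params[0::2], params[1::2]))
    return fields


part = ('text', 'plain', ('charset', 'ISO-8859-1'), None, None,
        'quoted-printable', 1207, 50, None, None, None, None)
info = label_text_part(part)
print(info['type'], info['subtype'], info['params'], info['encoding'], info['size'])
```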