What's the best way to build a dictionary from a string like the one below:
"{key1 value1} {key2 value2} {key3 {value with spaces}}"
The key is always a string with no spaces, but the value is either a plain string or a string in curly brackets (when it contains spaces).
How would you turn it into this dict:
{'key1': 'value1', 'key2': 'value2', 'key3': 'value with spaces'}
You can try this with `re.findall` and `dict`.
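A minimal sketch of the `re.findall` approach, assuming Python 3. The exact pattern and the names `text` and `result` are my own illustration, and the pattern only handles the one level of nesting shown in the question:

```python
import re

text = "{key1 value1} {key2 value2} {key3 {value with spaces}}"

# Group 1 is the key (no whitespace); group 2 is the value with any
# surrounding braces left outside the capture.
pattern = r"\{(\S+)\s+\{?([^{}]*)\}+"

# findall returns a list of (key, value) tuples, which dict() accepts directly.
result = dict(re.findall(pattern, text))
print(result)
# {'key1': 'value1', 'key2': 'value2', 'key3': 'value with spaces'}
```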
Here, with `re.findall`, we extract each key and its value. `re.findall` returns a list of tuples containing all the key/value pairs. Calling `dict` on that list of tuples gives the final answer.
Assuming that you don't have anything in your string more nested than what is in your example, you could first use lookahead/lookbehind assertions to split the string into your key-value pairs, looking for the pattern `} {` (the end of one pair of brackets and the beginning of another).
The split pattern says "Match on any `\s*` (whitespace) that has a `}` before it and a `{` after it, but don't include those brackets in the match itself."
Then you have your key-value pairs:
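As a sketch of the split step described above, assuming Python 3.7 or later (where `re.split` handles patterns that can match the empty string without warnings); the variable names are illustrative:

```python
import re

text = "{key1 value1} {key2 value2} {key3 {value with spaces}}"

# Split on whitespace that sits between a closing and an opening brace.
# The lookbehind (?<=\}) and lookahead (?=\{) keep the braces out of the match,
# so each piece still carries its own outer braces.
pairs = re.split(r"(?<=\})\s*(?=\{)", text)
print(pairs)
# ['{key1 value1}', '{key2 value2}', '{key3 {value with spaces}}']
```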
Each pair can then be split on whitespace with the `maxsplit` parameter set to 1, to make sure that it only splits on the first space. In this example I have also used string indexing (the `[1:-1]`) to get rid of the curly braces that I know are at the beginning and end of each pair.
Then just check whether the value is enclosed in curly braces, and remove them if you need to before putting them into your dictionary.
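Putting those steps together, a rough sketch might look like this. It assumes Python 3.7 or later and input no more deeply nested than the example; the names `text` and `result` are made up for illustration:

```python
import re

text = "{key1 value1} {key2 value2} {key3 {value with spaces}}"

result = {}
for pair in re.split(r"(?<=\})\s*(?=\{)", text):
    # pair[1:-1] drops the outer braces; maxsplit=1 splits only at the
    # first run of whitespace, so the value keeps its inner spaces.
    key, value = pair[1:-1].split(maxsplit=1)
    # Strip one more layer of braces if the value is wrapped in them.
    if value.startswith("{") and value.endswith("}"):
        value = value[1:-1]
    result[key] = value

print(result)
# {'key1': 'value1', 'key2': 'value2', 'key3': 'value with spaces'}
```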
If it is guaranteed that the key/value pairs will always be separated by a single space character, then you could use plain old string split instead.
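For instance, under that single-space assumption the regex split could be swapped for something like the following. Trimming the outermost braces before splitting is my own choice here, not necessarily how the original was written:

```python
text = "{key1 value1} {key2 value2} {key3 {value with spaces}}"

# Drop the outermost braces first, then split on the literal "} {" separator.
pairs = text[1:-1].split("} {")
print(pairs)
# ['key1 value1', 'key2 value2', 'key3 {value with spaces}']
```

Each piece can then go through the same `split(maxsplit=1)` and brace check as before.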
The answer by @vks doesn't check for balanced braces. Try instead a pattern that uses a conditional group, that is, one that matches only on the part with correct bracing.
The `(?P<Brace>\{)` saves the match of a `{`, and later `(?(Brace)\})` will match `}` only if the first one matched, so braces must come in matching pairs. And by the `(?(Brace)...|...)` construct, if `Brace` matched, the value part can contain anything except braces (`[^{}]*`); otherwise no space is allowed (`[^{}\s]*`).
As the optional brace is matched in the regexp, and thus returned in the list, we need to extract elements 0 and 2 from each tuple with the `map()` function.
Regexps easily get messy.
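A sketch of how those pieces could be assembled, assuming Python 3. The overall pattern layout and the names `PAIR` and `parse` are my own, and a dict comprehension stands in for the `map()` call:

```python
import re

# Group 1: key. Group "Brace": optional opening brace of the value.
# Group 3: value, whose allowed characters depend on whether "Brace" matched.
PAIR = re.compile(
    r"\{(\S+)\s+"                    # outer "{", key, whitespace
    r"(?P<Brace>\{)?"                # optional inner "{"
    r"((?(Brace)[^{}]*|[^{}\s]*))"   # braced: no braces inside; bare: no spaces either
    r"(?(Brace)\})"                  # inner "}" required only if inner "{" matched
    r"\}"                            # outer "}"
)

def parse(text):
    # findall yields (key, brace, value) tuples; keep elements 0 and 2.
    return {key: value for key, _brace, value in PAIR.findall(text)}

print(parse("{key1 value1} {key2 value2} {key3 {value with spaces}}"))
# {'key1': 'value1', 'key2': 'value2', 'key3': 'value with spaces'}

# The last pair is missing its closing brace, so it is skipped.
print(parse("{key1 value1} {key2 {value with spaces}"))
# {'key1': 'value1'}
```

Note that this only skips pairs whose bracing is off; it does not report them, so a stricter parser would be needed if malformed input should raise an error.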
I can't make it any more elegant:
It's very hacky, but it should do the job.