Convert string tuples to dict

Posted 2019-05-07 03:00

Question:

I have a malformed string:

a = '(a,1.0),(b,6.0),(c,10.0)'

I need a dict:

d = {'a':1.0, 'b':6.0, 'c':10.0}

I tried:

import ast

print(ast.literal_eval(a))
#ValueError: malformed node or string: <_ast.Name object at 0x000000000F67E828>

Then I tried replacing characters to build a 'string dict', but it is ugly and does not work:

b = (a.replace(',(','|{').replace(',',' : ')
      .replace('|',', ').replace('(','{').replace(')','}'))
print (b)
{a : 1.0}, {b : 6.0}, {c : 10.0}

print (ast.literal_eval(b))
#ValueError: malformed node or string: <_ast.Name object at 0x000000000C2EA588>

What would you do? Am I missing something? Is it possible to use a regex?

Answer 1:

Given that the string has the format stated above, you could use regex substitution with backreferences:

import ast
import re

a = '(a,1.0),(b,6.0),(c,10.0)'
a_fix = re.sub(r'\((\w+),', r"('\1',",a)

The pattern matches (x, where x is a run of \w characters, and replaces it with ('x',. The result is then:

# result
a_fix == "('a',1.0),('b',6.0),('c',10.0)"

and then parse a_fix and convert it to a dict:

result = dict(ast.literal_eval(a_fix))

The result is then:

>>> dict(ast.literal_eval(a_fix))
{'b': 6.0, 'c': 10.0, 'a': 1.0}
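
The two steps above can be wrapped into one helper. This is only a minimal sketch: the name parse_pairs is hypothetical, and it assumes keys contain only word characters and values are valid Python literals. The trailing comma is an addition so that a string with a single pair still parses as a tuple of pairs.

```python
import ast
import re


def parse_pairs(text):
    """Quote the bare keys, then parse the text as a tuple of (key, value) pairs."""
    # Turn "(a," into "('a'," so every key becomes a string literal.
    quoted = re.sub(r"\((\w+),", r"('\1',", text)
    # The trailing comma forces a tuple of pairs even when there is only one pair.
    # literal_eval accepts only literals, so arbitrary expressions are rejected.
    return dict(ast.literal_eval(quoted + ","))


print(parse_pairs("(a,1.0),(b,6.0),(c,10.0)"))
```

Because ast.literal_eval is used instead of eval, anything in the string that is not a plain literal raises an error rather than being executed.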


Answer 2:

No need for regexes, if your string is in this format.

>>> a = '(a,1.0),(b,6.0),(c,10.0)'
>>> d = dict([x.split(',') for x in a[1:-1].split('),(')])
>>> print(d)
{'c': '10.0', 'a': '1.0', 'b': '6.0'}

We strip the first opening parenthesis and the last closing parenthesis, then split on ),( to get the key-value pairs. Each pair can then be split on the comma.

To cast to float, the list comprehension gets a little longer:

d = dict([(k, float(v)) for (k, v) in [x.split(',') for x in a[1:-1].split('),(')]])
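
If the one-liner gets hard to read, the same split-based idea can be written as a plain loop. A sketch only: the name pairs_to_dict is hypothetical, and it assumes the string always starts with ( and ends with ), and that every value parses as a float.

```python
def pairs_to_dict(text):
    """Convert '(k,v),(k,v)' into a dict with float values, without regexes."""
    result = {}
    # Drop the outer "(" and ")", then cut between pairs on "),(".
    for pair in text[1:-1].split("),("):
        # Split on the first comma only: the left part is the key.
        key, value = pair.split(",", 1)
        result[key] = float(value)
    return result


print(pairs_to_dict("(a,1.0),(b,6.0),(c,10.0)"))
```

A value that is not a number makes float() raise ValueError, so bad input fails loudly instead of ending up in the dict.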


Answer 3:

If there are always 2 comma-separated values inside parentheses and the second is of a float type, you may use

import re
s = '(a,1.0),(b,6.0),(c,10.0)'
print(dict((w, float(m)) for w, m in re.findall(r'\(([^),]+),([^)]*)\)', s)))

This quite generic pattern matches a (, then one or more chars other than a comma and ) (captured into Group 1), then a comma, then zero or more chars other than ) (captured into Group 2), and finally a ).

Since the pattern above is only suitable for pre-validated data, the regex can be restricted to match your current data more strictly:

r'\((\w+),(\d*\.?\d+)\)'


Details:

  • \( - a literal (
  • (\w+) - Capturing group 1: one or more word (letter/digit/_) chars
  • , - a comma
  • (\d*\.?\d+) - Capturing group 2: a common integer/float regex: zero or more digits, an optional . (decimal separator) and 1+ digits
  • \) - a literal closing parenthesis.
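
The restricted pattern described above can be applied like this. It is a sketch: the names PAIR and extract_pairs are hypothetical. Note that re.findall silently skips any pair that does not match, so malformed entries are dropped rather than reported.

```python
import re

# The restricted pattern: word-character key, unsigned integer/float value.
PAIR = re.compile(r"\((\w+),(\d*\.?\d+)\)")


def extract_pairs(text):
    """Build a dict from every (key,number) pair the pattern matches."""
    return {key: float(number) for key, number in PAIR.findall(text)}


print(extract_pairs("(a,1.0),(b,6.0),(c,10.0)"))
# Pairs with a space in the key or a non-numeric value do not match and are skipped.
print(extract_pairs("(a,1.0),(bad key,2.0),(c,x)"))
```

The pattern as written does not accept a leading sign, so negative values would need an extension such as an optional - before the digits.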


Answer 4:

The reason eval() does not work is that the names a, b and c are not defined. We can bind each name to its own string form, and eval() will then pick up those strings:

In [11]: text = '(a,1.0),(b,6.0),(c,10.0)'

In [12]: a, b, c = 'a', 'b', 'c'

In [13]: eval(text)
Out[13]: (('a', 1.0), ('b', 6.0), ('c', 10.0))

In [14]: dict(eval(text))
Out[14]: {'a': 1.0, 'b': 6.0, 'c': 10.0}

To do this with a regex:

In [21]: re.sub(r'\((.+?),', r'("\1",', text)
Out[21]: '("a",1.0),("b",6.0),("c",10.0)'
In [22]: eval(_)
Out[22]: (('a', 1.0), ('b', 6.0), ('c', 10.0))

In [23]: dict(_)
Out[23]: {'a': 1.0, 'b': 6.0, 'c': 10.0}
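
Instead of defining a, b and c by hand, the name-to-string mapping can be built from the text and passed to eval() as its namespace. A sketch only: the name eval_pairs is hypothetical, and it assumes every key is a valid Python identifier that is not a keyword. It does not make eval() safe for untrusted input.

```python
import re


def eval_pairs(text):
    """Evaluate '(a,1.0),(b,6.0)' with each bare name bound to its own string."""
    # Map every identifier-like token to itself, e.g. {"a": "a", "b": "b"}.
    names = {name: name for name in re.findall(r"[A-Za-z_]\w*", text)}
    # Empty __builtins__ so names resolve only through the mapping above.
    # The trailing comma keeps the result a tuple of pairs for a single pair.
    pairs = eval(text + ",", {"__builtins__": {}}, names)
    return dict(pairs)


print(eval_pairs("(a,1.0),(b,6.0),(c,10.0)"))
```

This avoids polluting the surrounding namespace with variables such as a, b and c, which the interactive session above has to do.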