Question:
I have a dictionary that is stored in a db field as a string. I am trying to parse it into a dict, but json.loads gives me an error.
Why does json.loads fail on this while ast.literal_eval works? Is one preferable over the other?
>>> c.iframe_data
u"{u'person': u'Annabelle!', u'csrfmiddlewaretoken': u'wTE9RZGvjCh9RCL00pLloxOYZItQ98JN'}"
# json fails
>>> json.loads(c.iframe_data)
Traceback (most recent call last):
ValueError: Expecting property name enclosed in double quotes: line 1 column 2 (char 1)
# ast.literal_eval works
>>> ast.literal_eval(c.iframe_data)
{u'person': u'Annabelle!', u'csrfmiddlewaretoken': u'wTE9RZGvjCh9RCL00pLloxOYZItQ98JN'}
Answer 1:
json.loads failed because your c.iframe_data value is not a valid JSON document. In a valid JSON document, strings are quoted with double quotes, and there is nothing like a u prefix for marking strings as unicode.
Using json.loads(c.iframe_data) means deserializing the JSON document contained in c.iframe_data.
ast.literal_eval is used whenever you would otherwise need eval to evaluate an input expression, i.e. when the input you want to evaluate is a Python expression rather than JSON.
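For example, here is a minimal sketch of that distinction (the strings below are illustrative, not the OP's data; Python 3 shown):

import ast
import json

# A Python-literal string: single quotes and u prefixes are fine for ast.literal_eval.
python_literal = "{u'person': u'Annabelle!'}"
print(ast.literal_eval(python_literal))   # {'person': 'Annabelle!'}

# The same data as actual JSON: double-quoted strings, no u prefix.
json_document = '{"person": "Annabelle!"}'
print(json.loads(json_document))          # {'person': 'Annabelle!'}

# literal_eval only accepts literals, so arbitrary expressions are rejected.
try:
    ast.literal_eval("__import__('os').getcwd()")
except ValueError as exc:
    print("not a literal:", exc)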
Is one preferable over the other?
It depends on the data. See this answer for more context.
Answer 2:
I have a dictionary that is stored in a db field as a string.
This is a design fault. While it's perfectly possible, as someone appears to have done, to extract the repr of a dictionary, there's no guarantee that the repr of an object can be evaluated at all.
In the presence of only string keys and string and numeric values, most times the Python eval function will reproduce the value from its repr, but I am unsure why you think that this would make it valid JSON, for example.
I am trying to parse it into a dict, but json.loads gives me an error.
Naturally. You aren't storing JSON in the database, so it hardly seems reasonable to expect it to parse as JSON. While it's interesting that ast.literal_eval can be used to parse the value, again there are no guarantees beyond relatively simple Python types.
Since it appears your data is indeed limited to such types, the real solution to your problem is to correct the way the data is stored, by converting the dictionary to a string with json.dumps before storage in the database. Some database systems (e.g., PostgreSQL) have JSON types to make querying such data simpler, and I'd recommend you use such types if they are available to you.
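A minimal sketch of that fix (the SQL statement, table and column names here are hypothetical; adapt them to your own schema and DB driver):

import json

iframe_data = {"person": "Annabelle!",
               "csrfmiddlewaretoken": "wTE9RZGvjCh9RCL00pLloxOYZItQ98JN"}

# On write: serialize with json.dumps instead of storing repr()/str() of the dict.
serialized = json.dumps(iframe_data)
# e.g. cursor.execute("UPDATE campaign SET iframe_data = %s WHERE id = %s",
#                     (serialized, campaign_id))   # hypothetical table/columns

# On read: the stored text is now valid JSON, so json.loads round-trips it.
restored = json.loads(serialized)
assert restored == iframe_data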
As to which is "better," that will always depend on the specific application, but JSON was explicitly designed as a compact human-readable machine-parseable format for simple structured data, whereas your current representation is based on formats specific to Python, which (for example) would be tediously difficult to evaluate in other languages. JSON is the applicable standard here, and you will benefit from using it.
Answer 3:
Because that u"{u'person': u'Annabelle!', u'csrfmiddlewaretoken': u'wTE9RZGvjCh9RCL00pLloxOYZItQ98JN'}"
is a Python unicode string, not a Javascript Object Notation , in chrome console:
bad = {u'person': u'Annabelle!', u'csrfmiddlewaretoken': u'wTE9RZGvjCh9RCL00pLloxOYZItQ98JN'}
SyntaxError: Unexpected string
good = {'person': 'Annabelle!', 'csrfmiddlewaretoken': 'wTE9RZGvjCh9RCL00pLloxOYZItQ98JN'}
Object {person: "Annabelle!", csrfmiddlewaretoken: "wTE9RZGvjCh9RCL00pLloxOYZItQ98JN"}
Or you can use yaml to deal with it:
>>> a = '{"person": "Annabelle!", "csrfmiddlewaretoken": "wTE9RZGvjCh9RCL00pLloxOYZItQ98JN"}'
>>> json.loads(a)
{u'person': u'Annabelle!', u'csrfmiddlewaretoken': u'wTE9RZGvjCh9RCL00pLloxOYZItQ98JN'}
>>> import ast
>>> ast.literal_eval(a)
{'person': 'Annabelle!', 'csrfmiddlewaretoken': 'wTE9RZGvjCh9RCL00pLloxOYZItQ98JN'}
>>> import yaml
>>> a = '{u"person": u"Annabelle!", u"csrfmiddlewaretoken": u"wTE9RZGvjCh9RCL00pLloxOYZItQ98JN"}'
>>> yaml.load(a)
{'u"person"': 'u"Annabelle!"', 'u"csrfmiddlewaretoken"': 'u"wTE9RZGvjCh9RCL00pLloxOYZItQ98JN"'}
>>> a = u'{u"person": u"Annabelle!", u"csrfmiddlewaretoken": u"wTE9RZGvjCh9RCL00pLloxOYZItQ98JN"}'
>>> yaml.load(a)
{'u"person"': 'u"Annabelle!"', 'u"csrfmiddlewaretoken"': 'u"wTE9RZGvjCh9RCL00pLloxOYZItQ98JN"'}
Answer 4:
json.loads is used specifically to parse JSON, which is quite a restrictive format. There is no u'...' syntax and all strings are delimited by double quotes, not single quotes. Use json.dumps to serialise something that can be read by json.loads.
So json.loads(string) is the inverse of json.dumps(object), whereas ast.literal_eval(string) is (vaguely) the inverse of repr(object).
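A minimal round-trip sketch of those inverse relationships:

import ast
import json

obj = {"person": "Annabelle!", "scores": [1, 2, 3]}

# json.loads undoes json.dumps ...
assert json.loads(json.dumps(obj)) == obj

# ... while ast.literal_eval (roughly) undoes repr for simple literal types.
assert ast.literal_eval(repr(obj)) == obj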
JSON is nice because it's portable -- there are parsers for it trivially available in pretty much every language. So if you want to send JSON to a Javascript frontend you'll have no issues.
ast.literal_eval isn't easily portable but it's slightly richer: you can use tuples, sets, and dicts whose keys aren't restricted to strings, for example.
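A short illustration of that extra richness (a sketch; the values are arbitrary):

import ast
import json

rich = "{'point': (1, 2), 'tags': {'a', 'b'}, 3: 'integer key'}"

# ast.literal_eval handles tuples, sets, and non-string keys ...
print(ast.literal_eval(rich))
# {'point': (1, 2), 'tags': {'a', 'b'}, 3: 'integer key'}

# ... whereas JSON cannot express a set at all (tuples silently become arrays).
try:
    json.dumps({"point": (1, 2), "tags": {"a", "b"}})
except TypeError as exc:
    print("JSON can't serialise a set:", exc)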
Also, json.loads is significantly faster than ast.literal_eval.
Answer 5:
First, and most importantly, do not serialize data twice. Your database is itself a serialization of data, with a rich and expressive set of tools to query, explore, manipulate, and present it. Serializing data to be subsequently placed in a database eliminates the possibility for isolated sub-component updates, sub-component querying & indexing, and couples all writes to mandatory initial reads, for a few of the most significant issues.
Next, JavaScript Object Notation (JSON) is a limited subset of the JavaScript language suitable for the representation of static data in service of data interchange. As a subset of the language, this means you can naively eval it within JS to reconstruct the original object. It is a simple serialization (no advanced features such as internal references, template definition, type extension) with the limitations of the JavaScript language baked in and penalties for the use of strings requiring large amounts of "escaping". The use of end markers also makes it difficult to utilize in purely streaming scenarios, e.g. you can't "finalize" an object until hitting its paired }, and as such it also has no marker for record separation. Notable examples of other limitations: delivering HTML within JSON requires excessive escaping; all numbers are floating point (54-bit integer accuracy, rounding errors, …), making it patently unsuitable for the storage or transfer of financial information or use of technologies (e.g. crypto) requiring 64-bit integers; and there is no native date representation.
There are some significant differences between JS and Python as languages, and thus in how JSON "JavaScript Object Notation" vs. PLS (Python Literal Syntax) behave. It just so happens that for the purpose of literal definition, most of JavaScript literal syntax is directly compatible with Python, albeit with slightly differing interpretations. The reverse is not true; see the above examples of disparity. If you care about preserving the fidelity of your data for Python, Python literals are more expressive and less "lossy" than their JS equivalents. However, as other answers/comments have noted, repr() is not a reliable way to generate this representation; Python literal syntax is not meant to be used this way. For the greatest type fidelity I generally recommend YAML serialization, of which JSON is a fully valid subset.
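As a small sketch of that fidelity point (assuming PyYAML is available; the sample values are made up), YAML round-trips types that JSON cannot express directly:

import datetime
import json
import yaml  # PyYAML, assumed installed

data = {"count": 3, "when": datetime.date(2015, 6, 1), 5: "integer key"}

# YAML keeps the date value and the integer key intact through a round trip.
restored = yaml.safe_load(yaml.safe_dump(data))
assert restored == data

# JSON has no date type, so the same data fails to serialize as-is.
try:
    json.dumps(data)
except TypeError as exc:
    print("JSON can't serialise a date:", exc)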
FYI, to address the practical concern of storage of dictionary-like mappings associated with entities, there are entity-attribute-value data models. Arbitrary key-value stores in relational databases FTW, but with power comes responsibility. Use this pattern carefully and only when absolutely needed. (If this is a frequent pattern, look into document stores.)
Answer 6:
json.loads should strongly be preferred to ast.literal_eval for parsing JSON, for all the reasons below (summarizing other posters).
In your specific example, your input was illegal/malformed JSON exported the wrong way using Python 2.x (hence all the unwanted and illegal u' prefixes); in any case, Python 2.x is itself near EOL, so please move to 3.x. You can simply use a regex to fix up/preprocess that:
>>> import json
>>> import re
>>> malformed_json = u"{u'person': u'Annabelle!', u'csrfmiddlewaretoken': u'wTE9RZGvjCh9RCL00pLloxOYZItQ98JN'}"
>>> legal_json = re.sub(r'u\'([^\']*)\'', r'"\1"', malformed_json)
>>> legal_json
'{"person": "Annabelle!", "csrfmiddlewaretoken": "wTE9RZGvjCh9RCL00pLloxOYZItQ98JN"}'
>>> json.loads(legal_json)
{'person': 'Annabelle!', 'csrfmiddlewaretoken': 'wTE9RZGvjCh9RCL00pLloxOYZItQ98JN'}
- (Note: if your architecture has lots of malformed JSON strings exported the wrong way from 2.x and stored in a DB, that's not a legitimate reason not to use json.loads, but it is a reason to revisit your architecture. At the very least, run the fixup regex on all your strings once, and store the legal JSON back.)
json.loads Pros/Cons:
- handles all legal JSON, unlike ast.literal_eval
- slow. There are much faster JSON libraries like ultrajson, yajl, simplejson etc. (see the sketch after this list). Also, on large import jobs you can use multiprocessing/multithreading (which also gives you protection from memory leaks, a common issue with all parsers).
- numerical fields: converts all integers, long integers and floats to double, may lose precision (@amcgregor)
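A minimal sketch of swapping in a faster parser (simplejson and ujson/UltraJSON are real third-party packages, but assuming either is installed is an assumption of this sketch; the fallback keeps the standard library json):

# Prefer a faster JSON parser when one is installed, fall back to the stdlib.
try:
    import ujson as json_impl           # assumption: ujson (UltraJSON) is installed
except ImportError:
    try:
        import simplejson as json_impl  # assumption: simplejson is installed
    except ImportError:
        import json as json_impl        # stdlib fallback, always available

doc = '{"person": "Annabelle!", "csrfmiddlewaretoken": "wTE9RZGvjCh9RCL00pLloxOYZItQ98JN"}'
print(json_impl.loads(doc))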