After answering a question about how to parse a text file containing arrays of floats, I ran the following benchmark:
from __future__ import print_function  # makes print() behave the same on Python 2 and 3
import timeit
import random
line = [random.random() for _ in range(1000)]
n = 10000
# embed the list's text in the setup code as a string literal
json_setup = 'line = "{}"; import json'.format(line)
json_work = 'json.loads(line)'
json_time = timeit.timeit(json_work, json_setup, number=n)
print("json: ", json_time)
ast_setup = 'line = "{}"; import ast'.format(line)
ast_work = 'ast.literal_eval(line)'
ast_time = timeit.timeit(ast_work, ast_setup, number=n)
print("ast: ", ast_time)
print("time ratio ast/json: ", ast_time / json_time)
I ran this code several times and consistently got results like these:
$ python json-ast-bench.py
json: 4.3199338913
ast: 28.4827561378
time ratio ast/json: 6.59333148483
So it appears that json is almost an order of magnitude faster than ast for this use case.
I had the same results with both Python 2.7.5+ and Python 3.3.2+.
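For a Python 3 only run, the same measurement can be sketched with timeit's callable form instead of setup strings. This is an illustration, not the original script: bench and its parameters are hypothetical names, and the seed is fixed only so runs are repeatable. It also checks that both parsers return the same list, so the two timings really measure the same work.

```python
import ast
import json
import random
import timeit


def bench(num_floats=1000, number=100, seed=0):
    """Time json.loads vs ast.literal_eval on the same list-of-floats string."""
    rng = random.Random(seed)
    values = [rng.random() for _ in range(num_floats)]
    # repr() of a list of floats is valid JSON as well as a valid Python literal
    text = repr(values)

    # Sanity check: both parsers should produce exactly the same list
    assert json.loads(text) == ast.literal_eval(text) == values

    json_time = timeit.timeit(lambda: json.loads(text), number=number)
    ast_time = timeit.timeit(lambda: ast.literal_eval(text), number=number)
    return json_time, ast_time


if __name__ == "__main__":
    json_time, ast_time = bench()
    print("json:", json_time)
    print("ast:", ast_time)
    print("time ratio ast/json:", ast_time / json_time)
```

Absolute numbers will differ by machine and Python version; only the ratio is worth comparing.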
Questions:
- Why is json.loads so much faster? A related question seems to imply that ast is more flexible regarding the input data (double or single quotes).
- Are there use cases where I would prefer ast.literal_eval over json.loads, even though it's slower?
Edit: Anyway, if performance matters, I would recommend UltraJSON (it's what I use at work, and it's ~4 times faster than json on the same mini-benchmark).
The two functions are parsing entirely different languages: JSON, and Python literal syntax.* As the literal_eval documentation says, the string may only consist of Python literal structures: strings, bytes, numbers, tuples, lists, dicts, sets, booleans, and None.
JSON, by contrast, only handles double-quoted JavaScript string literals (not quite identical to Python's**), JavaScript numbers (only int and float***), objects (roughly equivalent to dicts), arrays (roughly equivalent to lists), JavaScript booleans (which are different from Python's), and null.
The fact that these two languages happen to have some overlap doesn't mean they're the same language.
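To make the overlap concrete, here is a small sketch: text in the shared subset parses identically, while JSON spellings (true, null) and Python spellings (True, None, single quotes) are each rejected by the other parser. The helper parses is hypothetical, written only for this illustration.

```python
import ast
import json

# Text in the overlap of the two languages parses to the same Python object
shared = '[1, 2.5, "text", [3, 4]]'
assert json.loads(shared) == ast.literal_eval(shared) == [1, 2.5, "text", [3, 4]]

# Spellings that only JSON accepts, and their Python-literal counterparts
json_only = ["true", "null", '{"a": false}']
python_only = ["True", "None", "{'a': False}", "'single quotes'"]


def parses(parser, text):
    """Return True if parser accepts text, False if it raises a parse error."""
    try:
        parser(text)
    except (ValueError, SyntaxError):
        # json.JSONDecodeError subclasses ValueError; literal_eval raises
        # ValueError for non-literal nodes and SyntaxError for invalid syntax
        return False
    return True


for text in json_only + python_only:
    print(text, "json:", parses(json.loads, text), "ast:", parses(ast.literal_eval, text))
```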
json.loads is faster because Python literal syntax is a more complex and powerful language than JSON, so it's likely to be slower to parse. And, probably more importantly, because Python literal syntax is not intended to be used as a data interchange format (in fact, it's specifically not supposed to be used for that), nobody is likely to put much effort into making it fast for data interchange.****
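One concrete source of overhead, at least in CPython: ast.literal_eval first runs the full Python parser on its string (ast.parse in "eval" mode) and only then walks the resulting tree to build the values, whereas the json module normally decodes with a C-accelerated scanner that builds the list and floats directly. The sketch below (ast_node_count is a hypothetical helper) counts the intermediate nodes for a list of 1000 floats: roughly one per element, plus a few for the enclosing expression.

```python
import ast


def ast_node_count(text):
    """Count the AST nodes that ast.parse builds for an expression string.

    ast.literal_eval starts with this same parse (mode="eval") and then walks
    the tree, so each element first exists as an AST node object.
    """
    tree = ast.parse(text, mode="eval")
    return sum(1 for _ in ast.walk(tree))


floats_text = repr([i / 7 for i in range(1000)])
print(len(floats_text), "characters ->", ast_node_count(floats_text), "AST nodes")
```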
As for flexibility: yes, ast.literal_eval accepts single-quoted strings, and also raw string literals, Unicode vs. bytes string literals, complex numbers, sets, and all kinds of other things JSON doesn't handle.
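A sketch of that Python-only syntax, plus the typical case where literal_eval is the right tool: reading back the repr() of data containing tuples, sets or bytes, which JSON has no way to represent. The examples dict and the data value are illustrative only.

```python
import ast
import json

# Python literal syntax that literal_eval accepts but JSON has no spelling for
examples = {
    r"r'C:\new'": "C:\\new",   # raw string: the backslash is kept
    "b'bytes'": b"bytes",      # bytes literal
    "1+2j": 1 + 2j,            # complex number
    "{1, 2, 3}": {1, 2, 3},    # set display
    "(1, 'a')": (1, "a"),      # tuple display
}

for text, expected in examples.items():
    assert ast.literal_eval(text) == expected
    try:
        json.loads(text)
    except ValueError:  # json.JSONDecodeError is a ValueError subclass
        pass
    else:
        raise AssertionError(f"json unexpectedly accepted {text!r}")

# Typical use: reading back the repr() of data that JSON cannot represent
data = {(1, 2): {3, 4}, "raw": b"\x00\xff"}
assert ast.literal_eval(repr(data)) == data
```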
Are there use cases for ast.literal_eval despite the speed? Yes. When you want to parse Python literals, you should use ast.literal_eval. (Or, better yet, rethink your design so you don't need to parse Python literals…)
* This is a bit of a vague term. For example, -2 is not a literal in Python but an operator expression, yet literal_eval can handle it. And of course tuple/list/dict/set displays are not literals, but literal_eval can handle them, except that comprehensions are also displays, and literal_eval cannot handle them. Other functions in the ast module can help you find out what really is and isn't a literal, e.g., ast.dump(ast.parse("expr")).
** For example, "\q" is an error in JSON, while in a Python string literal the unrecognized escape just keeps the backslash.
*** Technically, JSON only handles one "number" type, which is floating-point. But Python's json module parses numbers with no decimal point or exponent as integers, and the same is true in many other languages' JSON modules.
**** If you missed Tim Peters's comment on the question: "ast.literal_eval is so lightly used that nobody felt it was worth the time to work (& work, & work) at speeding it. In contrast, the JSON libraries are routinely used to parse gigabytes of data."
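The edge cases in the footnotes can be checked directly. This is a sketch assuming Python 3.5+ (where json raises json.JSONDecodeError, a ValueError subclass); the variable names are only illustrative. Newer Pythons warn about the invalid \q escape, so that warning is silenced around the literal_eval call. parse_int is a real json.loads parameter, used here to force every number to float.

```python
import ast
import json
import warnings

# Footnote *: "-2" is a unary-minus expression, not a literal, yet literal_eval accepts it
tree = ast.parse("-2", mode="eval")
assert isinstance(tree.body, ast.UnaryOp)
assert ast.literal_eval("-2") == -2
print(ast.dump(tree))

# ...but a list comprehension is rejected even though it is also a display
try:
    ast.literal_eval("[x for x in range(3)]")
except ValueError:
    comprehension_rejected = True
else:
    comprehension_rejected = False

# Footnote **: an unknown escape such as \q is an error in JSON,
# but a Python string literal keeps the backslash
try:
    json.loads(r'"\q"')
except ValueError:
    json_rejects_q = True
else:
    json_rejects_q = False
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # silence the invalid-escape warning
    python_q = ast.literal_eval(r'"\q"')

# Footnote ***: json returns int for numbers without a fraction or exponent;
# parse_int=float makes every number a float
numbers = json.loads("[1, 2.5, 3e2]")
all_floats = json.loads("[1, 2.5, 3e2]", parse_int=float)
print([type(x).__name__ for x in numbers], [type(x).__name__ for x in all_floats])
```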