What's the fastest way in Python (Django) to create a JSON based upon a Django queryset. Note that parsing it in the template as proposed here is not an option.
The background is that I created a method which loops over all nodes in a tree, but is already terribly slow when converting about 300 nodes. The first (and probably the worst) idea that came up to my mind is to create the json somehow "manually". See the code below.
#! Solution 1 !!#
def quoteStr(input):
return "\"" + smart_str(smart_unicode(input)) + "\""
def createJSONTreeDump(user, node, root=False, lastChild=False):
q = "\""
#open tag for object
json = str("\n" + indent + "{" +
quoteStr("name") + ": " + quoteStr(node.name) + ",\n" +
quoteStr("id") + ": " + quoteStr(node.pk) + ",\n" +
)
childrenTag = "children"
children = node.get_children()
if children.count() > 0 :
#create children array opening tag
json += str(indent + quoteStr(childrenTag) + ": [")
#for child in children:
for idx, child in enumerate(children):
if (idx + 1) == children.count():
//recursive call
json += createJSONTreeDump(user, child, False, True, layout)
else:
//recursive call
json += createJSONTreeDump(user, child, False, False, layout)
#add children closing tag
json += "]\n"
#closing tag for object
if lastChild == False:
#more children following, add ","
json += indent + "},\n"
else:
#last child, do not add ","
json += indent + "}\n"
return json
The tree structure to be rendered is a tree build up with mptt, where the call .get_children() returns all children of a node.
The model looks as simply as this, mptt taking care of everything else.
class Node(MPTTModel, ExtraManager):
"""
Representation of a single node
"""
name = models.CharField(max_length=200)
parent = TreeForeignKey('self', null=True, blank=True, related_name='%(app_label)s_%(class)s_children')
The expected JSON result created like this in the template var root = {{ jsonTree|safe }}
Edit: Based upon this answer I created the following code (definitely the better code) but feels only slightly faster.
Solution 2:
def serializable_object(node):
"Recurse into tree to build a serializable object"
obj = {'name': node.name, 'id': node.pk, 'children': []}
for child in node.get_children():
obj['children'].append(serializable_object(child))
return obj
import json
jsonTree = json.dumps(serializable_object(nodeInstance))
Solution 3:
def serializable_object_List_Comprehension(node):
"Recurse into tree to build a serializable object"
obj = {
'name': node.name,
'id': node.pk,
'children': [serializable_object(ch) for ch in node.get_children()]
}
return obj
Solution 4:
def recursive_node_to_dict(node):
result = {
'name': node.name, 'id': node.pk
}
children = [recursive_node_to_dict(c) for c in node.get_children()],
if children is not None:
result['children'] = children
return result
from mptt.templatetags.mptt_tags import cache_tree_children
root_nodes = cache_tree_children(root.get_descendants())
dicts = []
for n in root_nodes:
dicts.append(recursive_node_to_dict(root_nodes[0]))
jsonTree = json.dumps(dicts, indent=4)
Solution 5 (make use of select_related to pre_fetch, whereas not sure if correctly used)
def serializable_object_select_related(node):
"Recurse into tree to build a serializable object, make use of select_related"
obj = {'name': node.get_wbs_code(), 'wbsCode': node.get_wbs_code(), 'id': node.pk, 'level': node.level, 'position': node.position, 'children': []}
for child in node.get_children().select_related():
obj['children'].append(serializable_object(child))
return obj
Solution 6 (improved solution 4, using caching of child nodes):
def recursive_node_to_dict(node):
result = {
'name': node.name, ''id': node.pk,
#notice the use of node._cached_children instead of node.get_children()
'children' : [recursive_node_to_dict(c) for c in node._cached_children]
}
Called via:
from mptt.templatetags.mptt_tags import cache_tree_children
subTrees = cache_tree_children(root.get_descendants(include_self=True))
subTreeDicts = []
for subTree in subTrees:
subTree = recursive_node_to_dict(subTree)
subTreeDicts.append(subTree)
jsonTree = json.dumps(subTreeDicts, indent=4)
#optional clean up, remove the [ ] at the beginning and the end, its needed for D3.js
jsonTree = jsonTree[1:len(jsonTree)]
jsonTree = jsonTree[:len(jsonTree)-1]
Below you can see the profiling results, created using cProfile as suggested by MuMind, setting up a Django view to start the stand-alone method profileJSON(), which in turn calls the different solutions to create the JSON output.
def startProfileJSON(request):
print "startProfileJSON"
import cProfile
cProfile.runctx('profileJSON()', globals=globals(), locals=locals())
print "endProfileJSON"
Results:
Solution 1: 3350347 function calls (3130372 primitive calls) in 4.969 seconds (details)
Solution 2: 2533705 function calls (2354516 primitive calls) in 3.630 seconds (details)
Solution 3: 2533621 function calls (2354441 primitive calls) in 3.684 seconds (details)
Solution 4: 2812725 function calls (2466028 primitive calls) in 3.840 seconds (details)
Solution 5: 2536504 function calls (2357256 primitive calls) in 3.779 seconds (details)
Solution 6 (Improved solution 4): 2593122 function calls (2299165 primitive calls) in 3.663 seconds (details)
Discussion:
Solution 1: own encoding implementation. bad idea
Solution 2 + 3: currently the fastest, but still painfully slow
Solution 4: looks promising with caching childs, but does perform similar and currently produces not valid json as childrens are put into double []:
"children": [[]] instead of "children": []
Solution 5: use of select_related does not make a difference, whereas probably used in the wrong way, as a node always have a ForeignKey to its parent, and we are parsing from root to child.
Update: Solution 6: It looks like the cleanest solution to me, using caching of child nodes. But does only perform similar to solution 2 + 3. Which for me is strange.
Anybody more ideas for performance improvements?