Hierarchy Optimization on Google App Engine Datastore

Posted 2019-03-16 20:00

I have hierarchical data stored in the datastore using a model which looks like this:

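from google.appengine.ext import db

class ToolCategories(db.Model):
    name = db.StringProperty()
    # Reference to the parent category; children are reachable through the
    # "parent_category" back-reference collection.
    parentKey = db.SelfReferenceProperty(collection_name="parent_category")
    ...
    ...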

I want to print all the category names while preserving the hierarchy, in a form like this:

--Information Gathering  
----OS Fingerprinting  
----DNS  
------dnstool  
----Port Scanning   
------windows  
--------nmap  
----DNS3  
----wireless sniffers  
------Windows  
--------Kismet  

To do this, I have used simple recursion over the back-reference collection:

from google.appengine.ext import webapp

class GetAllCategories(webapp.RequestHandler):

    def RecurseList(self, category, breaks):
        # Emit this category, then recurse into its children through the
        # parent_category back-reference.
        output = breaks + category.name + "<br/>"
        for cat in category.parent_category:
            output = output + self.RecurseList(cat, breaks + "--")
        return output

    def get(self):
        output = ""
        # Root categories are the ones with no parent.
        allCategories = ToolCategories.all().filter('parentKey =', None)
        for category in allCategories:
            output = output + self.RecurseList(category, "--")
        self.response.out.write(output)

As I am very new to App Engine programming (it has been hardly 3 days since I started writing code), I am not sure whether this is the most efficient way to do the job from a datastore access standpoint.

Is this the best way? If not, what is?

2 Answers
太酷不给撩
#2 · 2019-03-16 20:34

You have a very reasonable approach! My main caveat has little to do with GAE and a lot to do with Python: don't build a string from pieces with + or +=. Instead, make a list of string pieces (with append, extend, list comprehensions, etc.) and, when you're all done, join it up into the final string with ''.join(thelist) or the like. Even though recent Python versions try hard to optimize the intrinsically O(N squared) performance of + or += loops, in the end you're always better off building up lists of strings along the way and joining them with ''.join at the very end!
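As a minimal sketch of that advice, the recursion can append pieces to a shared list and join once at the end. The Category class below is a hypothetical in-memory stand-in for a ToolCategories entity (its parent_category attribute plays the role of the back-reference), so the snippet runs without the datastore; collect_lines and render are illustrative names, not part of the question's code.

```python
class Category(object):
    """Hypothetical in-memory stand-in for a ToolCategories entity."""

    def __init__(self, name, children=()):
        self.name = name
        # Plays the role of the parent_category back-reference.
        self.parent_category = list(children)


def collect_lines(category, breaks, pieces):
    """Append one line per category to pieces, depth first."""
    pieces.append(breaks + category.name + "<br/>")
    for child in category.parent_category:
        collect_lines(child, breaks + "--", pieces)


def render(roots):
    """Render every root category and its descendants as one string."""
    pieces = []
    for root in roots:
        collect_lines(root, "--", pieces)
    return "".join(pieces)  # single join at the very end


tree = Category("Information Gathering", [
    Category("DNS", [Category("dnstool")]),
    Category("OS Fingerprinting"),
])
html = render([tree])
```

In a request handler, the joined string would then be passed to self.response.out.write in one call.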

够拽才男人
#3 · 2019-03-16 20:34

The main disadvantage of your approach is that, because you're using the "adjacency list" way of representing trees, you have to do one datastore query for each branch of the tree. Datastore queries are fairly expensive (around 160ms each), so constructing the tree, particularly if it's large, could be rather expensive.
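To get a rough feel for that cost, here is a small simulation that counts how many queries a recursive walk would issue. It assumes every iteration over a back-reference costs one query (including the empty ones for leaf categories) and uses the 160ms figure above as an estimate, not a measurement. CHILDREN, fetch_children and walk are hypothetical names over an in-memory dict, not datastore calls.

```python
QUERY_COST_MS = 160  # figure quoted above, not measured here

# Hypothetical in-memory tree: category name -> list of child names.
CHILDREN = {
    "Information Gathering": ["OS Fingerprinting", "DNS"],
    "OS Fingerprinting": [],
    "DNS": ["dnstool"],
    "dnstool": [],
}

stats = {"queries": 0}


def fetch_children(name):
    """Stand-in for iterating a back-reference; counted as one query."""
    stats["queries"] += 1
    return CHILDREN[name]


def walk(name):
    """Visit a category and all of its descendants."""
    for child in fetch_children(name):
        walk(child)


stats["queries"] += 1  # the initial query for the root categories
walk("Information Gathering")
estimated_ms = stats["queries"] * QUERY_COST_MS
```

Under these assumptions, a tree of four categories already issues five queries, and the count grows linearly with the number of categories.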

There's another approach, which is essentially the one taken by the datastore for representing entity groups: Instead of just storing the parent key, store the entire list of ancestors using a ListProperty:

class ToolCategories(db.Model):
  name = db.StringProperty()
  parents = db.ListProperty(db.Key)

Then, to construct the tree, you can retrieve the entire thing in one single query:

q = ToolCategories.all().filter('parents =', root_key)
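The query returns a flat list of descendants, so the hierarchy still has to be rebuilt in memory. A minimal sketch of one way to do that follows, assuming each ancestor list is ordered from the root down so that its last element is the direct parent, and that the root entity itself is fetched separately. Row is a hypothetical stand-in for a fetched entity, with plain strings in place of db.Key values.

```python
from collections import defaultdict, namedtuple

# Hypothetical stand-in for a fetched ToolCategories entity: `key` plays the
# role of entity.key(), `parents` the ancestor list (assumed root first).
Row = namedtuple("Row", ["key", "name", "parents"])


def render_tree(root, rows):
    """Render root and its descendants from a flat list of rows."""
    children = defaultdict(list)
    for row in rows:
        # Assumption: the last ancestor in the list is the direct parent.
        children[row.parents[-1]].append(row)

    lines = []

    def walk(row, breaks):
        lines.append(breaks + row.name)
        for child in children[row.key]:
            walk(child, breaks + "--")

    walk(root, "--")
    return "\n".join(lines)


root = Row("k1", "Information Gathering", [])
rows = [
    Row("k2", "DNS", ["k1"]),
    Row("k3", "dnstool", ["k1", "k2"]),
    Row("k4", "Port Scanning", ["k1"]),
]
text = render_tree(root, rows)
```

Grouping the rows by direct parent takes a single pass, so the whole tree is assembled without any further datastore access.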