What is a good way of hashing a hierarchy (similar to a file structure) in python?
I could convert the whole hierarchy into a dotted string and then hash that, but is there a better (or more efficient) way of doing this without going back and forth all the time?
An example of a structure I might want to hash is:
a -> b1 -> c -> 1 -> d
a -> b2 -> c -> 2 -> d
a -> c -> 1 -> d
If you have access to your hierarchy components as a tuple, just hash it - tuples are hashable. You may not gain a lot over conversion to and from a delimited string, but it's a start.
If this doesn't help, perhaps you could provide more information about how you store the hierarchy/path information.
How do you want to access your hierarchy?
If you're always going to be checking for a full path, then as suggested, use a tuple:
eg:
>>> d["a","b1","c",1,"d"] = value
However, if you're going to be doing things like "quickly find all the items below "a -> b1", it may make more sense to store it as a nested hashtable (otherwise you must iterate through all items to find those you're intereted in).
For this, a defaultdict is probably the easiest way to store. For example:
from collections import defaultdict
def new_dict(): return defaultdict(new_dict)
d = defaultdict(new_dict)
d["a"]["b1"]["c"][1]["d"] = "test"
d["a"]["b2"]["c"][2]["d"] = "test2"
d["a"]["c"][1]["d"] = "test3"
print d["a"]["c"][1]["d"] # Prints test3
print d["a"].keys() # Prints ["c", "b1", "b2"]
You can make any object hashable by implementing the __hash__()
method
So you can simply add a suitable __hash__()
method to the objects storing your hierarchy, e.g. compute the hash recursively, etc.