What is a clean/efficient method for storing the directory Hierarchy/tree in a Key-Value database (in my case MongoDB but any of them)?
For example a tree structure
- Cars
+ Audi
+ BMW
- M5
+ Ford
- Color
+ Red
- Apple
- Cherry
+ Purple
- Funny
The method I am using now, each object links to it's parent
{
dir: "red"
parent-dir: "color"
}
This makes it very efficient/fast to insert and reorder any aspect of the tree (for example if I want to move Red and all it's children to the Cars directory).
But this method sucks when I want to all subdirectories and their children for a given directory recursively. To make it efficient to parse I can have a structure for example
{
dir: "red"
children: "audi, bmw, ford"
}
{
dir: "bmw"
children: "m5"
}
But if I want to modify the tree, a whole bunch of objects need to touched and modified.
Are there any other methods to storing a directory structure in a KV store?
The method you currently use now is called adjacency list model.
Another model to store hierarchical data in a (relational) database is the nested set model. Its implementation in SQL databases is well known. Also see this article for the modified preorder tree traversal algorithm.
A very simple method: you could store a path per object - with those it should be easy to query trees in NOSQL databases:
{ path: "Color", ... }
{ path: "Color.Red", ... }
{ path: "Color.Red.Apple", ... }
{ path: "Color.Red.Cherry", ... }
When nodes will be removed or renamed some paths must be updated. But in general, this method looks promising. You just have to reserve a special character as separator. The storage space overhead should be negligible.
edit: this method is called materialized path
Finally, here is a comparison of different methods for hierarchical data in NOSQL databases.
I don't have a huge amount of NOSQL experience, so this isn't a definitive answer, but here's how I'd approach it:
I would likely use your first approach, where you have:
{
dir: 'dir_name',
parent_dir: 'parent_dir_name'
}
And then set up a map-reduce to quickly query the children of a directory. MongoDB's map-reduce functionality is still only available in the development branch and I haven't worked with it yet, but in CouchDB (and I assume, with a few modification, in MongoDB) you could do something like:
map:
function(doc) {
emit( doc.parent_dir, doc.dir );
}
reduce:
function(key, values) {
return( values );
}
Which would give you the list of sub-directories for each parent directory.
I suggest storing a heap to the the id's of the data items.
I think this is the best plan. If you need lots and lots of stuff any heap element could be an index to another heap.
eg
{ "id:xxx", "id:yyy", "sub-heap-id:zzz"....}
If this is not clear post a comment and I will explain more when I get home.
Make an index!
http://www.mongodb.org/display/DOCS/Indexes