I'm trying to make a model for categorizing some objects.
I already tried using django-mptt to easily retrieve related categories, and now I'm searching different solutions to find the best one.
I can't find out though what are main differences between Materialized Path, Adjacency List and Nested Set. Wikipedia didn't give me a short answer, all I know is mptt is probably Nested Set...
Can anyone explain it to me in few words ?
It's easier to explain with examples than with a few words. Consider the sample tree, storing names:
William
Jones
Blake
Adams
Tyler
Joseph
Miller
Flint
Materialized Path means each node stores its full path encoded. For instance, you can store its index using a dot as a delimiter
Name Path
William 1
Jones 1.1
Blake 1.2
Adams 1.2.1
Tyler 1.3
Joseph 2
Miller 2.1
Flint 2.2
So, Adams knows its full name is Wiliam Blake Adams, because it has the 1.2.1
path, pointing to the 1
node at first level — William — to the 1.2
node at level 2 — Blake — and 1.2.1
node at level 3 — Adams.
Adjacency List means the tree is stored by keeping a link to some adjacent nodes. For instance, you can store who is the parent and who is the next sibling.
Name Parent Next
William null Joseph
Jones William Blake
Blake William Tyler
Adams Blake null
Tyler William null
Joseph null null
Miller Joseph Flint
Flint Joseph null
Notice that it could be as simple as just storing the parent, if we don't need to keep the children of a node ordered. Now Adams can get all his ancestors recursively until null to find his full name.
Nested sets means you store each node with some index, usually a left and right value, assigned to each one as you traverse the tree in DFS order, so you know its descendants are within those values. Here's how the numbers would be assigned to the example tree:
1 William 10
2 Jones 3
4 Blake 7
5 Adams 6
8 Tyler 9
11 Joseph 16
12 Miller 13
14 Flint 15
And it would be stored as:
Name left right
William 1 10
Jones 2 3
Blake 4 7
Adams 5 6
Tyler 8 9
Joseph 11 16
Miller 12 13
Flint 14 15
So, now Adams can find its ancestors by querying who has a lower left AND a higher right value, and sort them by the left value.
Each model has its strengths and weaknesses. Choosing which one to use really depends on your application, the database you're using and what kind of operations you'll be doing most often. You can find a good comparison here.
The comparison mentions a fourth model that isn't very common (I don't know of any other implementation but my own) and very complicated to explain in a few words: Nested Interval via Matrix Encoding.
When you insert a new node in a nested set you have to re-enumerate everyone who is ahead of it in the traversal. The idea behind nested intervals is that there's an infinite number of rational numbers between any two integers, so you could store the new node as a fraction of its previous and next nodes. Storing and querying fractions can be troublesome, and this leads to the matrix encoding technique, which transforms those fractions in a 2x2 matrix and most operations can be done by some matrix algebra that isn't obvious at first sight but can be very efficient.
Treebeard has very straightforward implementations of each one of Materialized Path, Nested Sets and Adjacency Lists, with no redundancy. django-mptt actually uses a mix of Nested Sets and Adjacency Lists, since it also keeps a link to the parent and can use it to both query children more efficiently, and to rebuild the tree in case it gets messed up by someone.