I know how a B-Tree works in-memory, it's easy enough to implement. However, what is currently completely beyond me, is how to find a data layout that works effectively on disk, such that:
- The number of entries in the B-Tree can grow indefinitly (or at least to > 1000GB)
- Disk-level copying operations are minimized
- The values can have arbitrary size (i.e. no fixed schema)
If anyone could provide insight into layouting B-Tree structures on disk level, I'd be very grateful. Especially the last bullet point gives me a lot of headache. I would also appreciate pointers to books, but most database literature I've seen explains only the high-level structure (i.e. "this is how you do it in memory"), but skips the nitty gritty details on the disk layout.
Read it This will definitely help http://www.geeksforgeeks.org/b-tree-set-1-introduction-2/
some info about oracle index internals: http://www.toadworld.com/platforms/oracle/w/wiki/11001.oracle-b-tree-index-from-the-concept-to-internals
Notes:
Databases do not directly implement indexes based on B-tree but on a variant called B+ tree. Which according to wikipedia:
Databases work, in general, with block-oriented storage and b+ tree is more suited then a b-tree for this.
The blocks are fixed size and are left with some free space to accommodate future changes in value or key size.
A block can be either a leaf(holds actual data) or branch(holds the pointers to the leaf nodes)
A toy model how writing to disk can be implemented (for a block size 10k for arithmetic simplification):
When the information is read from a big index: can go following:
a really large index can be split on multiple files, then the address of block will be something as (filename_id, address_relative_to_this_file)