I need a tool similar to cdb (constant database) that would let me store large data sets (hundreds of gigabytes) in indexed files. cdb would be an ideal candidate, but its 2 GB file size limit rules it out.
What I'm looking for is a persistent key-value store supporting binary keys and values. Once created, the database is read-only and will never be modified.
Can you recommend a tool? Also, storage overhead should be small, because I will be storing billions of records.
To be clear, I'm looking for an embeddable database library, not a standalone server: something that can be used inside a C program.
Thanks,
RG
Another option is mcdb, which extends Dan J. Bernstein's cdb.
https://github.com/gstrauss/mcdb/
mcdb supports very large constant databases and is faster than cdb for both database creation and database access. Still, creating a database of hundreds of gigabytes will take some time: mcdb can build a gigabyte-sized database in a few seconds when the data is in the filesystem cache, or in a minute or so starting from a cold cache.
https://github.com/gstrauss/mcdb/blob/master/t/PERFORMANCE
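For reference, lookups follow the classic cdb access pattern: mmap the file, hash the key, read the value in place. The sketch below uses tinycdb's API to show that pattern, since mcdb's interface is modeled on cdb's; treat the exact names as illustrative and check mcdb.h for mcdb's own calls ("data.cdb" is a placeholder filename):

    #include <cdb.h>      /* tinycdb; mcdb's interface is modeled on this */
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        struct cdb db;
        int fd = open("data.cdb", O_RDONLY);    /* placeholder filename */
        if (fd < 0)
            return 1;
        if (cdb_init(&db, fd) != 0) {           /* mmaps the whole file */
            close(fd);
            return 1;
        }

        const char key[] = "some-key";          /* keys/values may be binary */
        if (cdb_find(&db, key, sizeof key - 1) > 0) {
            unsigned vlen = cdb_datalen(&db);
            char *val = malloc(vlen);
            if (val != NULL && cdb_read(&db, val, vlen, cdb_datapos(&db)) == 0)
                printf("found %u bytes\n", vlen);
            free(val);
        }

        cdb_free(&db);
        close(fd);
        return 0;
    }

The constant-database design is what keeps per-record overhead small: a lookup costs a hash computation plus a couple of reads, with no server process in between, which matters at billions of records.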
(Disclosure: I am the author of mcdb)
There are hamsterdb (I'm the author), Berkeley DB, and Tokyo Cabinet.
hamsterdb uses a B-tree and therefore keeps your data sorted. Tokyo Cabinet uses a hash table, so data is not sorted. Berkeley DB can do either.
Needless to say which one I would recommend ;)
All of them can be linked into a C application. None of them should have a 2 GB limit.
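To give an idea of the embedded usage, here is a minimal read-only lookup plus an in-order scan with Berkeley DB's C API; the filename "data.db" and the key are placeholders, and hamsterdb and Tokyo Cabinet follow the same open/get/close shape with their own calls:

    #include <db.h>
    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        DB *db = NULL;
        DBC *cur = NULL;
        DBT key, val;
        int ret;

        if ((ret = db_create(&db, NULL, 0)) != 0) {
            fprintf(stderr, "db_create: %s\n", db_strerror(ret));
            return 1;
        }
        /* open an existing B-tree database file read-only */
        if ((ret = db->open(db, NULL, "data.db", NULL,
                            DB_BTREE, DB_RDONLY, 0)) != 0) {
            db->err(db, ret, "open");
            db->close(db, 0);
            return 1;
        }

        /* point lookup; keys and values are arbitrary byte strings */
        memset(&key, 0, sizeof key);
        memset(&val, 0, sizeof val);
        key.data = "some-key";
        key.size = (u_int32_t)strlen("some-key");
        if (db->get(db, NULL, &key, &val, 0) == 0)
            printf("found %lu bytes\n", (unsigned long)val.size);

        /* because it's a B-tree, a cursor walks records in key order */
        if (db->cursor(db, NULL, &cur, 0) == 0) {
            memset(&key, 0, sizeof key);
            memset(&val, 0, sizeof val);
            while (cur->get(cur, &key, &val, DB_NEXT) == 0)
                ;   /* process each record here */
            cur->close(cur);
        }

        db->close(db, 0);
        return 0;
    }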
bye
Christoph
If your values are large and your keys are small, you could consider Redis as well: http://redis.io
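Keep in mind that Redis is a standalone server, not an embeddable library, so from C you would go through a client library such as hiredis. A minimal sketch (host, port, and key are placeholders):

    #include <hiredis/hiredis.h>
    #include <stdio.h>

    int main(void)
    {
        /* placeholder host/port for a running Redis server */
        redisContext *c = redisConnect("127.0.0.1", 6379);
        if (c == NULL || c->err) {
            fprintf(stderr, "connect failed: %s\n", c ? c->errstr : "oom");
            return 1;
        }

        /* %b passes a binary-safe key as (pointer, length) */
        const char key[] = { 0x01, 0x02, 0x03 };
        redisReply *reply = redisCommand(c, "GET %b", key, sizeof key);
        if (reply != NULL) {
            if (reply->type == REDIS_REPLY_STRING)
                printf("value is %zu bytes\n", (size_t)reply->len);
            freeReplyObject(reply);
        }

        redisFree(c);
        return 0;
    }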