I'm searching for the best way to store lists associated with a key in a key-value database (like BerkeleyDB or LevelDB).
For example: I have users and orders from user to user. I want to store a list of order ids for each user, with fast access via range selects (for pagination).
How should I store this structure?
I don't want to store it in a serialized format for each user:

user_1_orders = serialize(1,2,3..)
user_2_orders = serialize(1,2,3..)

because the list can be long.
I'm thinking about a separate db file for each user, with the order ids stored as keys in it, but this does not solve the range-select problem. What if I want to get the order ids in the range [5000:5050]?
I know about Redis, but I'm interested in a key-value implementation like BerkeleyDB or LevelDB.
You can use Redis to store the list in a `zset` (sorted set). Redis is fast enough, but one thing you should know about Redis is that it stores all data in memory, so if the data eventually exceeds physical memory you have to shard the data on your own.
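For instance, a minimal sketch using the redis-py client (the key name `user:1:orders` and using the order id itself as the score are illustrative assumptions):

```python
import redis

r = redis.Redis()

# Each order id is a member of the user's sorted set; the score (here the
# order id itself, but a timestamp also works) defines the ordering.
r.zadd("user:1:orders", {"1001": 1001, "1002": 1002, "1003": 1003})

# Pagination: fetch the members ranked 5000..5050.
page = r.zrange("user:1:orders", 5000, 5050)
```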
Also you can use SSDB (https://github.com/ideawu/ssdb), which is a wrapper of leveldb. It has APIs similar to Redis, but stores most data on disk; memory is only used for caching. That means SSDB's capacity is 100 times that of Redis: up to TBs.

Let's start with a single list. You can work with a single hashmap in which key `0` holds the count of the user's orders (i.e. the index of the next order) and the following keys hold the order ids. So your hashmap looks like the following:
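One plausible layout (the concrete ids are made up for illustration):

```
0 -> 3     # count of the user's orders, i.e. the index of the next order
1 -> 1001  # first order id
2 -> 1002  # second order id
3 -> 1003  # third order id
```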
Steadily incrementing the key makes sure that every key is unique. Given that the db is key-ordered and that the pack function translates integers into byte arrays that sort in the same order, you can fetch slices of the list. To fetch orders between 5000 and 5050 you can use bsddb's `Cursor.set_range` or leveldb's `createReadStream` (js api).
Now let's expand to multiple users' orders. If you can open several hashmaps, you can apply the above with one hashmap per user. But you may hit system limits (max number of open file descriptors or max number of files per directory), so instead you can share a single hashmap among several users.
What I explain in the following works for both leveldb and bsddb, given that you `pack` keys correctly using lexicographic (byte) order. So I will assume that you have a `pack` function. In bsddb you have to build a `pack` function yourself; have a look at `wiredtiger.packing` or bytekey for inspiration.
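A minimal sketch of such a `pack` helper, assuming unsigned integer components encoded as fixed-width big-endian bytes so that byte-wise order matches numeric order:

```python
import struct

def pack(*values):
    # Fixed-width big-endian unsigned integers: the lexicographic byte
    # order of the result matches the numeric order of the inputs.
    return b''.join(struct.pack('>Q', value) for value in values)

def unpack(key):
    return struct.unpack('>' + 'Q' * (len(key) // 8), key)
```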
The principle is to namespace the keys using the user's id. This is also called key composition.
Say your database looks like the following:
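One plausible layout, using the `pack` helper above (the user ids and order ids are made up for illustration):

```
pack(1, 0)  -> pack(2)     # user 1 has 2 orders
pack(1, 1)  -> pack(1001)  # user 1, first order id
pack(1, 2)  -> pack(1002)  # user 1, second order id
pack(42, 0) -> pack(1)     # user 42 has 1 order
pack(42, 1) -> pack(1003)  # user 42, first order id
```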
You create this database with the following (pseudo) code:
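A hedged sketch of that (pseudo) code, assuming a key-value handle `db` with `get`/`put` methods and the hypothetical helpers `pack` and `make_uid`:

```python
def add_order(user_id, order_id):
    # Allocate the next index for this user, then store the order id
    # under the composite key (user_id, index).
    uid = make_uid(user_id)
    db.put(pack(user_id, uid), pack(order_id))
```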
The `make_uid` implementation looks like this:
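A minimal sketch, assuming the per-user counter lives at the composite key `(user_id, 0)`:

```python
def make_uid(user_id):
    # Read the per-user counter stored at (user_id, 0), increment it,
    # write it back, and return the new index.
    counter_key = pack(user_id, 0)
    raw = db.get(counter_key)
    count = unpack(raw)[0] if raw is not None else 0
    count += 1
    db.put(counter_key, pack(count))
    return count
```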
Then you have to do the correct range lookup; it's similar to the single-list case, except that the key is a composite key. Using the bsddb api `cursor.set_range(key)`, we retrieve all items between 5000 and 5050 for user 42:
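A sketch of `user_orders_slice`, assuming a module-level bsddb handle `db` whose cursor methods `set_range` and `next` return `(key, value)` pairs or `None`:

```python
def user_orders_slice(user_id, start, end):
    # Position the cursor on the first key >= (user_id, start) and walk
    # forward until the key leaves the user's namespace or the range.
    cursor = db.cursor()
    orders = []
    record = cursor.set_range(pack(user_id, start))
    while record is not None:
        key, value = record
        uid, index = unpack(key)
        if uid != user_id or index > end:
            break
        orders.append(unpack(value)[0])
        record = cursor.next()
    cursor.close()
    return orders
```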
No error checks are done. Among other things, slicing with `user_orders_slice(42, 5000, 5050)` is not guaranteed to return 51 items if you delete items from the list. A correct way to query, say, 50 items is to implement a `user_orders_query(user_id, start, limit)`.

I hope you get the idea.
One way you could model this in a key-value store which supports scans, like leveldb, would be to add the order id to the key for each user. So the new keys would be `userId_orderId`, one per order. Now to get the orders for a particular user, you can do a simple prefix scan: `scan(userId*)`. This makes the userId range query slow; in that case you can maintain another table just for userIds, or use another key convention, `Id_userId`, for getting userIds between [5000-5050].
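A sketch of the prefix-scan idea, assuming the plyvel LevelDB binding and zero-padded decimal ids so that byte order matches numeric order (path and ids are made up for illustration):

```python
import plyvel

db = plyvel.DB('/tmp/orders-db', create_if_missing=True)

# One key per order, composed as userId_orderId.
db.put(b'00000042_00001001', b'order data')
db.put(b'00000042_00001002', b'order data')

# All orders of user 42: a simple prefix scan.
for key, value in db.iterator(prefix=b'00000042_'):
    print(key)
```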
Recently I have seen HyperDex adding data type support on top of leveldb, e.g. lists: http://hyperdex.org/doc/04.datatypes/#lists, so you could give that a try too.
In BerkeleyDB you can store multiple values per key, either in sorted or unsorted order. This would be the most natural solution. LevelDB has no such feature. You should look into LMDB (http://symas.com/mdb/) though; it also supports sorted multi-value keys, and it is smaller, faster, and more reliable than either of the others.
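A minimal sketch of sorted multi-value keys, assuming the py-lmdb binding and a named sub-database opened with `dupsort=True` (the paths, key names, and ids are made up for illustration):

```python
import lmdb

env = lmdb.open('/tmp/orders-lmdb', max_dbs=2)
# dupsort=True lets a single key hold several values, kept in sorted order.
orders = env.open_db(b'user_orders', dupsort=True)

with env.begin(write=True, db=orders) as txn:
    txn.put(b'user:42', b'00001001')
    txn.put(b'user:42', b'00001002')

with env.begin(db=orders) as txn:
    cursor = txn.cursor()
    if cursor.set_key(b'user:42'):        # position on the first duplicate
        for value in cursor.iternext_dup():
            print(value)                  # order ids in sorted order
```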