Looking for an ultrafast data store to perform int

2019-05-09 12:45发布

问题:

I've been using Redis for a while as a backend for Resque and now that I'm looking for a fast way to perform intersect operation on large sets of data, I decided to give Redis a shot.

I've been conducting the following test:

x, y and z are Redis sets, they all contain approx. 1 million members (random integers taken from a seed array containing 3M+ members).

— I want to intersect x y and z, so I'm using sintersectstore (to avoid overheating caused by data retrieval from the server to the client)

sinterstore r x y z

— the resulting set (r) contains about half a million members, Redis computes this set in approximately half a second.

Half a second is not bad, but I would need to perform such calculations on sets that could contain more than a billion members each.

I haven't tested how Redis would react with such enormous sets but I assume it would take a lot more time to process the data.

Am I doing this right? Is there a faster way to do that?

Notes:

— native arrays aren't an option since I'm looking for a distributed data store that would be accessed by several workers.

— I get these results on a 8 cores @3.4Ghz Mac with 16GB of RAM, disk saving has been disabled on the Redis configuration.

回答1:

I suspect that bitmaps are your best hope.

In my experience, redis is a perfect server for bitmaps; you would use the string data structure (one of the five data structures available in redis)

many or perhaps all of the operations you will need to perform are available out-of-the-box in redis, as atomic operations

the redis setbit operation has time complexity of O(1)

In a typical implementation, you would hash your array values to offset values on the bit string, then set each bit at its corresponding offset (or index); like so:

>>> r1.setbit('k1', 20, 1)

the first argument is the key, the second is the offset (index value) and the third is the value at that index on the bitmap.

to find if a bit is set at this offset (20), call getbit passing in the key for the bit string.

>>> r1.getbit('k1', 20)

then on those bitmaps, you can of course perform the usual bitwise operations e.g., logical AND, OR, XOR.