I've been using Redis for a while as a backend for Resque, and now that I'm looking for a fast way to perform intersection operations on large sets of data, I decided to give Redis a shot.
I've been conducting the following test:
— x, y and z are Redis sets; each contains approx. 1 million members (random integers taken from a seed array of 3M+ members).
— I want to intersect x, y and z, so I'm using SINTERSTORE (to avoid the overhead of transferring the result set from the server to the client):
sinterstore r x y z
— the resulting set (r) contains about half a million members; Redis computes it in approximately half a second.
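For reference, the test looks roughly like this (a minimal sketch using the Python redis-py client; the key names match the description above, and the chunk size of 10,000 is an arbitrary choice to keep individual SADD calls small):

import random
import time

import redis

r = redis.Redis()  # assumes a local Redis instance on the default port

# seed pool of 3M+ distinct integers, as described above
seed = random.sample(range(10_000_000), 3_000_000)

# populate x, y and z with ~1M random members each, chunking the
# SADD calls so no single command carries a million arguments
for key in ("x", "y", "z"):
    members = random.sample(seed, 1_000_000)
    for i in range(0, len(members), 10_000):
        r.sadd(key, *members[i:i + 10_000])

# time the server-side intersection; SINTERSTORE keeps the result
# on the server, so nothing is sent back to the client
start = time.perf_counter()
r.sinterstore("r", ["x", "y", "z"])
print(time.perf_counter() - start, r.scard("r"))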
Half a second is not bad, but I would need to perform such calculations on sets that could contain more than a billion members each.
I haven't tested how Redis behaves with such enormous sets, but I assume it would take considerably longer to process the data.
Am I doing this right? Is there a faster way to do that?
Notes:
— native arrays aren't an option since I'm looking for a distributed data store that would be accessed by several workers.
— I get these results on an 8-core, 3.4GHz Mac with 16GB of RAM; disk persistence has been disabled in the Redis configuration.
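For reference, disabling RDB snapshotting just means clearing the save directive, either in redis.conf or at runtime:

config set save ""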