I am not that familiar with Redis
. At the moment I am designing some realtime service and I'd like to rely on it. I expect ~10000-50000 keys per minute to be SET
with some reasonable EX
and match over them using SCAN
rarely enough not to bother with performance bottlenecks.
The thing I doubt is "in/out rate" and possible overflooding with keys that might match some SCAN
query and thus it never terminates (i.e. always replies with latest cursor position and forces you to continue; that could happen easily if one consumes x items per second
and there are x + y items per second coming in
with y > 0
).
Obviously, I could set desired SCAN
size long enough; but I wonder if there exists a better solution or does Redis
itself guarantees that SCAN
will grow size automatically in such a case?
First some context, solution at the end:
From SCAN command > Guarantee of termination
But in The COUNT option it says:
Important to keep in mind, from Scan guarantees:
The key to a solution is in the cursor itself. See Making sense of Redis’ SCAN cursor. It is possible to deduce the percent of progress of your scan because the cursor is really the bits-reversed of an index to the table size.
Using
DBSIZE
orINFO keyspace
command you can get how many keys you have at any time:Another source of information is the undocumented
DEBUG htstats index
, just to get a feeling:The table size is the power of 2 following your number of keys: Keys: 200032 => Table size: 262144
The solution:
We will calculate a desired
COUNT
argument for every scan.Say you will be calling SCAN with a frequency (
F
in Hz) of 10 Hz (every 100 ms) and you want it done in 5 seconds (T
in s). So you want this finished inN = F*T
calls,N = 50
in this example.Before your first scan, you know your current progress is 0, so your remaining percent is
RP = 1
(100%).Before every
SCAN
call (or every given number of calls that you want to adjust your COUNT if you want to save the Round Trip Time (RTT) of aDBSIZE
call), you callDBSIZE
to get the number of keysK
.You will use
COUNT = K*RP/N
For the first call, this is
COUNT = 200032*1/50 = 4000
.For any other call, you need to calculate
RP = 1 - ReversedCursor/NextPowerOfTwo(K)
.For example, let say you have done 20 calls already, so now
N = 30
(remaining number of calls). You calledDBSIZE
and gotK = 281569
. This meansNextPowerOfTwo(K) = 524288
, this is 2^19.Your next cursor is 14509 in decimal =
000011100010101101
in binary. As the table size is 2^19, we represent it with 18 bits.You reverse the bits and get
101101010001110000
in binary = 185456 in decimal. This means we have covered 185456 out of 524288. And:So you have to adjust:
So in your next
SCAN
call you use6100
. Makes sense it increased because:All this was assuming you are getting all keys. If you're pattern-matching, you need to use the past to estimate the remaining amount of keys to be found. We add as a factor
PM
(percent of matches) to theCOUNT
calculation.If after 20 calls, you have found only
keysFound = 2000
keys, then:This means only 2% of the keys are matching our pattern so far, so
This algorithm can probably be improved, but you get the idea.
Make sure to run some benchmarks on the
COUNT
number you'll use to start with, to measure how many milliseconds is yourSCAN
taking, as you may need to moderate your expectations about how many calls you need (N
) to do this in a reasonable time without blocking the server, and adjust yourF
andT
accordingly.