Redis `SCAN`: how to maintain a balance between ne

I am not that familiar with Redis. At the moment I am designing a realtime service and I'd like to rely on it. I expect ~10000-50000 keys per minute to be SET with some reasonable EX, and to match over them using SCAN rarely enough that performance bottlenecks are not a concern.

What I am unsure about is the in/out rate: the keyspace might be flooded with keys matching some SCAN query, so that the scan never terminates (i.e. it always replies with the latest cursor position and forces you to continue; that could easily happen if one consumes x items per second while x + y items per second are coming in, with y > 0).

Obviously, I could set the desired SCAN size large enough, but I wonder whether a better solution exists, or whether Redis itself guarantees that SCAN will grow its size automatically in such a case?

Tags: redis

1 Answer

甜甜的少女心

#2 · 2020-02-15 03:34

First some context, solution at the end:

From SCAN command > Guarantee of termination

The SCAN algorithm is guaranteed to terminate only if the size of the iterated collection remains bounded to a given maximum size, otherwise iterating a collection that always grows may result into SCAN to never terminate a full iteration.

This is easy to see intuitively: if the collection grows there is more and more work to do in order to visit all the possible elements, and the ability to terminate the iteration depends on the number of calls to SCAN and its COUNT option value compared with the rate at which the collection grows.

But in The COUNT option it says:

Important: there is no need to use the same COUNT value for every iteration. The caller is free to change the count from one iteration to the other as required, as long as the cursor passed in the next call is the one obtained in the previous call to the command.

Important to keep in mind, from Scan guarantees:

A given element may be returned multiple times. It is up to the application to handle the case of duplicated elements, for example only using the returned elements in order to perform operations that are safe when re-applied multiple times.

Elements that were not constantly present in the collection during a full iteration, may be returned or not: it is undefined.

The key to a solution is in the cursor itself. See Making sense of Redis’ SCAN cursor. It is possible to deduce the percentage of progress of your scan, because the cursor is really a bit-reversed index into the hash table, whose size is a power of two.
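To make the bit reversal concrete, here is a minimal Python sketch. It assumes a hypothetical table of 8 buckets (3 cursor bits) with no rehashing in progress; `reverse_bits` is an illustrative helper, not part of Redis or of any client library.

```python
def reverse_bits(value: int, bits: int) -> int:
    """Reverse the lowest `bits` bits of `value`."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


# Order in which cursors advance on an 8-bucket table: the increment is
# applied to the reversed bits, so the raw cursor values look shuffled.
cursors = [0, 4, 2, 6, 1, 5, 3, 7]

# Reversing each cursor recovers a plain sequential position.
positions = [reverse_bits(c, 3) for c in cursors]
progress = [p / 8 for p in positions]  # fraction of the table already covered

print(positions)  # [0, 1, 2, 3, 4, 5, 6, 7]
```

The raw cursor tells you little on its own, but its reversed value divided by the table size is the fraction of the table already visited.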

Using DBSIZE or INFO keyspace command you can get how many keys you have at any time:

> DBSIZE
(integer) 200032
> info keyspace
# Keyspace
db0:keys=200032,expires=0,avg_ttl=0

Another source of information is the undocumented DEBUG htstats index, just to get a feeling:

> DEBUG htstats 0
[Dictionary HT]
Hash table 0 stats (main hash table):
 table size: 262144
 number of elements: 200032
 different slots: 139805
 max chain length: 8
 avg chain length (counted): 1.43
 avg chain length (computed): 1.43
 Chain length distribution:
   0: 122339 (46.67%)
   1: 93163 (35.54%)
   2: 35502 (13.54%)
   3: 9071 (3.46%)
   4: 1754 (0.67%)
   5: 264 (0.10%)
   6: 43 (0.02%)
   7: 6 (0.00%)
   8: 2 (0.00%)
[Expires HT]
No stats available for empty dictionaries

The table size is the next power of 2 above your number of keys: 200032 keys => table size 262144.
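A small Python sketch of that estimate follows. `next_power_of_two` is a hypothetical helper name, and the result is only an approximation: the real table may be larger, for example while a rehash is in progress or after many keys have expired, so treat it as an assumption rather than a guarantee.

```python
def next_power_of_two(k: int) -> int:
    """Smallest power of two that is >= k (estimate of the hash table size)."""
    return 1 << max(k - 1, 0).bit_length()


print(next_power_of_two(200032))  # 262144
print(next_power_of_two(281569))  # 524288
```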

The solution:

We will calculate a desired COUNT argument for every scan.

Say you will be calling SCAN with a frequency (F in Hz) of 10 Hz (every 100 ms) and you want it done in 5 seconds (T in s). So you want this finished in N = F*T calls, N = 50 in this example.

Before your first scan, you know your current progress is 0, so your remaining percent is RP = 1 (100%).

Before every SCAN call (or every few calls, if you want to save the round-trip time (RTT) of a DBSIZE call), you call DBSIZE to get the number of keys K.

You will use COUNT = K*RP/N

For the first call, this is COUNT = 200032*1/50 = 4000.
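The planning step can be sketched in Python as below. The function names are hypothetical, and rounding COUNT down (with a floor of 1) is an assumption of this sketch; any small shortfall is absorbed when COUNT is recomputed on later calls.

```python
def planned_calls(frequency_hz: float, duration_s: float) -> int:
    """N = F*T, the number of SCAN calls budgeted for one full iteration."""
    return max(1, int(frequency_hz * duration_s))


def scan_count(k: int, rp: float, calls_left: int) -> int:
    """COUNT = K*RP/N, rounded down but never below 1."""
    return max(1, int(k * rp / calls_left))


n = planned_calls(10, 5)            # 10 Hz for 5 s
first = scan_count(200032, 1.0, n)  # nothing scanned yet, so RP = 1
print(n, first)                     # 50 4000
```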

For any other call, you need to calculate RP = 1 - ReversedCursor/NextPowerOfTwo(K).

For example, let's say you have done 20 calls already, so now N = 30 (remaining number of calls). You called DBSIZE and got K = 281569. This means NextPowerOfTwo(K) = 524288, which is 2^19.

Your next cursor is 29018 in decimal = 0000111000101011010 in binary. As the table size is 2^19, we represent it with 19 bits.

You reverse the bits and get 0101101010001110000 in binary = 185456 in decimal. This means we have covered 185456 out of 524288. And:

RP = 1 - ReversedCursor/NextPowerOfTwo(K) = 1 - 185456 / 524288 = 0.65 or 65%

So you have to adjust:

COUNT = K*RP/N = 281569 * 0.65 / 30 = 6100

So in your next SCAN call you use 6100. It makes sense that it increased, because:

The number of keys has increased from 200032 to 281569.
Only 60% of the initially estimated calls remain, yet progress is behind: 65% of the keyspace is still waiting to be scanned.
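The same adjustment, written as a short Python sketch that starts from the reversed cursor position (185456) and the table size (524288) used above. `remaining_fraction` is a hypothetical helper name.

```python
def remaining_fraction(reversed_cursor: int, table_size: int) -> float:
    """RP = 1 - ReversedCursor / NextPowerOfTwo(K)."""
    return 1.0 - reversed_cursor / table_size


k, calls_left = 281569, 30
rp = remaining_fraction(185456, 524288)

count_exact = int(k * rp / calls_left)              # unrounded RP
count_rounded = int(k * round(rp, 2) / calls_left)  # RP rounded to 0.65

print(round(rp, 3), count_exact, count_rounded)     # 0.646 6065 6100
```

The unrounded RP gives a COUNT of about 6065, while rounding RP to 0.65 first gives 6100. The difference does not matter much in practice, because COUNT is recomputed before the following calls anyway.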

All this assumes you want every key. If you are pattern-matching with MATCH, keep in mind that the filter is applied after the elements are retrieved, so COUNT still bounds the number of keys examined per call, not the number returned. COUNT = K*RP/N therefore stays the same; what changes is how many results to expect from each call. You can use the past to estimate that, with a factor PM (percent of matches):

PM = keysFound / ( K * ReversedCursor/NextPowerOfTwo(K))

If after 20 calls you have found only keysFound = 2000 keys, then:

PM = 2000 / ( 281569 * 185456 / 524288) = 0.02

This means only 2% of the keys have matched the pattern so far, so with COUNT = 6100 you should expect each call to return about

PM * COUNT = 0.02 * 6100 = 122

matching keys.

This algorithm can probably be improved, but you get the idea.

Make sure to run some benchmarks on the COUNT value you'll start with, to measure how many milliseconds your SCAN takes. You may need to moderate your expectations about how many calls (N) you need to do this in a reasonable time without blocking the server, and adjust your F and T accordingly.
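Putting the pieces together, the loop might look like the Python sketch below. It covers only the case where all keys are wanted, and it is one possible reading of the approach, not a tested production implementation. It assumes a client object exposing `dbsize()` and `scan(cursor=..., count=...)` returning `(next_cursor, keys)`, which is how redis-py's `Redis` class behaves; `adaptive_scan` and the helper names are hypothetical. It also records how long each SCAN call takes, as a starting point for the benchmarking mentioned above.

```python
import time


def next_power_of_two(k: int) -> int:
    """Smallest power of two >= k (an estimate of the hash table size)."""
    return 1 << max(k - 1, 0).bit_length()


def reverse_bits(value: int, bits: int) -> int:
    """Reverse the lowest `bits` bits of `value`."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def adaptive_scan(client, frequency_hz=10.0, duration_s=5.0, sleep=time.sleep):
    """Iterate the whole keyspace, recomputing COUNT = K*RP/N before each call."""
    planned_calls = max(1, int(frequency_hz * duration_s))  # N = F*T
    cursor, calls = 0, 0
    seen = set()   # SCAN may return the same key more than once
    timings = []   # seconds spent inside each SCAN call
    while True:
        k = client.dbsize()
        # Assumption: the table size is the next power of two above K.
        # Widen it if the cursor already needs more bits than that.
        bits = max(next_power_of_two(k).bit_length() - 1, cursor.bit_length())
        table_size = 1 << bits
        rp = 1.0 - reverse_bits(cursor, bits) / table_size
        calls_left = max(1, planned_calls - calls)
        count = max(1, int(k * rp / calls_left))

        started = time.perf_counter()
        cursor, keys = client.scan(cursor=cursor, count=count)
        timings.append(time.perf_counter() - started)

        cursor = int(cursor)
        seen.update(keys)
        calls += 1
        if cursor == 0:  # a zero cursor marks the end of a full iteration
            return seen, calls, timings
        sleep(1.0 / frequency_hz)
```

With redis-py installed, this could be called as `adaptive_scan(redis.Redis())`. Looking at `max(timings)` afterwards gives a rough idea of whether the chosen F and T keep individual calls short enough. Once the planned calls are used up, the sketch simply asks for everything that remains, so a fast-growing keyspace shows up as a few extra, larger calls rather than an endless iteration.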
