How accurate is the dbsize command in Redis?
I've noticed that the count of keys returned by dbsize does not match the number of actual keys returned by the keys command.
Here's an example:
redis-cli dbsize
(integer) 3057
redis-cli keys "*" | wc -l
2072
Why is dbsize so much higher than the actual number of keys?
I would say it is linked to key expiration.
Key/value stores like Redis or memcached cannot afford to define a physical timer per object to expire. There would be too many of them. Instead they define a data structure to easily track items to be expired, and multiplex all the expiration events to a single physical timer. They also tend to implement a lazy strategy to deal with these events.
With Redis, when an item expires, nothing happens immediately. However, before each item access, a check is systematically done to avoid returning an expired item, and the item is deleted if it has expired. On top of this lazy strategy, every 100 ms, a scavenger algorithm is triggered to physically expire a number of items (i.e. remove them from the main dictionary). The number of keys considered at each iteration depends on the expiration workload (the algorithm is adaptive).
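To make the two mechanisms concrete, here is a minimal Python sketch of a store that combines a check on access with a periodic sampling pass. It is a toy model, not Redis's actual implementation: the class name ToyStore, the method names, and the sample_size, threshold and max_rounds values are all assumptions chosen for illustration.

```python
import random
import time


class ToyStore:
    """Toy model of lazy plus periodic expiration (not Redis's real code)."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.data = {}     # main dictionary: key -> value
        self.expires = {}  # key -> absolute deadline, only for keys with a TTL

    def set(self, key, value, ttl=None):
        self.data[key] = value
        if ttl is None:
            self.expires.pop(key, None)
        else:
            self.expires[key] = self.clock() + ttl

    def _is_expired(self, key):
        deadline = self.expires.get(key)
        return deadline is not None and deadline <= self.clock()

    def _delete(self, key):
        self.data.pop(key, None)
        self.expires.pop(key, None)

    def get(self, key):
        # Lazy strategy: the expiry check happens only when the key is accessed.
        if self._is_expired(key):
            self._delete(key)
            return None
        return self.data.get(key)

    def scavenge(self, sample_size=20, threshold=0.25, max_rounds=16):
        # Periodic pass (assumed to be called by a single timer): sample keys
        # that have a TTL, delete the expired ones, and keep going only while
        # a large share of each sample turned out to be expired.
        removed = 0
        for _ in range(max_rounds):
            if not self.expires:
                break
            candidates = list(self.expires)
            sample = random.sample(candidates, min(sample_size, len(candidates)))
            expired = [key for key in sample if self._is_expired(key)]
            for key in expired:
                self._delete(key)
            removed += len(expired)
            if len(expired) <= threshold * len(sample):
                break
        return removed
```

Because each pass is bounded by max_rounds and works on random samples, a single call does not guarantee that every expired key is gone, which is where the backlog comes from.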
The consequence is that, when you have a steady flow of expiration events, Redis may have a backlog of expired items that have not yet been removed at a given point in time.
Now coming back to the question, the DBSIZE command just returns the size of the main dictionary, so it includes expired items that have not yet been removed. The KEYS command walks through the whole dictionary, accessing individual keys, so it excludes all expired items. The number of items may therefore not match.
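The difference between the two counts can be reproduced with a few lines of Python. This is only a sketch of the behaviour described above, using plain dictionaries; the function names dbsize and keys, the sample data, and the choice to delete expired entries during the walk are assumptions of this toy model, not Redis internals.

```python
def dbsize(data):
    # Mimics DBSIZE as described: raw dictionary size, no expiry check.
    return len(data)


def keys(data, expires, now):
    # Mimics KEYS "*" as described: visit every key, apply the expiry check,
    # and leave out (and here also remove) the entries that have expired.
    live = []
    for key in list(data):
        deadline = expires.get(key)
        if deadline is not None and deadline <= now:
            del data[key]
            del expires[key]
            continue
        live.append(key)
    return live


# Three stored keys; "a" expired at t=5, "b" expires at t=50, "c" has no TTL.
data = {"a": 1, "b": 2, "c": 3}
expires = {"a": 5.0, "b": 50.0}
now = 10.0

size_before = dbsize(data)           # still counts the expired key "a"
live_keys = keys(data, expires, now) # skips "a"
size_after = dbsize(data)            # the walk removed "a" in this model
```

In this model the first count is one higher than the number of keys actually listed, which is the same kind of gap as in the question.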