redis - Using Hashes

Posted 2019-03-16 17:23

Question:

I'm implementing a social stream and a notification system for my web application using Redis. I'm new to Redis and I have some doubts about hashes and their efficiency.

I've read this awesome Instagram post and I plan to implement a similar solution for minimal storage.

As mentioned in their blog, they did it like this:

To take advantage of the hash type, we bucket all our Media IDs into buckets of 1000 (we just take the ID, divide by 1000 and discard the remainder). That determines which key we fall into; next, within the hash that lives at that key, the Media ID is the lookup key within the hash, and the user ID is the value. An example, given a Media ID of 1155315, which means it falls into bucket 1155 (1155315 / 1000 = 1155):

HSET "mediabucket:1155" "1155315" "939"
HGET "mediabucket:1155" "1155315"
> "939"

So instead of having 1000 separate keys, they store them in one hash with a thousand lookup keys. My doubt is: why can't we increase the number of lookup keys even further?

For example, a Media ID of 1155315 would fall into mediabucket:115 if we divided by 10000, or by an even larger number.

Why are they settling for one hash bucket with 1000 lookup keys? Why can't they have one hash bucket with 100000 lookup keys? Is that related to efficiency?

I need your suggestions for implementing an efficient method in my web application.

P.S. Please don't say that Stack Overflow is not for asking for suggestions; I don't know where else to find help.

Thanks!

Answer 1:

Yes, it's related to efficiency.

We asked the always-helpful Pieter Noordhuis, one of Redis’ core developers, for input, and he suggested we use Redis hashes. Hashes in Redis are dictionaries that can be encoded in memory very efficiently; the Redis setting ‘hash-zipmap-max-entries’ configures the maximum number of entries a hash can have while still being encoded efficiently. We found this setting was best around 1000; any higher and the HSET commands would cause noticeable CPU activity. For more details, you can check out the zipmap source file.

Small hashes are encoded in a special way (as zipmaps) that is memory-efficient but makes operations O(N) instead of O(1). So with one zipmap of 100k fields instead of 100 zipmaps of 1k fields each, you gain no memory benefit, but all your operations become 100 times slower.
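
In modern Redis the setting has been renamed: 'hash-zipmap-max-entries' became 'hash-max-ziplist-entries' and, in Redis 7, 'hash-max-listpack-entries'. A rough sketch of how you could check that a bucket stays in the compact encoding, assuming redis-py and a Redis 7 server (on older servers, use the older setting name):

import redis

r = redis.Redis()

# Keep hashes compactly encoded up to 1000 fields.
# (Setting name is 'hash-max-listpack-entries' on Redis 7,
#  'hash-max-ziplist-entries' on older releases.)
r.config_set("hash-max-listpack-entries", 1000)

# Fill one bucket with fewer fields than the limit.
for media_id in range(1155000, 1155500):
    r.hset("mediabucket:1155", str(media_id), "939")

# Under the limit the hash keeps the compact encoding
# ('listpack' or 'ziplist', depending on version), not 'hashtable'.
print(r.execute_command("OBJECT", "ENCODING", "mediabucket:1155"))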



Answer 2:

Basically, they want the number of values stored in a single hash not to exceed 1000. They probably set up their Redis instance configuration to work nicely with this number (they set hash-zipmap-max-entries).

Every time a hash exceeds the specified number of elements or element size, it will be converted into a real hash table, and the memory saving will be lost.

-- http://redis.io/topics/memory-optimization
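
To see that conversion happen, push one hash past the configured limit and watch the reported encoding change. A small sketch, again assuming redis-py and a Redis 7 server (the key name is made up for the demo):

import redis

r = redis.Redis()
r.delete("mediabucket:demo")

# Use the default limit of 128 entries explicitly so the demo
# does not depend on the server's configuration.
LIMIT = 128
r.config_set("hash-max-listpack-entries", LIMIT)

# At or below the limit: the compact, memory-saving encoding.
r.hset("mediabucket:demo", mapping={str(i): "x" for i in range(LIMIT)})
print(r.execute_command("OBJECT", "ENCODING", "mediabucket:demo"))  # listpack

# One field over the limit: Redis converts it into a real hash table.
r.hset("mediabucket:demo", str(LIMIT), "x")
print(r.execute_command("OBJECT", "ENCODING", "mediabucket:demo"))  # hashtable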

As I understand it, your question is "why exactly 1000 and not more?" Well, it's because they had to choose between space efficiency and speed. The space-efficient representation has operation complexity O(N), not O(1) like normal hashes; it is N times slower, but takes less memory.

They tested different values and found that 1000 was a good compromise: it doesn't take much space, yet is still fast enough.
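
If you want to see the compromise on your own data, you can compare the memory used by one large hash against the same fields split into 1000-field buckets. A rough sketch, assuming redis-py, Redis 4+ (for MEMORY USAGE), and a scratch database you can wipe (key names are made up):

import redis

r = redis.Redis(db=15)  # a scratch database; this demo wipes it
r.flushdb()
r.config_set("hash-max-listpack-entries", 1000)

N = 100_000

# One big hash: far over the limit, so it is stored as a real hash table.
r.hset("big", mapping={str(i): "939" for i in range(N)})

# Bucketed: 100 hashes of 1000 fields each, all compactly encoded.
# (Use a pipeline in real code; a plain loop keeps the sketch short.)
for i in range(N):
    r.hset(f"bucket:{i // 1000}", str(i), "939")

big = r.memory_usage("big")
buckets = sum(r.memory_usage(f"bucket:{b}") for b in range(N // 1000))
print(big, buckets)  # the bucketed layout should use noticeably less memory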