Memory efficient way to store 32 bit signed intege

Posted 2020-07-06 08:23

Question:

Since Redis tries to parse strings as 64-bit signed integers, is it a good idea to store the binary representation of a 32-bit signed integer instead of a radix-10 integer string?

In our system we have lists of many 32 bit signed integer IDs.

I can store them as decimal strings, like
lpush mykey 102450  --> Redis casts 102450 to an 8-byte long

or store them as raw bytes, like
lpush mykey \x00\x01\x90\x32  --> this is just 4 bytes (102450 is 0x00019032)
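
To make the two options concrete, here is a small Python sketch (an illustration, not part of the original question) that builds both payloads with the standard `struct` module. Big-endian byte order is an assumption; the question does not say which order is intended.

```python
import struct

value = 102450

# Option 1: radix-10 string, as sent by `lpush mykey 102450`.
decimal_payload = str(value).encode("ascii")

# Option 2: fixed-width 4-byte signed integer (big-endian is an assumption).
binary_payload = struct.pack(">i", value)

print(decimal_payload, len(decimal_payload))
print(binary_payload.hex(), len(binary_payload))

# The binary form round-trips back to the original ID.
assert struct.unpack(">i", binary_payload)[0] == value
```

Note that these are only the payload sizes on the wire; how much memory Redis uses for each is a separate matter.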

Answer 1:

Internally, Redis stores strings in the most efficient manner. Forcing integers into binary strings that Redis cannot parse as numbers will actually use more memory.

Here is how Redis stores strings:

  1. Integers less than 10000 are stored in a shared memory pool, and don't have any memory overheads. If you wish, you can increase this limit by changing the constant REDIS_SHARED_INTEGERS in redis.h and recompiling Redis.
  2. Integers of 10000 or more that fit within the range of a long consume 8 bytes.
  3. Regular strings take len(string) + 4 bytes for length + 4 bytes for marking free space + 1 byte for null terminator + 8 bytes for malloc overheads.

In the example you quoted, it's a question of 8 bytes for a long versus 21 bytes for the string (4 bytes of data plus 17 bytes of overhead).
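
The arithmetic behind the 21-byte figure can be written out as a small Python sketch. The function name `estimated_string_bytes` is hypothetical, and the constants are simply the per-string overheads listed above, not values read from a running server; actual numbers vary by Redis version and allocator.

```python
# Per-string overheads as listed in the answer (assumed, not measured).
LENGTH_FIELD = 4       # bytes for the length
FREE_SPACE_FIELD = 4   # bytes for marking free space
NULL_TERMINATOR = 1    # trailing null byte
MALLOC_OVERHEAD = 8    # allocator bookkeeping
LONG_BYTES = 8         # an integer stored as a long


def estimated_string_bytes(payload: bytes) -> int:
    """Estimate the memory used by a value stored as a regular string."""
    return (len(payload) + LENGTH_FIELD + FREE_SPACE_FIELD
            + NULL_TERMINATOR + MALLOC_OVERHEAD)


binary_id = b"\x00\x01\x90\x32"  # 4-byte encoding of 102450
print(estimated_string_bytes(binary_id))  # 21
print(LONG_BYTES)                         # 8
```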

EDIT:

So if I have a set of numbers all less than 10,000 how does Redis store my set?

It depends on how many elements you have.

If you have fewer than 512 elements in your set (see set-max-intset-entries), the set is stored as an IntSet, which is a glorified name for a sorted integer array. Since your numbers are less than 10000, it uses 16 bits per element. It is (almost) as memory efficient as a C array.
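
As a rough illustration of why small values take 16 bits each, the sketch below picks the narrowest signed width that can hold every member of a set. It mirrors the idea of the IntSet's 16-, 32-, and 64-bit encodings; the function name `intset_element_bytes` is hypothetical and the code is not taken from the Redis source.

```python
def intset_element_bytes(values):
    """Return bytes per element for the narrowest signed width holding all values."""
    width = 2  # start with 16-bit elements
    for v in values:
        if -(2 ** 15) <= v < 2 ** 15:
            needed = 2
        elif -(2 ** 31) <= v < 2 ** 31:
            needed = 4
        elif -(2 ** 63) <= v < 2 ** 63:
            needed = 8
        else:
            raise ValueError(f"{v} does not fit in a 64-bit signed integer")
        # One wide value forces the whole array to the wider encoding.
        width = max(width, needed)
    return width


print(intset_element_bytes([1, 500, 9999]))   # 2
print(intset_element_bytes([1, 500, 102450])) # 4
```

The second call shows the trade-off: a single ID above the 16-bit range widens every element in the array.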

If you have more than 512 elements, the set becomes a HashTable. Each element in the set is wrapped in a structure called robj, which has an overhead of 16 bytes. The robj structure has a pointer to the shared pool of integers, so you don't pay anything extra for the integer itself. And finally, the robj instances are stored in the hashtable, and the hashtable has an overhead that is proportional to the size of the set.

If you are interested in exactly how much memory an element consumes, run redis-rdb-tools on your dataset (disclaimer: I am the author of this tool). Alternatively, you can read the source code for the class MemoryCallback; the comments explain how the memory is laid out.



Answer 2:

Strings are stored with a length, so it won't be just 4 bytes in the database. It's probably stored as 4 bytes of data + 4 bytes of length + padding, so you don't gain anything.



Tags: redis