My understanding is that os.urandom(size) returns a random string of bytes of the given size, but then:
import os
import sys
print(sys.getsizeof(os.urandom(42)))
>>>
75
Why is this not 42?
And a related question:
import base64
import binascii
print(sys.getsizeof(base64.b64encode(os.urandom(42))))
print(sys.getsizeof(binascii.hexlify(os.urandom(42))))
>>>
89
117
Why are these so different? Which encoding would be the most memory efficient way to store a string of bytes such as that given by os.urandom?
Edit: It seems like quite a stretch to say that this question is a duplicate of What is the difference between len() and sys.getsizeof() methods in python? My question is not about the difference between len() and getsizeof(). I was confused about the memory used by Python objects in general, which the answer to this question has clarified for me.
Python byte string objects are more than just the bytes that comprise them. They are fully fledged objects. As such they require extra space to accommodate the object's components, such as the type pointer (needed to identify what kind of object the bytestring even is) and the length (needed for efficiency, and because Python bytestrings can contain null bytes).
Even the simplest object, a bare object instance, requires space of its own before it holds any data at all.
The second part of your question is simply because the strings produced by b64encode() and hexlify() have different lengths; the latter is 28 characters longer, which, unsurprisingly, is exactly the difference in the values reported by sys.getsizeof().
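The arithmetic can be checked directly: the per-object overhead of a bytes object is a fixed constant (its exact value is a CPython and platform detail), and the encoded lengths follow from the encodings themselves — Base64 emits 4 characters for every 3 input bytes, hex emits 2 characters per byte. A small sketch:

```python
import base64
import binascii
import os
import sys

raw = os.urandom(42)

# Fixed per-object overhead of a bytes object: total size minus payload.
# The exact value is an implementation detail (33 on 64-bit CPython builds).
overhead = sys.getsizeof(b"")
print(sys.getsizeof(raw) - len(raw) == overhead)      # True

# Base64 produces 4 output characters per 3 input bytes, hex produces
# 2 characters per byte: 56 vs. 84 characters for 42 input bytes.
b64 = base64.b64encode(raw)
hexed = binascii.hexlify(raw)
print(len(b64), len(hexed))                           # 56 84
print(len(hexed) - len(b64))                          # 28

# getsizeof() reports payload plus the same fixed overhead for each,
# which reproduces the 89 and 117 seen in the question.
print(sys.getsizeof(b64) - len(b64) == overhead)      # True
print(sys.getsizeof(hexed) - len(hexed) == overhead)  # True
```

So all three sizes are just payload length plus one constant, and the 28-byte gap between 117 and 89 is purely the difference in encoded length.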
Unless you use some form of compression, there is no encoding that will be more efficient than the binary string you already have. This is particularly true in this case because the data is random, and random data is inherently incompressible.
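To see why compression does not help here, compare what a general-purpose compressor does to random bytes versus highly repetitive ones. A quick sketch using the standard-library zlib module (the 10,000-byte sample size is arbitrary):

```python
import os
import zlib

random_data = os.urandom(10_000)
repetitive_data = b"a" * 10_000

# Random bytes have maximal entropy: the "compressed" form ends up
# slightly *larger*, because of the zlib container overhead.
print(len(zlib.compress(random_data)))      # a little over 10000

# Repetitive bytes, by contrast, compress almost to nothing.
print(len(zlib.compress(repetitive_data)))  # a few dozen bytes
```

Compression only pays off when the input has redundancy to exploit; os.urandom output has none, so the raw bytes are already the most compact representation.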