How do I write a long integer as binary in Python?

2019-02-17 01:19发布

问题:

In Python, long integers have unlimited precision. I would like to write a 16 byte (128 bit) integer to a file. struct from the standard library supports only up to 8 byte integers. array has the same limitation. Is there a way to do this without masking and shifting each integer?

Some clarification here: I'm writing to a file that's going to be read in from non-Python programs, so pickle is out. All 128 bits are used.

回答1:

Two possible solutions:

  1. Just pickle your long integer. This will write the integer in a special format which allows it to be read again, if this is all you want.

  2. Use the second code snippet in this answer to convert the long int to a big endian string (which can be easily changed to little endian if you prefer), and write this string to your file.

The problem is that the internal representation of bigints does not directly include the binary data you ask for.



回答2:

I think for unsigned integers (and ignoring endianness) something like

import binascii

def binify(x):
    h = hex(x)[2:].rstrip('L')
    return binascii.unhexlify('0'*(32-len(h))+h)

>>> for i in 0, 1, 2**128-1:
...     print i, repr(binify(i))
... 
0 '\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
1 '\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01'
340282366920938463463374607431768211455 '\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff'

might technically satisfy the requirements of having non-Python-specific output, not using an explicit mask, and (I assume) not using any non-standard modules. Not particularly elegant, though.



回答3:

The PyPi bitarray module in combination with the builtin bin() function seems like a good combination for a solution that is simple and flexible.

bytes = bitarray(bin(my_long)[2:]).tobytes()

The endianness can be controlled with a few more lines of code. You'll have to evaluate the efficiency.



回答4:

Why not use struct with the unsigned long long type twice?

import struct
some_file.write(struct.pack("QQ", var/(2**64), var%(2**64)))

That's documented here (scroll down to get the table with Q): http://docs.python.org/library/struct.html



回答5:

This may not avoid the "mask and shift each integer" requirement. I'm not sure that avoiding mask and shift means in the context of Python long values.

The bytes are these:

def bytes( long_int ):
    bytes = []
    while long_int != 0:
        b = long_int%256
        bytes.insert( 0, b )
        long_int //= 256
    return bytes

You can then pack this list of bytes using struct.pack( '16b', bytes )



回答6:

You could pickle the object to binary, use protocol buffers (I don't know if they allow you to serialize unlimited precision integers though) or BSON if you do not want to write code.

But writing a function that dumps 16 byte integers by shifting it should not be so hard to do if it's not time critical.



回答7:

This may be a little late, but I don't see why you can't use struct:

bigint = 0xFEDCBA9876543210FEDCBA9876543210L
print bigint,hex(bigint).upper()

cbi = struct.pack("!QQ",bigint&0xFFFFFFFFFFFFFFFF,(bigint>>64)&0xFFFFFFFFFFFFFFFF)

print len(cbi)

The bigint by itself is rejected, but if you mask it with &0xFFFFFFFFFFFFFFFF you can reduce it to an 8 byte int instead of 16. Then the upper part is shifted and masked as well. You may have to play with byte ordering a bit. I used the ! mark to tell it to produce a network endian byte order. Also, the msb and lsb (upper and lower bytes) may need to be reversed. I will leave that as an exercise for the user to determine. I would say saving things as network endian would be safer so you always know what the endianess of your data is.

No, don't ask me if network endian is big or little endian...



回答8:

With Python 3.2 and later, you can use int.to_bytes and int.from_bytes: https://docs.python.org/3/library/stdtypes.html#int.to_bytes