Python - Turn a file's contents into a binary array

Posted 2020-08-13 05:18

Question:

File content:

40 13 123
89 123 2223
4  12  0

I need to store the whole .txt file as a binary array so that I can send it later to the server side which expects a binary input.


I've looked at Python's bytearray documentation. I quote:

Return a new array of bytes. The bytearray type is a mutable sequence of integers in the range 0 <= x < 256. It has most of the usual methods of mutable sequences, described in Mutable Sequence Types, as well as most methods that the bytes type has, see Bytes and Byte Array Methods.


Some of my numbers are greater than 255, so I need a bytearray-like data structure that can hold larger values.

Answer 1:

You could use the array/memoryview approach:

import array
a = array.array('h', [10, 20, 300])  # assume the inputs are signed 16-bit (short) integers
memv = memoryview(a)
m = memv.cast('B')  # view the same buffer as unsigned bytes
m.tolist()

This gives [10, 0, 20, 0, 44, 1] on a little-endian machine.

Depending on the usage, one might also do:

L = array.array('h', [10, 20, 300]).tobytes()  # tostring() was removed in Python 3.9
list(L)  # iterating over bytes yields integers in Python 3

This also gives [10, 0, 20, 0, 44, 1].
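The round trip can be sketched as follows, as a rough illustration rather than the answerer's own code. It assumes Python 3 and that sender and receiver share the same native byte order; the variable names are made up for the example.

```python
import array
import sys

values = [10, 20, 300]
a = array.array('h', values)   # 'h' = signed 16-bit integers
raw = a.tobytes()              # raw bytes in the machine's native byte order

# The byte layout depends on sys.byteorder; on a little-endian
# machine list(raw) is [10, 0, 20, 0, 44, 1].
print(sys.byteorder, list(raw))

# Rebuild the integers from the raw bytes.
restored = array.array('h')
restored.frombytes(raw)
print(restored.tolist())
```

If the two sides may differ in byte order, a format with an explicit byte order (for example via the struct module) is the safer choice.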



Answer 2:

You can read in the text file and convert each 'word' to an int:

with open(the_file, 'r') as f:
    lines = f.readlines()  # the method is readlines(), not read_lines()
    numbers = [int(w) for line in lines for w in line.split()]

Then you have to pack numbers into a binary array with struct:

import struct
# 'i' = signed int (usually 4 bytes), native byte order
binary_representation = struct.pack("{}i".format(len(numbers)), *numbers)

If you want these data to be written in binary format, you have to specify so when opening the target file:

with open(target_file, 'wb') as f:
    f.write(binary_representation)
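An end-to-end sketch of the same idea, with the receiving side included, might look like this. It is only an illustration: the string `text` stands in for the file content, and the little-endian 4-byte format `<i` is an assumption about what the server expects.

```python
import struct

# Stand-in for the content of the .txt file from the question.
text = "40 13 123\n89 123 2223\n4  12  0\n"
numbers = [int(w) for w in text.split()]

# '<' fixes the byte order (little-endian) and uses standard sizes,
# so 'i' is always a 4-byte signed integer regardless of platform.
payload = struct.pack("<{}i".format(len(numbers)), *numbers)

# Receiving side: derive the count from the payload length, then unpack.
count = len(payload) // struct.calcsize("<i")
decoded = list(struct.unpack("<{}i".format(count), payload))
print(decoded)
```

Fixing the byte order in the format string keeps the payload identical across machines, which matters when client and server run on different hardware.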


Answer 3:

Not bytearray

From the bytearray documentation, it is just a sequence of integers in the range 0 <= x < 256.

As an example, you can initialize it like this:

bytearray([40,13,123,89,123,4,12,0])
# bytearray(b'(\r{Y{\x04\x0c\x00')

Since integers are already stored in binary, you don't need to convert anything.

Your problem now becomes: what do you want to do with 2223?

>>> bytearray([2223])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: byte must be in range(0, 256)
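One possible answer, sketched here under the assumption that two bytes per number are enough and that little-endian order is acceptable, is to split each integer into several bytes with `int.to_bytes` before appending it to the bytearray:

```python
value = 2223
two_bytes = value.to_bytes(2, byteorder='little')
print(list(two_bytes))  # [175, 8], because 2223 == 8 * 256 + 175

# Store several numbers in one bytearray, two bytes each.
buf = bytearray()
for n in [40, 13, 2223]:
    buf += n.to_bytes(2, byteorder='little')

# Read them back two bytes at a time.
back = [int.from_bytes(buf[i:i + 2], byteorder='little')
        for i in range(0, len(buf), 2)]
print(back)
```

The width (2 bytes here) caps the largest storable value at 65535; a wider field is needed for bigger numbers.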

uint32 or int32 array?

To read one file, you could use:

with open('test.txt') as f:
    # str.split() with no argument handles repeated spaces and trailing newlines
    numbers = [int(w) for line in f for w in line.split()]
    print(numbers)
    # [40, 13, 123, 89, 123, 2223, 4, 12, 0]

Once you have a list of integers, you can choose the corresponding low-level NumPy data type, possibly uint32 or int32.
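As a rough sketch of that last step, assuming NumPy is installed and that the receiver expects little-endian unsigned 32-bit values (the dtype string `'<u4'` encodes both assumptions):

```python
import numpy as np

numbers = [40, 13, 123, 89, 123, 2223, 4, 12, 0]

# '<u4' = little-endian unsigned 32-bit integers
arr = np.array(numbers, dtype='<u4')
payload = arr.tobytes()  # 4 bytes per number

# Receiving side: reinterpret the bytes with the same dtype.
restored = np.frombuffer(payload, dtype='<u4').tolist()
print(restored)
```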



Answer 4:

I needed this for a Thrift client-server module in which one of the functions required a binary input. The different Thrift types can be found here.

Client

import array
myList = [5, 999, 430, 0]
binL = array.array('l', myList).tobytes()  # tostring() was removed in Python 3.9
# call function with binL as parameter

On the server, I reconstructed the list:

k = list(array.array('l', binL))  # 'l' is 4 or 8 bytes depending on the platform
print(k)
# [5, 999, 430, 0]
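Because the size of `'l'` and the byte order both depend on the platform, client and server can disagree on the layout. A more portable variant could look like the sketch below; the helper names `encode_ints` and `decode_ints` are hypothetical, and the choice of 8-byte little-endian integers is an assumption, not something Thrift prescribes.

```python
import array
import sys

def encode_ints(values):
    """Pack integers as 8-byte signed values in little-endian order."""
    arr = array.array('q', values)  # 'q' = signed long long (8 bytes)
    if sys.byteorder != 'little':
        arr.byteswap()              # normalize to little-endian
    return arr.tobytes()

def decode_ints(data):
    """Inverse of encode_ints."""
    arr = array.array('q')
    arr.frombytes(data)
    if sys.byteorder != 'little':
        arr.byteswap()
    return arr.tolist()

binL = encode_ints([5, 999, 430, 0])
print(decode_ints(binL))
```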


Answer 5:

Try this:

input.txt:

40 13 123
89 123 2223
4  12  0

Code to parse input to output:

with open('input.txt', 'r') as _in:
    # Read the whole file, split it on whitespace, convert each token to an int,
    # then to a binary string such as '0b101000'.
    nums = map(bin, map(int, _in.read().split()))

with open('output.txt', 'w') as out:
    # Strip the '0b' prefix, append a newline to each string, and write them out.
    out.writelines(b.replace('0b', '') + '\n' for b in nums)

output.txt:

101000
1101
1111011
1011001
1111011
100010101111
100
1100
0
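Note that this output is text made of the characters '0' and '1', not raw binary data. If the receiver gets these lines, it could recover the integers roughly as follows (a sketch; the list `binary_lines` stands in for the lines of output.txt):

```python
# Stand-in for the lines read from output.txt.
binary_lines = ["101000", "1101", "1111011", "1011001", "1111011",
                "100010101111", "100", "1100", "0"]

# int(s, 2) parses a string of binary digits.
numbers = [int(s, 2) for s in binary_lines]
print(numbers)

# format(n, 'b') goes the other way, without the '0b' prefix.
again = [format(n, 'b') for n in numbers]
```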

Hope it helps.