Convert Binary Strings (ASCII) to Binary File

Posted 2019-09-06 10:07

Question:

I have several large files (3-6 GB) of ASCII '1' and '0' characters, and I would like to convert them to plain binary files. Newlines are not important and should be discarded.

test.bin below is 568 bytes; I would like the 560-bit (70-byte) file.

0111000110000000101000100000100100011111010010101000001001010000111000
1001100011010100001101110000100010000010000000000001011000010011111100
0100001000010000010000010111011101011111000111111000111001100010100011
0011101000100001111111000001111110111111101101100000011000010101100001
0000000110110001000000000001000011110100000101101000001000010001010011
1101101111010101011110001110000010011001100101101101000111111101110101
1000001100101101010111110111110101100000000011001000100000000011001110
0101101001110010011110000100101001001111010011100100001001111111100110
...
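
The sizes quoted above can be sanity-checked with a few lines of Python. This is only a sketch: the helper name `expected_packed_size` is made up for illustration, and the sample is a stand-in with the same shape as the excerpt (8 lines of 70 characters plus a newline each), not the real data.

```python
def expected_packed_size(text: str) -> int:
    """Return the number of bytes needed to hold the '0'/'1' characters in text."""
    n_bits = sum(text.count(c) for c in "01")  # newlines and anything else are ignored
    return (n_bits + 7) // 8  # round up if the bit count is not a multiple of 8

# Stand-in for the excerpt: 8 lines of 70 characters, each ending in a newline.
sample = ("0" * 70 + "\n") * 8
print(len(sample), expected_packed_size(sample))  # 568 70
```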

I've found several solutions going the other way, converting a binary file into ASCII, but none for this direction.

Ideally I'm looking for a simple Linux / bash solution, but I could live with a Python solution.

=================== Edit ==================

To make this less confusing, consider converting any two ASCII characters into a binary file.

test_XY_encoded.txt

XYYYXXXYYXXXXXXXYXYXXXYXXXXXYXXYXXXYYYYYXYXXYXYXYXXXXXYXXYXYXXXXYYYXXX
YXXYYXXXYYXYXYXXXXYYXYYYXXXXYXXXYXXXXXYXXXXXXXXXXXXYXYYXXXXYXXYYYYYYXX
XYXXXXYXXXXYXXXXXYXXXXXYXYYYXYYYXYXYYYYYXXXYYYYYYXXXYYYXXYYXXXYXYXXXYY
XXYYYXYXXXYXXXXYYYYYYYXXXXXYYYYYYXYYYYYYYXYYXYYXXXXXXYYXXXXYXYXYYXXXXY
XXXXXXXYYXYYXXXYXXXXXXXXXXXYXXXXYYYYXYXXXXXYXYYXYXXXXXYXXXXYXXXYXYXXYY
YYXYYXYYYYXYXYXYXYYYYXXXYYYXXXXXYXXYYXXYYXXYXYYXYYXYXXXYYYYYYYXYYYXYXY
YXXXXXYYXXYXYYXYXYXYYYYYXYYYYYXYXYYXXXXXXXXXYYXXYXXXYXXXXXXXXXYYXXYYYX
XYXYYXYXXYYYXXYXXYYYYXXXXYXXYXYXXYXXYYYYXYXXYYYXXYXXXXYXXYYYYYYYYXXYYX

Where X represents the binary 0 and Y represents the binary 1.
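
As a rough illustration of the two-symbol case, here is a small in-memory Python sketch. The function name `pack_symbols` and its parameters are hypothetical, and padding a final incomplete byte with zeros on the right is an assumption, since the question does not say what should happen there.

```python
def pack_symbols(text: str, zero: str = "X", one: str = "Y") -> bytes:
    """Map two symbols to bits and pack them 8 per byte, most significant bit first."""
    # Keep only the two symbols; newlines and anything else are discarded.
    bits = "".join("1" if ch == one else "0" for ch in text if ch in (zero, one))
    # Assumption: a trailing group shorter than 8 bits is zero-padded on the right.
    bits += "0" * (-len(bits) % 8)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))

print(pack_symbols("XYYYXXXY\nYXXXXXXX").hex())  # 01110001 10000000 -> 7180
```

The same helper covers the original '0'/'1' files by calling it with zero="0" and one="1". It holds the whole input in memory, so it is only suitable for small inputs.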

Answer 1:

How about this bash command?

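cat test.bin | tr -d '\n' | perl -pe '$_=pack"B*",$_' > true_binary.txt

'tr' deletes all newline characters, and perl's pack "B*" turns each group of eight '0'/'1' characters into one byte. Note that with the newlines removed, perl reads the whole input as a single line, so it needs enough memory to hold the file.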
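
For multi-gigabyte inputs it may be preferable to convert in fixed-size chunks. The following Python sketch shows one way to do that. The function name `pack_bit_stream` and the `chunk_size` parameter are illustrative, and zero-padding a final partial byte on the right is an assumption.

```python
import io

# Every byte value except ASCII '0' and '1'; used to strip newlines etc.
_DELETE = bytes(set(range(256)) - set(b"01"))

def pack_bit_stream(src, dst, chunk_size: int = 1 << 20) -> int:
    """Read ASCII '0'/'1' from binary stream src and write packed bytes to dst.

    Returns the number of bytes written.
    """
    pending = b""  # fewer than 8 leftover bits carried over to the next chunk
    written = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        bits = pending + chunk.translate(None, _DELETE)
        usable = len(bits) - len(bits) % 8  # largest multiple of 8
        if usable:
            dst.write(int(bits[:usable], 2).to_bytes(usable // 8, "big"))
            written += usable // 8
        pending = bits[usable:]
    if pending:
        # Assumption: pad a final partial byte with zeros on the right.
        dst.write(bytes([int(pending.ljust(8, b"0"), 2)]))
        written += 1
    return written

# Small demonstration with in-memory streams and a tiny chunk size.
src = io.BytesIO(b"01110001\n10000000\n")
dst = io.BytesIO()
pack_bit_stream(src, dst, chunk_size=5)
print(dst.getvalue().hex())  # 7180
```

For real files, src and dst would be opened with open(path, "rb") and open(path, "wb"), so memory use stays bounded by the chunk size.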



Answer 2:

I don't know if this would solve the question, but how about this:

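with open('ascii.txt', 'r') as file_ascii, open('binary.txt', 'wb') as file_bin:
    bits = ''.join(file_ascii.read().split())  # drop newlines and other whitespace
    # pack each group of 8 characters into one byte (assumes len(bits) is a multiple of 8)
    file_bin.write(bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8)))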

Or, to overwrite the file:

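with open('ascii.txt', 'r') as f:
    bits = ''.join(f.read().split())  # drop newlines and other whitespace
    # pack each group of 8 characters into one byte (assumes len(bits) is a multiple of 8)
    binary = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))

with open('ascii.txt', 'wb') as f:
    f.write(binary)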

Short, but should do the trick.



Answer 3:

We could build a shell-only solution.
First, we transform the 1's and 0's into a stream of 8-character lines:

$ { cat test.bin | tr -cd '01' | fold -b8; echo; }
01110001
10000000
10100010
00001001
00011111
…
…
10011110
00010010
10010011
11010011
10010000
10011111
11100110

That's 560/8 = 70 lines, which should translate to 70 bytes.
Note that these bytes are not all ASCII characters: values above decimal 127 (hex 7f) are outside ASCII. I am interpreting them as unsigned byte values.
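
The point about byte values versus ASCII can be seen in a short Python sketch, using the first two 8-bit groups from the listing above:

```python
data = bytes([0b01110001, 0b10000000])  # first two bytes of the sample

print(list(data))  # [113, 128]: plain unsigned byte values
try:
    data.decode("ascii")  # 128 is outside the ASCII range 0..127
except UnicodeDecodeError as exc:
    print("not ASCII:", exc.reason)
```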

Then we can read each line and convert it first to decimal with "$((2#$a))" so the shell understands it, then to a hex escape with printf '\\x%x', so that the outer printf '%b' "…" can turn that escape into the actual byte:

$ { cat test.bin | tr -cd '01' | fold -b8; echo; } |
    while read a; do printf '%b' "$(printf '\\x%x' "$((2#$a))")"; done
q��     J�P�cP�XO�!u���(Έ�큅a���OoU�f[G�X2���Ȁ3����Ӑ��

Of course, the characters printed are (most probably) an incorrect interpretation of the byte values in whatever locale the user is using. A hex dump may be more useful, depending on your needs:

$ { cat test.bin | tr -cd '01' | fold -b8; echo; } |
    while read a; do printf '%b' "$(printf '\\x%x' "$((2#$a))")"; done |
        od -vAn -tx1c

  71  80  a2  09  1f  4a  82  50  e2  63  50  dc  22  08  00  58
   q 200 242  \t 037   J 202   P 342   c   P 334   "  \b  \0   X
  4f  c4  21  04  17  75  f1  f8  e6  28  ce  88  7f  07  ef  ed
   O 304   ! 004 027   u 361 370 346   ( 316 210 177  \a 357 355
  81  85  61  01  b1  00  10  f4  16  82  11  4f  6f  55  e3  82
 201 205   a 001 261  \0 020 364 026 202 021   O   o   U 343 202
  66  5b  47  f7  58  32  d5  f7  d6  00  c8  80  33  96  9c  9e
   f   [   G 367   X   2 325 367 326  \0 310 200   3 226 234 236
  12  93  d3  90  9f  e6
 022 223 323 220 237 346
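
To cross-check the dump above, the same per-group steps (binary text, then integer, then hex, then byte) can be mirrored in Python. This is only a sketch for the first line of the sample. It uses that line's 8 complete groups and ignores the 6 leftover bits, which in the real file continue on the next line. `bytes.hex` with a separator needs Python 3.8 or later.

```python
first_line = "0111000110000000101000100000100100011111010010101000001001010000111000"

# Use only the complete 8-bit groups of this line (70 bits -> 8 whole bytes).
groups = [first_line[i:i + 8] for i in range(0, len(first_line) - 7, 8)]
for g in groups[:3]:
    value = int(g, 2)  # same role as $((2#$a)) in the shell
    print(g, value, format(value, "02x"), bytes([value]))

packed = bytes(int(g, 2) for g in groups)
print(packed.hex(" "))  # 71 80 a2 09 1f 4a 82 50
```

The final line matches the first eight bytes of the od output.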

Note that the same structure could be used for the file test_XY_encoded.txt:

$ { cat test_XY_encoded.txt | tr 'XY' '01' | tr -cd '01' | fold -b8; echo; } |
    while read a; do printf '%b' "$(printf '\\x%x' "$((2#$a))")"; done |
        od -vAn -tx1c

  71  80  a2  09  1f  4a  82  50  e2  63  50  dc  22  08  00  58
   q 200 242  \t 037   J 202   P 342   c   P 334   "  \b  \0   X
  4f  c4  21  04  17  75  f1  f8  e6  28  ce  88  7f  07  ef  ed
   O 304   ! 004 027   u 361 370 346   ( 316 210 177  \a 357 355
  81  85  61  01  b1  00  10  f4  16  82  11  4f  6f  55  e3  82
 201 205   a 001 261  \0 020 364 026 202 021   O   o   U 343 202
  66  5b  47  f7  58  32  d5  f7  d6  00  c8  80  33  96  9c  9e
   f   [   G 367   X   2 325 367 326  \0 310 200   3 226 234 236
  12  93  d3  90  9f  e6
 022 223 323 220 237 346