I have several large files (3-6 GB) of ASCII 1 and 0 characters, and I would like to convert them to simple binary files. Newlines are not important and should be discarded.
test.bin below is 568 bytes (560 digits plus 8 newlines); I would like a 560-bit (70-byte) file.
0111000110000000101000100000100100011111010010101000001001010000111000
1001100011010100001101110000100010000010000000000001011000010011111100
0100001000010000010000010111011101011111000111111000111001100010100011
0011101000100001111111000001111110111111101101100000011000010101100001
0000000110110001000000000001000011110100000101101000001000010001010011
1101101111010101011110001110000010011001100101101101000111111101110101
1000001100101101010111110111110101100000000011001000100000000011001110
0101101001110010011110000100101001001111010011100100001001111111100110
...
I've found several solutions for converting a binary file into ASCII, but none going the other way.
Ideally I'm looking for a simple linux / bash solution but I could live with a python solution.
=================== Edit ==================
To make this less confusing, consider converting any two ASCII characters into a binary file.
test_XY_encoded.txt
XYYYXXXYYXXXXXXXYXYXXXYXXXXXYXXYXXXYYYYYXYXXYXYXYXXXXXYXXYXYXXXXYYYXXX
YXXYYXXXYYXYXYXXXXYYXYYYXXXXYXXXYXXXXXYXXXXXXXXXXXXYXYYXXXXYXXYYYYYYXX
XYXXXXYXXXXYXXXXXYXXXXXYXYYYXYYYXYXYYYYYXXXYYYYYYXXXYYYXXYYXXXYXYXXXYY
XXYYYXYXXXYXXXXYYYYYYYXXXXXYYYYYYXYYYYYYYXYYXYYXXXXXXYYXXXXYXYXYYXXXXY
XXXXXXXYYXYYXXXYXXXXXXXXXXXYXXXXYYYYXYXXXXXYXYYXYXXXXXYXXXXYXXXYXYXXYY
YYXYYXYYYYXYXYXYXYYYYXXXYYYXXXXXYXXYYXXYYXXYXYYXYYXYXXXYYYYYYYXYYYXYXY
YXXXXXYYXXYXYYXYXYXYYYYYXYYYYYXYXYYXXXXXXXXXYYXXYXXXYXXXXXXXXXYYXXYYYX
XYXYYXYXXYYYXXYXXYYYYXXXXYXXYXYXXYXXYYYYXYXXYYYXXYXXXXYXXYYYYYYYYXXYYX
Where X represents the binary 0 and Y represents the binary 1.
I don't know if this would solve the question, but how about this:
Or, to overwrite the file:
Short, but should do the trick.
How about this bash command?
'tr' will delete all newline characters, and the perl command converts to binary.
We could build an "only shell" solution.
First, we transform the 1's and 0's into a stream of 8-character lines:
That's 560/8 lines, or 70 lines, which should translate to 70 characters.
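The regrouping step described above could be sketched as follows. This is an assumption about how the step might be written, using tr to drop the newlines and fold to cut the stream every 8 characters; the function name to_octets is hypothetical.

```shell
#!/usr/bin/env bash
# to_octets (hypothetical name): drop newlines, then regroup the digits
# into lines of 8 characters each.
to_octets() {
    tr -d '\n' | fold -w 8
}

# Demo on the first 16 digits of the sample, split across two input lines:
printf '0111000110\n000000\n' | to_octets
# 01110001
# 10000000
```

Note that the last output line may lack a trailing newline, so a later read loop has to allow for that.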
It should be said that not all of these characters are ASCII: values above decimal 127 (hex 7f) are not. I am interpreting them as byte values (unsigned decimal values).
Then we can read each line and translate it first to decimal with "$((2#$a))", so that the shell understands it, then to hex with printf '\\x%x', so that the final printf '%b' "…" can translate that hex escape into a byte.
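Put together, the three translations named above might look like the loop below. It is a sketch under assumptions: the function name octets_to_bytes is hypothetical, the input is expected to be one 8-digit group per line, and bash is required because both the 2# base prefix and the \x escape in printf '%b' are not available in every POSIX shell.

```shell
#!/usr/bin/env bash
# octets_to_bytes (hypothetical name): one 8-digit line in, one byte out.
octets_to_bytes() {
    local a
    # The "|| [ -n ... ]" part keeps a final line with no trailing newline.
    while IFS= read -r a || [ -n "$a" ]; do
        # 2#$a    : read the digits as a base-2 number
        # '\\x%x' : format that number as a \xHH escape
        # '%b'    : expand the escape into the actual byte
        printf '%b' "$(printf '\\x%x' "$((2#$a))")"
    done
}

printf '01000001\n01000010\n' | octets_to_bytes   # writes the two bytes "AB"
```

A loop like this starts a subshell for every byte, so it is likely to be very slow on multi-gigabyte inputs.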
Of course, the characters printed are (most probably) an incorrect interpretation of the byte values in whatever locale the user is using. Maybe a hex output will be more interesting (but that depends on your needs or interest):
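One way a hex listing could be produced is sketched here. The function name octets_to_hex and the two-digit, space-separated layout are assumptions chosen for readability; the base-2 conversion is the same as before.

```shell
#!/usr/bin/env bash
# octets_to_hex (hypothetical name): print each 8-digit line as two hex
# digits followed by a space, then end the listing with a newline.
octets_to_hex() {
    local a
    while IFS= read -r a || [ -n "$a" ]; do
        printf '%02x ' "$((2#$a))"
    done
    echo
}

# First two bytes of the sample data:
printf '01110001\n10000000\n' | octets_to_hex   # prints "71 80 "
```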
Note that the same structure could be used for the file test_XY_encoded.txt:
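For the X/Y variant, a sketch under the same assumptions only needs one extra tr stage that maps X to 0 and Y to 1 before the digits are regrouped. The function name xy_to_bytes is hypothetical, and bash is again assumed.

```shell
#!/usr/bin/env bash
# xy_to_bytes (hypothetical name): X stands for bit 0, Y for bit 1.
xy_to_bytes() {
    local a
    tr -d '\n' | tr 'XY' '01' | fold -w 8 |
        while IFS= read -r a || [ -n "$a" ]; do
            # Same conversion as for 0/1 input: base 2 -> \xHH escape -> byte.
            printf '%b' "$(printf '\\x%x' "$((2#$a))")"
        done
}

printf 'XYXXXXXY\nXYXXXXYX\n' | xy_to_bytes   # writes the two bytes "AB"
```

With the file from the question, this would be called as xy_to_bytes < test_XY_encoded.txt > out.bin, where out.bin is a hypothetical output name.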