I have a bunch of binary data that comes into Python via a char* from a C interface (not under my control), so I have a str of arbitrary binary data (what would normally be a byte array). I would like to convert it to a byte array to simplify using it with other Python functions, but I can't figure out how.
Examples that don't work:
data = rawdatastr.encode()
this assumes "utf-8" and mangles the data == BAD
data = rawdatastr.encode('ascii','ignore')
strips chars over 127 == BAD
data = rawdatastr.encode('latin1')
not sure -- this is the closest so far but I have no proof that it is working for all bytes.
data = array.array('B', [x for x in map(ord, rawdatastr)]).tobytes()
This works but seems like a lot of work to do something simple. Is there something simpler?
I am thinking I need to write my own identity encoding that just passes the bytes along (I think latin1 does this based upon some reading but no proof thus far).
Though I suspect something else is decoding your data for you (a char* in C is usually best represented as bytes, especially if it is binary data), the latin1 codec can round-trip every byte. You can verify this with the following short program:
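A minimal sketch of such a check (not necessarily the original program): run every possible byte value through a latin1 decode/encode round trip and confirm nothing changes.

```python
# latin1 maps byte value i to code point U+00i and back, so a round trip
# over all 256 byte values should reproduce the input exactly.
all_bytes = bytes(range(256))
assert all_bytes.decode('latin1').encode('latin1') == all_bytes
print("latin1 round-trips all 256 byte values")
```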
You can simply encode('iso-8859-15'):
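A sketch of that approach; rawdatastr below is a stand-in for the string from the question:

```python
# iso-8859-15, like latin1, assigns a character to every byte value, so a
# decode/encode round trip through it is lossless.
all_bytes = bytes(range(256))
assert all_bytes.decode('iso-8859-15').encode('iso-8859-15') == all_bytes

# Stand-in for the question's string. This only works if the str was produced
# with the same codec: a few latin1 code points (e.g. U+00A4) have no
# iso-8859-15 encoding and would raise UnicodeEncodeError.
rawdatastr = all_bytes.decode('iso-8859-15')
data = rawdatastr.encode('iso-8859-15')
assert data == all_bytes
```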
Use base64:
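A sketch of that idea, assuming the raw bytes are first recovered with latin1 as shown above (rawdatastr is a stand-in for the question's string):

```python
import base64

rawdatastr = ''.join(map(chr, range(256)))   # stand-in for the question's str
raw_bytes = rawdatastr.encode('latin1')      # recover the raw bytes
encoded = base64.b64encode(raw_bytes)        # bytes containing only printable ASCII
assert base64.b64decode(encoded) == raw_bytes
```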
The encoded variable is still of type bytes, but now it contains only printable ASCII characters, so you can safely convert it to and from str using 'utf-8'.
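Continuing the sketch above, the round trip through a str is then harmless:

```python
text = encoded.decode('utf-8')               # safe: Base64 output is pure ASCII
assert base64.b64decode(text.encode('utf-8')) == raw_bytes
```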
Just now I ran into the same problem. This is what I came up with:
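A hand-rolled version along these lines (a sketch with hypothetical helper names, not necessarily the original snippet) maps each character's code point to a byte and back:

```python
# Hypothetical helpers: convert str <-> bytes by treating each character's
# code point as a raw byte value. Only valid for code points 0-255.
def str_to_bytes(s):
    return bytes(ord(ch) for ch in s)

def bytes_to_str(b):
    return ''.join(chr(x) for x in b)
```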
Some examples:
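Illustrative usage of the helpers above (assumed examples):

```python
b = str_to_bytes('\x00\x7f\xff')                       # b'\x00\x7f\xff'
print(bytes_to_str(b) == '\x00\x7f\xff')               # True
print(str_to_bytes('ABC') == 'ABC'.encode('latin1'))   # True
```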