I have data in binary format (hex: 80 3b c8 87 0a 89) and I need to convert it into a String in order to save the binary data in an MS Access db via Jackcess. I know that I'm not supposed to use String in Java for binary data; however, the Access db is a third-party product and I have no control over it whatsoever.
So I tried to convert binary data and save it, but unfortunately the result was unexpected.
byte[] byteArray = new byte[] {(byte) 0x80, 0x3b, (byte) 0xc8, (byte) 0x87, 0x0a, (byte) 0x89};
System.out.println(String.format("%02X ", byteArray[0]) + String.format("%02X ", byteArray[1])); // gives me the same values
String value = new String(byteArray, "UTF-8"); // or any other encoding
System.out.println(value); // completely different values
I would like to know what is going on under new String, and whether there is a way to convert binary data into a String and keep the same hex values.
Note 1: initially I read a binary file which has nothing to do with hex. I use hex just for comparison of datasets.
Note 2: There was a suggestion to use Base64 aka MIME, UTF-7, etc. By my understanding, it takes binary data and encodes it as ASCII characters, basically changing the initial data. However, for me that is not a solution, because I must write the exact data that I hold in the binary array.
byte[] byteArray = new byte[]{0x2f, 0x7a, 0x2d, 0x28};
byte[] bytesEncoded = Base64.encodeBase64(byteArray);
System.out.println("encoded value is " + new String(bytesEncoded)); // new data
The basic lesson to be taken: never mix up binary data with its String equivalent.
My mistake was that I exported the initial data from Access into csv while changing the type of the index field from binary to String (total mess, now I know). The solution I came up with is my own export tool for Access, where all data is kept as binary. Thanks to @gord-thompson, whose comment led to the solution.
In order to safely convert arbitrary binary data into text, you should use something like hex or base64. Encodings such as UTF-8 are meant to encode arbitrary text data as bytes, not to encode arbitrary binary data as text. It's a difference in terms of what the source data is.
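To make that difference concrete, here is a minimal sketch that pushes the bytes from the question through a UTF-8 decode and re-encode. The class and method names (`Utf8RoundTrip`, `throughUtf8`) are made up for illustration. The bytes 0x80 and 0x89 are not valid UTF-8 on their own, so the decoder substitutes the replacement character U+FFFD for them and the original values cannot be recovered from the String.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Utf8RoundTrip {

    // Hypothetical helper: decode bytes as UTF-8 text, then encode the text again.
    static byte[] throughUtf8(byte[] data) {
        String text = new String(data, StandardCharsets.UTF_8);
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The bytes from the question; values above 0x7f need a cast in Java.
        byte[] original = {(byte) 0x80, 0x3b, (byte) 0xc8, (byte) 0x87, 0x0a, (byte) 0x89};

        byte[] roundTripped = throughUtf8(original);

        // Malformed sequences were replaced during decoding, so the arrays differ.
        System.out.println(Arrays.equals(original, roundTripped)); // prints "false"
    }
}
```

Only byte sequences that happen to be valid UTF-8 survive this round trip, which is why the result looks fine for some inputs and wrong for others.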
I would strongly recommend using a library for this. For example, with Guava:
(Other libraries are available, of course, such as Apache Commons Codec.)
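If adding a dependency is not an option, a hex round trip can also be sketched with the plain JDK. This is an illustration only, not the API of any of the libraries mentioned: the class `HexCodec` and its methods `toHex` and `fromHex` are hypothetical names. On Java 17 or later, `java.util.HexFormat` offers the same functionality out of the box.

```java
public class HexCodec {

    private static final char[] DIGITS = "0123456789abcdef".toCharArray();

    // Hypothetical helper: two lowercase hex digits per byte.
    static String toHex(byte[] data) {
        StringBuilder sb = new StringBuilder(data.length * 2);
        for (byte b : data) {
            sb.append(DIGITS[(b >> 4) & 0x0f]); // high nibble
            sb.append(DIGITS[b & 0x0f]);        // low nibble
        }
        return sb.toString();
    }

    // Hypothetical helper: inverse of toHex; rejects malformed input.
    static byte[] fromHex(String hex) {
        if (hex.length() % 2 != 0) {
            throw new IllegalArgumentException("hex string must have an even length");
        }
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            int hi = Character.digit(hex.charAt(2 * i), 16);
            int lo = Character.digit(hex.charAt(2 * i + 1), 16);
            if (hi < 0 || lo < 0) {
                throw new IllegalArgumentException("not a hex digit near index " + (2 * i));
            }
            out[i] = (byte) ((hi << 4) | lo);
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] original = {(byte) 0x80, 0x3b, (byte) 0xc8, (byte) 0x87, 0x0a, (byte) 0x89};
        String text = toHex(original);
        System.out.println(text); // prints "803bc8870a89"
        System.out.println(java.util.Arrays.equals(original, fromHex(text))); // prints "true"
    }
}
```

The resulting text is twice as long as the binary data, but every byte value is recoverable, which a charset decode such as UTF-8 does not guarantee.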
Alternatively, save your binary data into a field in Access which is designed for binary data, instead of converting it to text at all.