First of all, sorry for my bad English.
I want to read the piece hashes from a torrent file. Currently I'm using the bencode library https://github.com/hyPiRion/java-bencode to decode the file, but my problem starts when I try to convert the pieces string to a byte array. The torrent file is encoded in UTF-8, but if I do
byte[] bytepieces = piecestring.getBytes("UTF-8");
it gives me, well, nothing really usable.
On the other hand, to have something to compare against, I read the first piece of my file and calculated its SHA-1. If I convert the resulting 20-byte array to a string, that string does match the first part of the big pieces string. But if I try to convert that generated string back into the 20 bytes that created it, I can't. How can I do this?
A small example:
FileInputStream fin = new FileInputStream("miFile");
byte[] array = new byte[512 * 1024]; // one piece of 512 KiB
fin.read(array, 0, 512 * 1024);
MessageDigest md = MessageDigest.getInstance("SHA-1");
byte[] sha1byte = md.digest(array);
String s = new String(sha1byte, "UTF-8");
After doing this, sha1byte.length is 20, which is OK: the correct size of a SHA-1 hash. But if I call
s.getBytes("UTF-8").length, in my example I get... 33! What?
I want to get my 20 bytes back from the generated string. How can I do this?
Well, thanks :P
Thanks for your answers, guys. I found a solution using https://github.com/bedeho/bencodej
That library always loads the bencoded data as byte arrays, wrapped in its own classes, so it keeps a 1:1 mapping with the byte strings :p Thanks for everything.
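For illustration, once the pieces value is available as a raw byte[], picking out one 20-byte SHA-1 hash and checking a piece against it could look roughly like the sketch below. The class and method names (PieceHashes, hashAt, matches) are made up for this example and are not part of bencodej; how you obtain the raw byte[] depends on the library you use.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

// Hypothetical helper, not part of any bencode library.
class PieceHashes {
    static final int SHA1_LENGTH = 20;

    /** Computes the SHA-1 digest of the given data. */
    static byte[] sha1(byte[] data) {
        try {
            return MessageDigest.getInstance("SHA-1").digest(data);
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 is required on every Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    /** Returns a copy of the 20-byte hash stored for the piece at the given index. */
    static byte[] hashAt(byte[] pieces, int index) {
        if (pieces.length % SHA1_LENGTH != 0) {
            throw new IllegalArgumentException("pieces length is not a multiple of 20");
        }
        if (index < 0 || index >= pieces.length / SHA1_LENGTH) {
            throw new IndexOutOfBoundsException("no such piece: " + index);
        }
        int from = index * SHA1_LENGTH;
        return Arrays.copyOfRange(pieces, from, from + SHA1_LENGTH);
    }

    /** Checks whether pieceData hashes to the value stored for that piece index. */
    static boolean matches(byte[] pieces, int index, byte[] pieceData) {
        return MessageDigest.isEqual(hashAt(pieces, index), sha1(pieceData));
    }

    public static void main(String[] args) {
        // Stand-in data: a "torrent" with a single tiny piece.
        byte[] piece = {1, 2, 3};
        byte[] pieces = sha1(piece);
        System.out.println("piece 0 matches: " + matches(pieces, 0, piece));
    }
}
```

Everything stays a byte[] from start to finish, so no charset is ever involved in the comparison.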
Bencode "strings" are sequences of bytes, not sequences of Unicode code points. Therefore a language's representation of bytes (byte[] or ByteBuffer in Java) is appropriate, and the data should only be interpreted as a UTF-8 string in the specific cases where it actually contains something that is supposed to be human-readable. So you should use a bencoding library that supports extraction of the raw bytes.
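A minimal sketch of why the String round trip from the question cannot work: when arbitrary bytes are decoded as UTF-8, every malformed sequence is replaced by U+FFFD, and that character encodes back to three bytes, so the original bytes are lost and the length usually grows. The class and method names here (RoundTripDemo, utf8RoundTrip) are only illustrative.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

// Illustrative demo class; the names are made up for this example.
class RoundTripDemo {
    /** Decodes the bytes as UTF-8 and encodes the resulting String back to bytes. */
    static byte[] utf8RoundTrip(byte[] raw) {
        String s = new String(raw, StandardCharsets.UTF_8);
        return s.getBytes(StandardCharsets.UTF_8);
    }

    /** Computes the SHA-1 digest of the given data. */
    static byte[] sha1(byte[] data) {
        try {
            return MessageDigest.getInstance("SHA-1").digest(data);
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 is required on every Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // 0xFF can never appear in valid UTF-8, so it becomes U+FFFD (3 bytes).
        byte[] invalid = {(byte) 0xFF};
        System.out.println(utf8RoundTrip(invalid).length); // prints 3

        // A digest is arbitrary binary data, so it generally suffers the same fate.
        byte[] digest = sha1("some piece data".getBytes(StandardCharsets.UTF_8));
        byte[] back = utf8RoundTrip(digest);
        System.out.println("digest length:     " + digest.length);
        System.out.println("round-trip length: " + back.length);
        System.out.println("identical:         " + Arrays.equals(digest, back));
    }
}
```

Only bytes that happen to form valid UTF-8 (plain ASCII, for example) survive the round trip, which is why the hashes have to stay in a byte[] or ByteBuffer.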