My idea is to write a small program that reads a file (one that can't be opened "naturally", but that contains some images), turns its data into hex, looks for the PNG chunks (markers found at the beginning and end of a .png file), and saves the resulting data to separate files (after converting it back from hex). I am doing this in Java, using code like this:
// out is where to show the result and file is the source
public static void hexDump(PrintStream out, File file) throws IOException {
    InputStream is = new FileInputStream(file);
    StringBuffer Buffer = new StringBuffer();
    while (is.available() > 0) {
        StringBuilder sb1 = new StringBuilder();
        for (int j = 0; j < 16; j++) {
            if (is.available() > 0) {
                int value = (int) is.read();
                // transform the current data into hex
                sb1.append(String.format("%02X ", value));
            }
        }
        Buffer.append(sb1);
        // Should I look for the PNG here? I'm not sure
    }
    is.close();
    // Print the result in out (that may be the console or a file)
    out.print(Buffer);
}
I'm sure there are other ways to do this that use fewer machine resources when opening huge files. If you have any ideas, please tell me. Thanks!
This is my first post, so if there are any mistakes, please help me correct them.
As Erwin Bolwidt says in the comments, the first thing is: don't convert to hex. If for some reason you must convert to hex, stop appending the content to two buffers, and always use StringBuilder, not StringBuffer. StringBuffer synchronizes every call, and StringBuilder can be as much as 3x faster.
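If the hex dump really is needed, the advice above could be sketched roughly as follows. This is only an illustration: the class name `HexDump` and the choice to take an `InputStream` rather than a `File` are assumptions made so the example is self-contained. It keeps a single reusable `StringBuilder` for the current row and prints each row as soon as it is complete, so the whole dump is never held in memory.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PrintStream;

class HexDump {
    // Writes the bytes of 'source' to 'out' as two-digit hex values,
    // flushing one 16-byte row at a time instead of buffering everything.
    static void hexDump(PrintStream out, InputStream source) throws IOException {
        try (InputStream in = new BufferedInputStream(source)) {
            StringBuilder row = new StringBuilder(16 * 3);
            int count = 0;
            int value;
            while ((value = in.read()) != -1) {
                row.append(String.format("%02X ", value));
                if (++count % 16 == 0) {
                    out.print(row);   // emit the finished row
                    row.setLength(0); // reuse the same builder
                }
            }
            out.print(row); // emit the last, possibly shorter, row
        }
    }
}
```

It could be called as `HexDump.hexDump(System.out, new FileInputStream(file))`.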
Also, buffer your file reads with a BufferedInputStream. Reading one byte at a time with FileInputStream.read() is very slow.

Reading the file a byte at a time would be taking substantial time here. You can improve that by orders of magnitude. You should be using a DataInputStream around a BufferedInputStream around the FileInputStream, and reading 16 bytes at a time with readFully. Then process them, without conversion to and from hex, which is quite unnecessary here, and write them to the output(s) as you go, via a BufferedOutputStream around the FileOutputStream, rather than concatenating the entire file into memory and having to write it all out in one go. Of course that takes time, but that's because it does, not because you have to do it that way.

A very simple way to do this, which is probably quite fast, is to read the entire file into memory (as binary data, not as a hex dump) and then search for the markers.
This has two limitations:
The basic code to do that is like this:
I'm not sure exactly how these PNG chunk markers work; I'm assuming above that each marker starts a section of the data that you're interested in, and that the next marker starts the next section.
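A minimal sketch of this in-memory approach, not the answer's own code, might look like the following. It assumes the marker is the standard 8-byte PNG file signature (`89 50 4E 47 0D 0A 1A 0A`) and that each section runs up to the next signature, so a section may carry trailing bytes after the image's final IEND chunk. The method names `split` and `extract`, the output file names, and the simplified `hexToBytes` and `indexOf` helpers are all illustrative assumptions, included so the class compiles on its own.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class Png {
    // The PNG file signature written as a hex string.
    static final String SIGNATURE_HEX = "89504E470D0A1A0A";

    // Converts a hex string such as "89504E47" to the bytes it represents.
    static byte[] hexToBytes(String hex) {
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    // Naive search for 'pattern' in 'data', starting at offset 'start'.
    // Returns the index of the first match, or -1 if there is none.
    static int indexOf(byte[] data, byte[] pattern, int start) {
        outer:
        for (int i = Math.max(start, 0); i <= data.length - pattern.length; i++) {
            for (int j = 0; j < pattern.length; j++) {
                if (data[i + j] != pattern[j]) continue outer;
            }
            return i;
        }
        return -1;
    }

    // Cuts 'data' into sections, each beginning at a signature and
    // ending just before the next signature (or at the end of the data).
    static List<byte[]> split(byte[] data) {
        byte[] marker = hexToBytes(SIGNATURE_HEX);
        List<byte[]> parts = new ArrayList<>();
        int start = indexOf(data, marker, 0);
        while (start >= 0) {
            int next = indexOf(data, marker, start + marker.length);
            int end = (next >= 0) ? next : data.length;
            parts.add(Arrays.copyOfRange(data, start, end));
            start = next;
        }
        return parts;
    }

    // Reads the whole source file and writes each section to its own file.
    static void extract(Path source, Path outDir) throws IOException {
        List<byte[]> parts = split(Files.readAllBytes(source));
        Files.createDirectories(outDir);
        for (int i = 0; i < parts.size(); i++) {
            Files.write(outDir.resolve("image" + i + ".png"), parts.get(i));
        }
    }
}
```

Because `Files.readAllBytes` returns a single array, this only suits files that fit comfortably in memory.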
There are two things missing in standard Java: code to convert a hex string to a byte array, and code to search for a byte array inside another byte array. Both can be found in various Apache Commons libraries, but I'll include the answers that people posted to earlier questions on StackOverflow. You can copy these verbatim into the Png class to make the above code work.
Convert a string representation of a hex dump to a byte array using Java?
Searching for a sequence of Bytes in a Binary File with Java
I modified this last piece of code to make it possible to start the search at an offset other than zero.
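For files too large to hold in memory, the buffered streaming idea from earlier in the thread could be adapted to locate the markers without loading everything. The sketch below is only an illustration under stated assumptions: the class name `PngScan` and the method `findSignatures` are hypothetical, the marker is taken to be the 8-byte PNG file signature, and it reads single bytes through a `BufferedInputStream` rather than 16-byte blocks via `readFully`, which avoids handling a signature that straddles two blocks. The restart rule relies on `0x89` appearing only as the first byte of the signature.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

class PngScan {
    // The 8-byte PNG file signature, as unsigned byte values.
    static final int[] SIGNATURE = {0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A};

    // Returns the byte offset of every PNG signature found in the stream.
    static List<Long> findSignatures(InputStream raw) throws IOException {
        List<Long> offsets = new ArrayList<>();
        try (InputStream in = new BufferedInputStream(raw)) {
            int matched = 0; // how many signature bytes match so far
            long pos = 0;    // number of bytes consumed
            int b;
            while ((b = in.read()) != -1) {
                pos++;
                if (b == SIGNATURE[matched]) {
                    matched++;
                    if (matched == SIGNATURE.length) {
                        offsets.add(pos - SIGNATURE.length);
                        matched = 0;
                    }
                } else {
                    // 0x89 occurs only at index 0, so a failed match can
                    // restart only if the current byte is itself 0x89.
                    matched = (b == SIGNATURE[0]) ? 1 : 0;
                }
            }
        }
        return offsets;
    }
}
```

The returned offsets could then drive a second pass that copies each section to its own file through a BufferedOutputStream.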