Originally, I wanted to find out whether redacted data can be reconstructed from a JPEG image, given that JPEG is a lossy format and pixel values spread into the neighboring pixels.
To test whether saving and loading JPEG images is reliable, I wrote the following program that repeatedly saves and loads a JPEG image until it reaches an image that has been seen before. Here is the code:
package de.roland_illig.jpg

import java.awt.image.BufferedImage
import java.io.ByteArrayInputStream
import java.io.ByteArrayOutputStream
import java.io.File
import java.nio.file.Files
import java.nio.file.Paths
import java.security.MessageDigest
import javax.imageio.ImageIO

fun main(args: Array<String>) {

    // Decodes JPEG bytes into an in-memory image.
    fun loadJpeg(bytes: ByteArray) =
        ImageIO.read(ByteArrayInputStream(bytes))

    // Encodes an image as JPEG using ImageIO's default settings.
    fun saveJpeg(img: BufferedImage) =
        ByteArrayOutputStream().apply { use { ImageIO.write(img, "jpg", it) } }.toByteArray()

    // SHA-1 of the encoded bytes as an uppercase hex string.
    // (Plain Kotlin instead of javax.xml.bind.DatatypeConverter,
    // which is no longer part of the JDK since Java 11.)
    fun hash(bytes: ByteArray) =
        MessageDigest.getInstance("SHA-1").digest(bytes)
            .joinToString("") { "%02X".format(it) }

    var bytes = saveJpeg(ImageIO.read(File("000-original.png")))
    val log = mutableMapOf<String, Int>() // hash of the JPEG bytes -> step number

    for (n in 1..Int.MAX_VALUE) {
        Files.write(Paths.get("%03d.jpg".format(n)), bytes)

        // Stop as soon as the encoded bytes repeat an earlier step.
        val hash = hash(bytes)
        val prev = log.put(hash, n)
        if (prev != null) {
            println("After $n steps, the image is the same as after $prev steps.")
            break
        }

        // One more round trip: decode and encode again.
        bytes = saveJpeg(loadJpeg(bytes))
    }
}
The funny thing is that for a random screenshot, it takes between 20 and 49 steps until the image becomes stable. Ideally, I would have expected it to always take 2 steps.
Even though JPEG is a lossy format, each pixel has a definite value once the image has been saved and loaded again. Whatever compression is used, I would have expected that compressing the same data again produces the same compressed data:
val original = loadPng() // Exact in-memory image
val jpeg0Bytes = saveJpeg(original) // Saved with JPEG artifacts
val jpeg = loadJpeg(jpeg0Bytes) // Lossy, loaded again
val jpeg1Bytes = saveJpeg(jpeg) // Should be the same as jpeg0Bytes
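The expectation above can be checked in isolation. The following is only a minimal sketch, not the program I actually ran: it uses a small synthetic gradient image instead of `000-original.png`, and the helper names `gradientImage` and `secondSaveIsIdentical` are made up for this illustration. It reports whether the second save reproduces the first one byte for byte.

```kotlin
import java.awt.image.BufferedImage
import java.io.ByteArrayInputStream
import java.io.ByteArrayOutputStream
import javax.imageio.ImageIO

// Decodes JPEG bytes into an in-memory image.
fun loadJpeg(bytes: ByteArray): BufferedImage =
    ImageIO.read(ByteArrayInputStream(bytes))

// Encodes an image as JPEG with ImageIO's default settings.
fun saveJpeg(img: BufferedImage): ByteArray {
    val out = ByteArrayOutputStream()
    check(ImageIO.write(img, "jpg", out)) { "no JPEG writer accepted the image" }
    return out.toByteArray()
}

// Hypothetical stand-in for the screenshot: an opaque RGB gradient.
fun gradientImage(width: Int = 64, height: Int = 64): BufferedImage {
    val img = BufferedImage(width, height, BufferedImage.TYPE_INT_RGB)
    for (y in 0 until height) {
        for (x in 0 until width) {
            val r = x * 255 / (width - 1)
            val g = y * 255 / (height - 1)
            val b = (x + y) * 255 / (width + height - 2)
            img.setRGB(x, y, (r shl 16) or (g shl 8) or b)
        }
    }
    return img
}

// True if encoding the decoded JPEG again yields exactly the same bytes.
fun secondSaveIsIdentical(original: BufferedImage): Boolean {
    val jpeg0Bytes = saveJpeg(original)
    val jpeg1Bytes = saveJpeg(loadJpeg(jpeg0Bytes))
    return jpeg0Bytes.contentEquals(jpeg1Bytes)
}

fun main() {
    println("Second save identical: ${secondSaveIsIdentical(gradientImage())}")
}
```

The result depends on the image and on the JPEG encoder in use, so the sketch prints the outcome instead of assuming one.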
I only tried this with the default quality settings of Java's ImageIO, but a manual experiment with GIMP showed similar results.
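To repeat the experiment with other quality settings, the JPEG writer can be configured explicitly. This is a sketch under assumptions: the function name `saveJpegWithQuality` and the quality values in `main` are hypothetical and were not part of my test. It relies on the standard `ImageWriter` and `ImageWriteParam` classes from `javax.imageio`.

```kotlin
import java.awt.image.BufferedImage
import java.io.ByteArrayOutputStream
import javax.imageio.IIOImage
import javax.imageio.ImageIO
import javax.imageio.ImageWriteParam

// Hypothetical variant of saveJpeg that takes an explicit quality in 0.0..1.0.
fun saveJpegWithQuality(img: BufferedImage, quality: Float): ByteArray {
    val writer = ImageIO.getImageWritersByFormatName("jpg").next()
    val param = writer.defaultWriteParam.apply {
        // The quality can only be set after switching to explicit mode.
        compressionMode = ImageWriteParam.MODE_EXPLICIT
        compressionQuality = quality
    }
    val out = ByteArrayOutputStream()
    try {
        ImageIO.createImageOutputStream(out).use { stream ->
            writer.output = stream
            // No stream metadata, no thumbnails, no image metadata.
            writer.write(null, IIOImage(img, null, null), param)
        }
    } finally {
        writer.dispose()
    }
    return out.toByteArray()
}

fun main() {
    val img = BufferedImage(64, 64, BufferedImage.TYPE_INT_RGB)
    for (quality in listOf(0.5f, 0.9f, 1.0f)) {
        val size = saveJpegWithQuality(img, quality).size
        println("quality $quality: $size bytes")
    }
}
```

Replacing `saveJpeg` in the loop above with this function would show whether the number of steps depends on the quality setting.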
Now I wonder why the image libraries don't implement the JPEG compression so that the above program would stop after 2 steps. Is it really so hard to eliminate rounding errors or whatever else might create these artifacts?