Why doesn't ImageIO preserve JPEG data when re

Posted 2019-03-02 17:40

Question:

Originally I wanted to find out whether it is possible to reconstruct redacted data from a JPEG image, given that JPEG is a lossy format and pixel values bleed into neighboring pixels.

To test whether saving and loading JPEG images is reliable, I wrote the following program, which repeatedly saves and loads a JPEG image until it produces an image that it has seen before. Here is the code:

package de.roland_illig.jpg

import java.awt.image.BufferedImage
import java.io.ByteArrayInputStream
import java.io.ByteArrayOutputStream
import java.io.File
import java.nio.file.Files
import java.nio.file.Paths
import java.security.MessageDigest
import javax.imageio.ImageIO
import javax.xml.bind.DatatypeConverter

fun main(args: Array<String>) {

    fun loadJpeg(bytes: ByteArray) =
            ImageIO.read(ByteArrayInputStream(bytes))

    fun saveJpeg(img: BufferedImage) =
            ByteArrayOutputStream().apply { use { ImageIO.write(img, "jpg", it) } }.toByteArray()

    fun hash(bytes: ByteArray) =
            DatatypeConverter.printHexBinary(MessageDigest.getInstance("SHA-1").digest(bytes))

    var bytes = saveJpeg(ImageIO.read(File("000-original.png")))

    val log = mutableMapOf<String, Int>()
    for (n in 1..Int.MAX_VALUE) {
        Files.write(Paths.get("%03d.jpg".format(n)), bytes)
        val hash = hash(bytes)
        val prev = log.put(hash, n)
        if (prev != null) {
            println("After $n steps, the image is the same as after $prev steps.")
            break
        }

        bytes = saveJpeg(loadJpeg(bytes))
    }
}

The funny thing is that for a random screenshot, it takes between 20 and 49 steps until the image becomes stable. I would have expected it to always take exactly 2 steps.

Even though JPEG is a lossy format, once an image has been saved and loaded again, each pixel has a definite value. Whatever compression is used, I would have expected that compressing the same data again yields the same compressed data:
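To see what actually changes between two generations, it may help to compare the decoded pixels instead of only hashing the file bytes. The following is a minimal sketch, assuming both images have the same dimensions; `PixelDiff` and `diff` are hypothetical helper names that are not part of the program above.

```kotlin
import java.awt.image.BufferedImage
import kotlin.math.abs
import kotlin.math.max

// Hypothetical result type: how many pixels differ, and by how much at most.
data class PixelDiff(val changedPixels: Int, val maxChannelDelta: Int)

// Compares two images pixel by pixel on their R, G and B channels.
fun diff(a: BufferedImage, b: BufferedImage): PixelDiff {
    require(a.width == b.width && a.height == b.height) { "image sizes differ" }
    var changed = 0
    var maxDelta = 0
    for (y in 0 until a.height) {
        for (x in 0 until a.width) {
            val p = a.getRGB(x, y)
            val q = b.getRGB(x, y)
            if (p != q) changed++
            for (shift in intArrayOf(16, 8, 0)) {
                val delta = abs(((p shr shift) and 0xFF) - ((q shr shift) and 0xFF))
                maxDelta = max(maxDelta, delta)
            }
        }
    }
    return PixelDiff(changed, maxDelta)
}

fun main() {
    val a = BufferedImage(4, 4, BufferedImage.TYPE_INT_RGB)
    val b = BufferedImage(4, 4, BufferedImage.TYPE_INT_RGB)
    b.setRGB(1, 2, 0x000300) // change only the green channel of one pixel
    println(diff(a, b))      // prints PixelDiff(changedPixels=1, maxChannelDelta=3)
}
```

Inside the loop above, the decoded image of the previous step could be kept around and passed to `diff` together with the current one, to log how quickly the differences shrink.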

val original = loadPng()              // Exact in-memory image
val jpeg0Bytes = saveJpeg(original)   // Saved with JPEG artifacts
val jpeg = loadJpeg(jpeg0Bytes)       // Lossy, loaded again
val jpeg1Bytes = saveJpeg(jpeg)       // Should be the same as jpeg0Bytes
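The four lines above can be turned into a self-contained check. This is only a sketch: `syntheticImage` is a hypothetical stand-in for `000-original.png` so that the example runs without any input file, while `loadJpeg` and `saveJpeg` mirror the helpers from the program above. No result is claimed here; whether the two byte arrays match depends on the image and the encoder.

```kotlin
import java.awt.image.BufferedImage
import java.io.ByteArrayInputStream
import java.io.ByteArrayOutputStream
import javax.imageio.ImageIO

// Hypothetical stand-in for the screenshot: a deterministic RGB test pattern.
fun syntheticImage(width: Int = 64, height: Int = 64): BufferedImage {
    val img = BufferedImage(width, height, BufferedImage.TYPE_INT_RGB)
    for (y in 0 until height) {
        for (x in 0 until width) {
            val r = (x * 4) and 0xFF
            val g = (y * 4) and 0xFF
            val b = ((x * y) / 16) and 0xFF
            img.setRGB(x, y, (r shl 16) or (g shl 8) or b)
        }
    }
    return img
}

fun loadJpeg(bytes: ByteArray): BufferedImage =
    ImageIO.read(ByteArrayInputStream(bytes))

fun saveJpeg(img: BufferedImage): ByteArray {
    val out = ByteArrayOutputStream()
    // ImageIO.write returns false if no writer accepts the image type.
    check(ImageIO.write(img, "jpg", out)) { "no JPEG writer accepted this image type" }
    return out.toByteArray()
}

fun main() {
    val jpeg0Bytes = saveJpeg(syntheticImage())     // first lossy encoding
    val jpeg1Bytes = saveJpeg(loadJpeg(jpeg0Bytes)) // decode, then encode again
    println("jpeg0: ${jpeg0Bytes.size} bytes, jpeg1: ${jpeg1Bytes.size} bytes")
    println("identical: ${jpeg0Bytes.contentEquals(jpeg1Bytes)}")
}
```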

I only tried with the default quality settings of Java's ImageIO, but a manual experiment with GIMP showed similar results.
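To repeat the experiment with something other than the default quality, the quality can be set explicitly through an `ImageWriteParam`. The sketch below assumes the JDK's standard JPEG writer plugin is available; `saveJpegWithQuality` is a hypothetical helper name, and the 8x8 gray test image is only there to make the example runnable.

```kotlin
import java.awt.image.BufferedImage
import java.io.ByteArrayOutputStream
import javax.imageio.IIOImage
import javax.imageio.ImageIO
import javax.imageio.ImageWriteParam
import javax.imageio.stream.MemoryCacheImageOutputStream

// Hypothetical helper: encodes img as JPEG with an explicit quality in 0.0..1.0.
fun saveJpegWithQuality(img: BufferedImage, quality: Float): ByteArray {
    val writer = ImageIO.getImageWritersByFormatName("jpg").next()
    val param = writer.defaultWriteParam
    param.compressionMode = ImageWriteParam.MODE_EXPLICIT
    param.compressionQuality = quality
    val out = ByteArrayOutputStream()
    // Closing the image output stream flushes all pending data into `out`.
    MemoryCacheImageOutputStream(out).use { stream ->
        writer.output = stream
        try {
            writer.write(null, IIOImage(img, null, null), param)
        } finally {
            writer.dispose()
        }
    }
    return out.toByteArray()
}

fun main() {
    val img = BufferedImage(8, 8, BufferedImage.TYPE_INT_RGB)
    for (y in 0 until 8) for (x in 0 until 8) {
        val gray = (x + y) * 16
        img.setRGB(x, y, (gray shl 16) or (gray shl 8) or gray)
    }
    for (quality in listOf(0.5f, 0.75f, 0.95f)) {
        println("quality $quality: ${saveJpegWithQuality(img, quality).size} bytes")
    }
}
```

Replacing `saveJpeg` in the loop with a call to this helper would show whether the number of steps until the image stabilizes depends on the chosen quality.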

Now I wonder why image libraries don't implement JPEG compression in such a way that the above program would stop after 2 steps. Is it really so hard to eliminate rounding errors, or whatever else might be causing these artifacts?