There seems to be a problem getting deterministic hash values for the POI XLSX format with the MessageDigest SHA-256 implementation, even for empty workbooks written to a ByteArrayOutputStream: two workbooks generated back to back occasionally hash differently. This happens at random, after several hundred or, in some runs, a few thousand iterations.
The relevant code snippets used to reproduce the problem:
```java
// TestNG FileTest:
@Test(enabled = true) // indeterminism at random iterations, such as 400 or 1290
public void emptyXLSXTest() throws IOException, NoSuchAlgorithmException {
    final Hasher hasher = new HasherImpl();
    boolean differentSHA256Hash = false;
    for (int i = 0; i < 10000; i++) {
        final ByteArrayOutputStream excelAdHoc1 = BusinessPlanInMemory.getEmptyExcel("xlsx");
        final ByteArrayOutputStream excelAdHoc2 = BusinessPlanInMemory.getEmptyExcel("xlsx");
        byte[] expectedByteArray = excelAdHoc1.toByteArray();
        String expectedSha256 = hasher.sha256(expectedByteArray);
        byte[] actualByteArray = excelAdHoc2.toByteArray();
        String actualSha256 = hasher.sha256(actualByteArray);
        if (!expectedSha256.equals(actualSha256)) {
            differentSHA256Hash = true;
            System.out.println("ITERATION: " + i);
            System.out.println("EXPECTED HASH: " + expectedSha256);
            System.out.println("ACTUAL HASH: " + actualSha256);
            break;
        }
    }
    Assert.assertTrue(differentSHA256Hash, "Indeterminism did not occur");
}
```
Referenced Hasher and POI code:
```java
// HasherImpl class:
public String sha256(final InputStream stream) throws IOException, NoSuchAlgorithmException {
    final MessageDigest digest = MessageDigest.getInstance("SHA-256");
    final byte[] bytesBuffer = new byte[300000];
    int bytesRead = -1;
    while ((bytesRead = stream.read(bytesBuffer)) != -1) {
        digest.update(bytesBuffer, 0, bytesRead);
    }
    final byte[] hashedBytes = digest.digest();
    return bytesToHex(hashedBytes);
}
```
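The test passes a `byte[]` to `hasher.sha256(...)`, and the method above calls `bytesToHex`, but neither the `byte[]` overload nor that helper is shown. A minimal sketch, assuming they look roughly like this (the class name `HasherSketch` is hypothetical), makes the hashing side self-contained so it can be ruled out independently of POI:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical stand-in for HasherImpl. The byte[] overload and bytesToHex
// are assumptions, since the original snippet does not show them.
final class HasherSketch {

    private static final char[] HEX = "0123456789abcdef".toCharArray();

    // Streams the input through SHA-256 in chunks, as in the original method.
    public String sha256(final InputStream stream) throws IOException, NoSuchAlgorithmException {
        final MessageDigest digest = MessageDigest.getInstance("SHA-256");
        final byte[] buffer = new byte[8192];
        int bytesRead;
        while ((bytesRead = stream.read(buffer)) != -1) {
            digest.update(buffer, 0, bytesRead);
        }
        return bytesToHex(digest.digest());
    }

    // Assumed convenience overload matching the calls in the test.
    public String sha256(final byte[] data) throws IOException, NoSuchAlgorithmException {
        return sha256(new ByteArrayInputStream(data));
    }

    // Lower-case hex encoding, two characters per byte.
    static String bytesToHex(final byte[] bytes) {
        final char[] out = new char[bytes.length * 2];
        for (int i = 0; i < bytes.length; i++) {
            final int v = bytes[i] & 0xFF;
            out[2 * i] = HEX[v >>> 4];
            out[2 * i + 1] = HEX[v & 0x0F];
        }
        return new String(out);
    }
}
```

Since SHA-256 itself is deterministic, a mismatch between two hashes here can only mean the two input byte arrays differ.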
I tried to eliminate indeterminism caused by metadata such as the creation time, without success:
```java
// POI BusinessPlanInMemory helper class:
public static ByteArrayOutputStream getEmptyExcel(final String fileextension) throws IOException {
    Workbook wb;
    if (fileextension.equals("xls")) {
        wb = new HSSFWorkbook();
    }
    else {
        wb = new XSSFWorkbook();
        final POIXMLProperties props = ((XSSFWorkbook) wb).getProperties();
        final POIXMLProperties.CoreProperties coreProp = props.getCoreProperties();
        coreProp.setCreated("");
        coreProp.setIdentifier("1");
        coreProp.setModified("");
    }
    wb.createSheet();
    final ByteArrayOutputStream excelStream = new ByteArrayOutputStream();
    wb.write(excelStream);
    wb.close();
    return excelStream;
}
```
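An .xlsx file is a ZIP archive, so one way to narrow down where two outputs differ is to compare them entry by entry rather than as opaque bytes. The class below is a hypothetical diagnostic helper (not part of POI) that reads both byte arrays with `java.util.zip.ZipInputStream` and reports entries whose name order, last-modified time, or uncompressed content differ. If only the times differ, that would point at the per-entry ZIP timestamps, which are stored with two-second resolution; that could explain why a mismatch appears only occasionally, when two consecutive writes happen to straddle such a boundary. This is an assumption worth checking, not a confirmed diagnosis.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical diagnostic helper (not part of POI): compares two ZIP archives,
// such as two .xlsx byte arrays, entry by entry.
final class ZipEntryDiff {

    private ZipEntryDiff() {}

    // Last-modified time and uncompressed content of one ZIP entry.
    private static final class EntryInfo {
        final long time;
        final byte[] content;

        EntryInfo(long time, byte[] content) {
            this.time = time;
            this.content = content;
        }
    }

    // Reads every entry, preserving the order in which entries appear in the archive.
    private static Map<String, EntryInfo> readEntries(byte[] zipBytes) throws IOException {
        Map<String, EntryInfo> entries = new LinkedHashMap<>();
        try (ZipInputStream zin = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            byte[] buffer = new byte[8192];
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                ByteArrayOutputStream content = new ByteArrayOutputStream();
                int n;
                while ((n = zin.read(buffer)) != -1) {
                    content.write(buffer, 0, n);
                }
                entries.put(entry.getName(), new EntryInfo(entry.getTime(), content.toByteArray()));
            }
        }
        return entries;
    }

    // Returns human-readable differences; the list is empty if the archives match entry by entry.
    static List<String> diff(byte[] first, byte[] second) throws IOException {
        Map<String, EntryInfo> a = readEntries(first);
        Map<String, EntryInfo> b = readEntries(second);
        List<String> differences = new ArrayList<>();
        if (!new ArrayList<>(a.keySet()).equals(new ArrayList<>(b.keySet()))) {
            differences.add("entry names or order differ");
        }
        for (Map.Entry<String, EntryInfo> e : a.entrySet()) {
            EntryInfo other = b.get(e.getKey());
            if (other == null) {
                continue; // missing entries are already reported by the name/order check
            }
            if (e.getValue().time != other.time) {
                differences.add("time differs: " + e.getKey());
            }
            if (!Arrays.equals(e.getValue().content, other.content)) {
                differences.add("content differs: " + e.getKey());
            }
        }
        return differences;
    }
}
```

For example, inside the `if` branch of the test one could print `ZipEntryDiff.diff(expectedByteArray, actualByteArray)` to see which entries, and which attributes, changed.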
The HSSF (XLS) format does not seem to be affected by this problem. Does anybody have an idea what could be causing this, if it is not a bug in POI itself? The code above is based on the BusinessPlan example (https://poi.apache.org/spreadsheet/examples.html).
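If the two outputs turn out to differ only in their ZIP entry timestamps (each entry in an .xlsx archive records a last-modified time, whereas .xls is not a ZIP container), one possible workaround is to hash a normalized copy instead of POI's raw output. The sketch below (hypothetical helper name `ZipTimestampNormalizer`) re-packs the archive with `java.util.zip`, giving every entry the same fixed modification time so that identical entry contents yield identical bytes. It assumes the entry contents themselves are identical between runs, and that a re-packed archive with only its timestamps changed is still a valid .xlsx.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper: rewrites a ZIP archive with a fixed timestamp on every entry.
final class ZipTimestampNormalizer {

    // Arbitrary constant (2000-01-01T00:00:00Z); any time inside the DOS date range would do.
    private static final long FIXED_TIME = 946684800000L;

    private ZipTimestampNormalizer() {}

    static byte[] normalize(byte[] zipBytes) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipInputStream zin = new ZipInputStream(new ByteArrayInputStream(zipBytes));
             ZipOutputStream zout = new ZipOutputStream(out)) {
            byte[] buffer = new byte[8192];
            ZipEntry in;
            while ((in = zin.getNextEntry()) != null) {
                // A fresh entry keeps only the name; sizes, CRC and extra fields are recomputed or dropped.
                ZipEntry copy = new ZipEntry(in.getName());
                copy.setTime(FIXED_TIME);
                zout.putNextEntry(copy);
                int n;
                while ((n = zin.read(buffer)) != -1) {
                    zout.write(buffer, 0, n);
                }
                zout.closeEntry();
            }
        } // zout is closed first, which writes the central directory into out
        return out.toByteArray();
    }
}
```

In the test this would mean hashing `ZipTimestampNormalizer.normalize(excelAdHoc1.toByteArray())` instead of the raw byte array. Note that the DOS date fields are computed in the local time zone, so hashes produced this way are only comparable across machines that use the same zone.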
Thanks for your input!