I have found out that calculating sha256 in java is slow. For example, it is slower than python. I wrote two simple benchmarks that calculate sha256 of 1GB of zeroes. In both cases the result is the same and correct, but the python time is 5653ms and the java time is 8623ms(53% slower). The result is similar every time and this is an important difference for me.
How to make the calculation in java faster?
Benchmarks:
Java:
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
public class BenchmarkSha256 {
public static void main(String... args) throws NoSuchAlgorithmException {
int size = 1024 * 1024;
byte[] bytes = new byte[size];
MessageDigest md = MessageDigest.getInstance("SHA-256");
long startTime = System.nanoTime();
for (int i = 0; i < 1024; i++)
md.update(bytes, 0, size);
long endTime = System.nanoTime();
System.out.println(String.format("%1$064x", new java.math.BigInteger(1, md.digest())));
System.out.println(String.format("%d ms", (endTime - startTime) / 1000000));
}
}
Python:
#!/usr/bin/env python
import hashlib
import time
size = 1024 * 1024
bytes = bytearray(size)
md = hashlib.sha256()
startTime = time.time()
for i in range(0, 1024):
md.update(bytes)
endTime = time.time()
print "%s\n%d ms" % (md.hexdigest(), (endTime - startTime) * 1000)
results:
~> java BenchmarkSha256
49bc20df15e412a64472421e13fe86ff1c5165e18b2afccf160d4dc19fe68a14
8623 ms
~> python BenchmarkSha256.py
49bc20df15e412a64472421e13fe86ff1c5165e18b2afccf160d4dc19fe68a14
5653 ms
versions of java and python:
~> java -version
java version "1.6.0_26"
Java(TM) SE Runtime Environment (build 1.6.0_26-b03)
Java HotSpot(TM) 64-Bit Server VM (build 20.1-b02, mixed mode)
~> python --version
Python 2.7
Have you tried feeding in the data incrementally? You can use messageDigest.update()
with the bytes and then get the final digest with messageDigest.digest()
?
Allocating a 1GB array in memory is a fairly chunky operation. You may find that smaller incremental updates are faster in the end.
Well, unless you are doing this to compare two command line programs, this is not the best test. Primarily, these numbers are being polluted by the vast differences in overhead associated with each program. VM start times will vary. Memory allocation speeds will vary.
To clean this up a bit, simply take two time samples before and after each actual MD5 calculation within the code itself.
This will actually measure performance of the hashing operation itself.
While you might be able to improve the performance of the Java tool a bit, the Python implementation will usually be faster because it is likely delegating to assembled libraries which run with significantly better performance.
If your project does not have any other significant dependencies on Java, I'd recommend going with the Python implementation.
I ran a test on the following SHA-256 implementations: Java built-in, Groovy built-in, Apache Commons, Guava, and Bouncy Castle. My results on one run are here:
>groovy hash_comp.groovy
Hashing 1000000 iterations of SHA-256
time java: 2688 372023.8095238095 hashes/sec
time groovy: 1948 513347.0225872690 hashes/sec
time apache: 867 1153402.5374855825 hashes/sec
time guava: 953 1049317.9433368311 hashes/sec
time bouncy: 1890 529100.5291005291 hashes/sec
This was run on an Intel i5 8th Gen. Apache and Guava were easily to two fastest implementations. Apache Commons narrowly beat out Guava on 9/10 of my runs. My code for this test is available here.
Note that after running this test I started to wonder if you could go even faster by tapping into the CPU instruction set (Intel has SHA extensions). I'm not sure there is a JVM way to do this without JNI or JNA. I created another question here.
Update: Another option I found is the Amazon Corretto Crypto Provider (ACCP). Code available here.
What exactly is ACCP?
ACCP implements the standard Java Cryptography Architecture (JCA) interfaces and replaces the default Java cryptographic implementations with those provided by libcrypto from the OpenSSL project. ACCP allows you to take full advantage of assembly-level and CPU-level performance tuning, to gain significant cost reduction, latency reduction, and higher throughput across multiple services and products, as shown in the examples below.