Speed difference between AES/CBC encryption and de

2019-03-30 15:00发布

问题:

I am wondering, theoretically, how much slower would AES/CBC decryption be compared to AES/CBC encryption with the following conditions:

  • Encryption key of 32 bytes (256 bits);
  • A blocksize of 16 bytes (128 bits).

The reason that I ask is that I want to know if the decryption speed of an implementation that I have is not abnormally slow. I have done some tests on random memory blocks of different sizes. The results are as follows:

64B:

64KB:

10MB – 520MB:

All data was stored on the internal memory of my system. The application generates the data to encrypt by itself. Virtual memory is disabled on the test PC so that there would not be any I/O calls.

When analyzing the table, does the difference between encryption and decryption imply that my implementation is abnormally slow? Have I done something wrong?

Update:

  • This test is executed on another pc;
  • This test is executed with random data;
  • Crypto++ is used for the AES/CBC encryption and decryption.

The decryption implementation is as follows:

CryptoPP::AES::Decryption aesDecryption(aesKey, ENCRYPTION_KEY_SIZE_AES);
CryptoPP::CBC_Mode_ExternalCipher::Decryption cbcDecryption(aesDecryption, aesIv);

CryptoPP::ArraySink * decSink = new CryptoPP::ArraySink(data, dataSizeMax);
CryptoPP::StreamTransformationFilter stfDecryptor(cbcDecryption, decSink);
stfDecryptor.Put(reinterpret_cast<const unsigned char*>(ciphertext), cipherSize);
stfDecryptor.MessageEnd();

*dataOutputSize = decSink->TotalPutLength(); 

Update 2:

  • Added result for 64 byte blocks

回答1:

As symmetric encryption, encryption and decryption should be fairly close in speed. Not sure about your implementation but there are ways to optimize if you're concerned about how the algorithm was used. In experiments, AES is not the fastest and CBC will add security but slow it down. Here's a comparison, since you're asking about key and block sizes:



回答2:

When analyzing the table, does the difference between encryption and decryption imply that my implementation is abnormally slow? Have I done something wrong?

Three or four things jump out at me. I kind of agree with @JamesKPolk - the numbers look off. First, crypto libraries are usually bench-marked with CTR mode, not CBC mode. Also see the SUPERCOP benchmarks. Any you have to use cycles-per-byte (cpb) to normalize measurement units across machines. Saying "9 MB/s" without a context means nothing.

Second, we need to know the machine and its CPU frequency. It looks like you are pushing data at 9 MB/s for encryption and 6.5 MB/s for decryption. A modern iCore machine, like a Core-i5 running at 2.7 GHz, will push CBC mode data at around 2.5 or 3.0 cpb, which is roughly 980 MB/s or 1 GB/s. Even my old Core2 Duo running at 2.0 GHz moves data faster than you are showing. The Core 2 moves data at 14.5 cpb or 130 MB/s.

Third, scrap this code. There is a lot of room for improvement so it is not worth critiquing in this context; and the recommended code is below. Worth mentioning is, you are creating a lot of objects like ArraySource and StreamTransformationFilter. The filter adds padding which perturbs the AES encryption and decryption benchmarks and skews the results. You only need an encryption object, and then you only need to call ProcessBlock or ProcessString.

CryptoPP::AES::Decryption aesDecryption(aesKey, ENCRYPTION_KEY_SIZE_AES);
CryptoPP::CBC_Mode_ExternalCipher::Decryption cbcDecryption(aesDecryption, aesIv);
...

Fourth, the Crypto++ wiki has a Benchmarks article with the following code. Its a new section and was not available when you asked your question. Here's how to run your test.

AutoSeededRandomPool prng;
SecByteBlock key(16);
prng.GenerateBlock(key, key.size());

CTR_Mode<AES>::Encryption cipher;
cipher.SetKeyWithIV(key, key.size(), key);

const int BUF_SIZE = RoundUpToMultipleOf(2048U,
    dynamic_cast<StreamTransformation&>(cipher).OptimalBlockSize());

AlignedSecByteBlock buf(BUF_SIZE);
prng.GenerateBlock(buf, buf.size());

const double runTimeInSeconds = 3.0;
const double cpuFreq = 2.7 * 1000 * 1000 * 1000;
double elapsedTimeInSeconds;
unsigned long i=0, blocks=1;

ThreadUserTimer timer;
timer.StartTimer();

do
{
    blocks *= 2;
    for (; i<blocks; i++)
        cipher.ProcessString(buf, BUF_SIZE);
    elapsedTimeInSeconds = timer.ElapsedTimeAsDouble();
}
while (elapsedTimeInSeconds < runTimeInSeconds);

const double bytes = static_cast<double>(BUF_SIZE) * blocks;
const double ghz = cpuFreq / 1000 / 1000 / 1000;
const double mbs = bytes / 1024 / 1024 / elapsedTimeInSeconds;
const double cpb = elapsedTimeInSeconds * cpuFreq / bytes;

std::cout << cipher.AlgorithmName() << " benchmarks..." << std::endl;
std::cout << "  " << ghz << " GHz cpu frequency"  << std::endl;
std::cout << "  " << cpb << " cycles per byte (cpb)" << std::endl;
std::cout << "  " << mbs << " MiB per second (MiB)" << std::endl;

Running the code on a Core-i5 6400 at 2.7 GHz results in:

$ ./bench.exe
AES/CTR benchmarks...
  2.7 GHz cpu frequency
  0.58228 cycles per byte (cpb)
  4422.13 MiB per second (MiB)

Fifth, when I modify the benchmark program shown above to operate of 64-byte blocks:

const int BUF_SIZE = 64;
unsigned int blocks = 0;
...

do
{
    blocks++;
    cipher.ProcessString(buf, BUF_SIZE);
    elapsedTimeInSeconds = timer.ElapsedTimeAsDouble();
}
while (elapsedTimeInSeconds < runTimeInSeconds);

I see 3.4 cpb or 760 MB/s for the Core-i5 6400 at 2.7 GHz for 64-byte blocks. The library takes a hit for small buffers, but most (all?) libraries do.

$ ./bench.exe
AES/CTR benchmarks...
  2.7 GHz cpu frequency
  3.39823 cycles per byte (cpb)
  757.723 MiB per second (MiB)

Sixth, you need to get the processor out of powersave mode or a low energy state for best/most consistent results. The library uses governor.sh to do it on Linux. It is available in the TestScript/ directory.

Seventh, when I switch to CTR mode decryption with:

CTR_Mode<AES>::Decryption cipher;
cipher.SetKeyWithIV(key, key.size(), key);

Then I see about the same rate for bulk decryption:

$ ./bench.exe
AES/CTR benchmarks...
  2.7 GHz cpu frequency
  0.579923 cycles per byte (cpb)
  4440.11 MiB per second (MiB)

Eigth, here is a collection of benchmark numbers on a bunch of different machines. It should provide a rough target as you are tuning your tests.

My Beaglebone dev-board running at 980 MHz is moving data at twice the rate you are reporting. The Beaglebone achieves a boring 40 cpb with 20 MB/s because it is staright C/C++; and not optimized for A-32.

  • Skylake Core-i5 @2.7 GHz

    • https://www.cryptopp.com/benchmarks.html
  • Core2 Duo @ 2.0 GHz

    • https://www.cryptopp.com/benchmarks-core2.html
  • LeMaker HiKey Kirik SoC Aarch64 @1.2 GHz

    • https://www.cryptopp.com/benchmarks-hikey.html
  • AMD Opteron Aarch64 @2.0 GHz

    • https://www.cryptopp.com/benchmarks-opteron.html
  • BananaPi Cortex-A7 dev-board @860MHz

    • https://www.cryptopp.com/benchmarks-bananapi.html
  • IBM Power8 Server @4.1 GHz

    • https://www.cryptopp.com/benchmarks-power8.html

My takeaways are:

  • CTR mode bulk encryption and decryption are about the same on modern machines
  • CTR mode key setup are not the same, decryption takes a little longer on modern machines
  • Small block sizes are more expensive than large block sizes

It is everything I expect to see.

I think your next step is to collect some data using the sample program on the Crypto++ wiki and then evaluate the results.



回答3:

Theoretically, AES decryption is 30% slower. This is a property of Rijndael systems in general.

Source: http://www4.ncsu.edu/~hartwig/Teaching/437/aes.pdf