C++ so far (unfortunately) doesn't support finally
clause for a try
statement. This leads to speculations on how to release resources. After studying the question on the internet, although I found some solutions, I didn't get clear about their performance (and I would use Java if performance didn't matter that much). So I had to benchmark.
The options are:
Functor-based
finally
class proposed at CodeProject. It's powerful, but slow. And the disassembly suggests that outer function local variables are captured very inefficiently: pushed to the stack one by one, rather than passing just the frame pointer to the inner (lambda) function.RAII: Manual cleaner object on the stack: the disadvantage is manual typing and tailoring it for each place used. Another disadvantage is the need to copy to it all the variables needed for resource release.
MSVC++ specific
__try
/__finally
statement. The disadvantage is that it's obviously not portable.
I created this small benchmark to compare the runtime performance of these approaches:
#include <chrono>
#include <functional>
#include <cstdio>
class Finally1 {
std::function<void(void)> _functor;
public:
Finally1(const std::function<void(void)> &functor) : _functor(functor) {}
~Finally1() {
_functor();
}
};
void BenchmarkFunctor() {
volatile int64_t var = 0;
const int64_t nIterations = 234567890;
auto start = std::chrono::high_resolution_clock::now();
for (int64_t i = 0; i < nIterations; i++) {
Finally1 doFinally([&] {
var++;
});
}
auto elapsed = std::chrono::high_resolution_clock::now() - start;
double nSec = 1e-6 * std::chrono::duration_cast<std::chrono::microseconds>(elapsed).count();
printf("Functor: %.3lf Ops/sec, var=%lld\n", nIterations / nSec, (long long)var);
}
void BenchmarkObject() {
volatile int64_t var = 0;
const int64_t nIterations = 234567890;
auto start = std::chrono::high_resolution_clock::now();
for (int64_t i = 0; i < nIterations; i++) {
class Cleaner {
volatile int64_t* _pVar;
public:
Cleaner(volatile int64_t& var) : _pVar(&var) { }
~Cleaner() { (*_pVar)++; }
} c(var);
}
auto elapsed = std::chrono::high_resolution_clock::now() - start;
double nSec = 1e-6 * std::chrono::duration_cast<std::chrono::microseconds>(elapsed).count();
printf("Object: %.3lf Ops/sec, var=%lld\n", nIterations / nSec, (long long)var);
}
void BenchmarkMSVCpp() {
volatile int64_t var = 0;
const int64_t nIterations = 234567890;
auto start = std::chrono::high_resolution_clock::now();
for (int64_t i = 0; i < nIterations; i++) {
__try {
}
__finally {
var++;
}
}
auto elapsed = std::chrono::high_resolution_clock::now() - start;
double nSec = 1e-6 * std::chrono::duration_cast<std::chrono::microseconds>(elapsed).count();
printf("__finally: %.3lf Ops/sec, var=%lld\n", nIterations / nSec, (long long)var);
}
template <typename Func> class Finally4 {
Func f;
public:
Finally4(Func&& func) : f(std::forward<Func>(func)) {}
~Finally4() { f(); }
};
template <typename F> Finally4<F> MakeFinally4(F&& f) {
return Finally4<F>(std::forward<F>(f));
}
void BenchmarkTemplate() {
volatile int64_t var = 0;
const int64_t nIterations = 234567890;
auto start = std::chrono::high_resolution_clock::now();
for (int64_t i = 0; i < nIterations; i++) {
auto doFinally = MakeFinally4([&] { var++; });
//Finally4 doFinally{ [&] { var++; } };
}
auto elapsed = std::chrono::high_resolution_clock::now() - start;
double nSec = 1e-6 * std::chrono::duration_cast<std::chrono::microseconds>(elapsed).count();
printf("Template: %.3lf Ops/sec, var=%lld\n", nIterations / nSec, (long long)var);
}
void BenchmarkEmpty() {
volatile int64_t var = 0;
const int64_t nIterations = 234567890;
auto start = std::chrono::high_resolution_clock::now();
for (int64_t i = 0; i < nIterations; i++) {
var++;
}
auto elapsed = std::chrono::high_resolution_clock::now() - start;
double nSec = 1e-6 * std::chrono::duration_cast<std::chrono::microseconds>(elapsed).count();
printf("Empty: %.3lf Ops/sec, var=%lld\n", nIterations / nSec, (long long)var);
}
int __cdecl main() {
BenchmarkFunctor();
BenchmarkObject();
BenchmarkMSVCpp();
BenchmarkTemplate();
BenchmarkEmpty();
return 0;
}
The results on my Ryzen 1800X @3.9Ghz with DDR4 @2.6Ghz CL13 were:
Functor: 175148825.946 Ops/sec, var=234567890
Object: 553446751.181 Ops/sec, var=234567890
__finally: 553832236.221 Ops/sec, var=234567890
Template: 554964345.876 Ops/sec, var=234567890
Empty: 554468478.903 Ops/sec, var=234567890
Apparently, all the options except functor-base (#1) are as fast as an empty loop.
So is there a fast and powerful C++ alternative to finally
, which is portable and requires minimum copying from the stack of the outer function?
UPDATE: I've benchmarked @Jarod42 solution, so here in the question is updated code and output. Though as mentioned by @Sopel, it may break if copy elision is not performed.
UPDATE2: To clarify what I'm asking for is a convenient fast way in C++ to execute a block of code even if an exception is thrown. For the reasons mentioned in the question, some ways are slow or inconvenient.
You can implement
Finally
without type erasure and overhead ofstd::function
:And use it like:
Demo
Well, it's your benchmark that's broken: It does not actually throw, so you only see the non-exception path. This is quite bad as the optimizer can prove that you don't throw, so it can throw away all code that actually handles performing cleanup with an exception in flight.
I think, you should repeat your benchmark, putting a call to
exceptionThrower()
ornonthrowingThrower()
into yourtry{}
block. These two functions should be compiled as a separate translation unit, and only linked together with the benchmark code. That will force the compiler to actually generate exception handling code irrespective of whether you callexceptionThrower()
ornonthrowingThrower()
. (Make sure that you don't switch on link time optimizations, that could spoil the effect.)This will also allow you to easily compare the performance impacts between the exception and the non-throwing execution paths.
Apart from the benchmark issues, exceptions in C++ are slow. You'll never get hundreds of millions of exceptions thrown within a second. It's more around single digit millions at best, likely less. I expect that any performance differences between different
finally
implementations are entirely irrelevant in the throwing case. What you can optimize is the non-throwing path, where your cost is simply the construction/destruction of yourfinally
implementation object, whatever that is.