I have a struct template that takes two types (T and S), and at some point uses a static_cast to convert from one type to the other. It is often the case that T and S are the same type.
A simplified example of the setup:
template <typename T, typename S = T>
struct foo
{
    void bar(T val)
    {
        /* ... */
        some_other_function(static_cast<S>(val));
        /* ... */
    }
};
In the case that S is the same type as T, does or can the static_cast introduce extra overhead, or is it a null operation which will always be ignored?
If it does introduce overhead, is there a simple template metaprogramming trick to perform the static_cast only if needed, or will I need to create a partial specialization to cope with the T == S case? I'd rather avoid partially specializing the entire foo template if possible.
Yes, it can.
Here is an example:
#include <iostream>

struct A {
    A() = default;  // declaring a copy constructor suppresses the implicit default constructor
    A( A const& ) {
        std::cout << "expensive copy\n";
    }
};

template<typename T>
void noop( T const& ) {}

template <typename T, typename S = T>
void bar(T val)
{
    noop(static_cast<S>(val));
}

template <typename T>
void bar2(T val)
{
    noop(val);
}

int main() {
    std::cout << "start\n";
    A a;
    std::cout << "bar2\n";
    bar2(a); // one expensive copy (the by-value parameter)
    std::cout << "bar\n";
    bar(a);  // two expensive copies (the by-value parameter, then the static_cast)
    std::cout << "done\n";
}
Basically, a static_cast to a non-reference type creates a new object, so it can cause a copy constructor to be called.
For some types (like int), a copy is essentially free, and the compiler can eliminate it. For other types, it cannot. Copy elision isn't permitted in this context either, because the source is an existing named object rather than a temporary: if your copy constructor has side effects, or the compiler cannot prove that it has none (common when the copy constructor is non-trivial), it will be called.
To complement Yakk's answer, I've decided to post some assembly to confirm this. I've used std::string as the test type.
foo<std::string>::bar(), no casting:
pushq %rbp
movq %rsp, %rbp
subq $32, %rsp
movq %rcx, 16(%rbp)
movq %rdx, 24(%rbp)
movq 24(%rbp), %rax
movq %rax, %rcx
call _Z19some_other_functionRKSs
nop
addq $32, %rsp
popq %rbp
ret
foo<std::string>::bar(), with static_cast<T>():
pushq %rbp
pushq %rbx
subq $56, %rsp
leaq 128(%rsp), %rbp
movq %rcx, -48(%rbp)
movq %rdx, -40(%rbp)
movq -40(%rbp), %rdx
leaq -96(%rbp), %rax
movq %rax, %rcx
call _ZNSsC1ERKSs // copy constructor: std::string::string(std::string const&)
leaq -96(%rbp), %rax
movq %rax, %rcx
call _Z19some_other_functionRKSs
leaq -96(%rbp), %rax
movq %rax, %rcx
call _ZNSsD1Ev // destructor: std::string::~string()
jmp .L12
movq %rax, %rbx
leaq -96(%rbp), %rax
movq %rax, %rcx
call _ZNSsD1Ev // destructor: std::string::~string()
movq %rbx, %rax
movq %rax, %rcx
call _Unwind_Resume
nop
.L12:
addq $56, %rsp
popq %rbx
popq %rbp
ret
This extra code is only generated with -O0. At any optimization level above that, the two cases compiled to the same code in this test.