Can static_cast to same type introduce runtime overhead?

Posted 2019-06-15 16:59

Question:

I have a structure template that takes two types (T and S), and at some point uses a static_cast to convert from one type to the other. It is often the case that T and S are the same type.

A simplified example of the setup:

template <typename T, typename S = T>
struct foo
{
  void bar(T val)
  {
    /* ... */
    some_other_function(static_cast<S>(val));
    /* ... */
  }
};

In the case that S is the same class as T, does or can the static_cast introduce extra overhead, or is it a null operation which will always be ignored?

If it does introduce overhead, is there a simple template metaprogramming trick to perform the static_cast only if needed, or will I need to create a partial specialization to cope with the T == S case? I'd rather avoid the partial specialization of the entire foo template if possible.

Answer 1:

Yes, it can.

Here is an example:

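#include <iostream>

struct A {
  // Declaring a copy constructor suppresses the implicit default
  // constructor, so it must be requested explicitly for "A a;" to compile.
  A() = default;
  A( A const& ) {
    std::cout << "expensive copy\n";
  }
};

template<typename T>
void noop( T const& ) {}

// Casts val to S (which defaults to T) before passing it on.
template <typename T, typename S = T>
void bar(T val)
{
  noop(static_cast<S>(val)); // the cast creates a temporary S copied from val
}

// Same as bar, but without the cast.
template <typename T>
void bar2(T val)
{
  noop(val);
}

int main() {
  std::cout << "start\n";
  A a;
  std::cout << "bar2\n";
  bar2(a); // one expensive copy (the by-value parameter)
  std::cout << "bar\n";
  bar(a); // two expensive copies (the parameter, then the cast)
  std::cout << "done\n";
}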

Basically, a static_cast can cause a copy constructor to be called: static_cast<S>(val) creates a temporary S initialized from val, even when val is already an S.

For some types (like int), a copy constructor is basically free, and the compiler can eliminate it.

For other types, it cannot. In this context, copy elision isn't legal either: if your copy constructor has side effects or the compiler cannot prove that it has no side effects (common if the copy constructor is non-trivial), it will be called.
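As for the "cast only if needed" trick the question asks about, one possible approach is sketched below. It assumes C++17 (for `if constexpr` and `decltype(auto)`), and the names `cast_if_needed`, `Counted`, and `consume` are hypothetical, introduced only for this illustration. When the two types match, the helper hands back a reference to the original object, so no temporary is created. Otherwise it falls back to static_cast.

```cpp
#include <type_traits>

// Hypothetical helper (not from the original post).
// Returns a reference to val when it already has type S,
// otherwise returns a new S produced by static_cast.
template <typename S, typename T>
decltype(auto) cast_if_needed(T& val)
{
    if constexpr (std::is_same_v<std::remove_cv_t<T>, S>) {
        return (val);                // parentheses make decltype(auto) deduce T&
    } else {
        return static_cast<S>(val);  // genuine conversion: yields a new S
    }
}

// Illustrative type that counts how often it is copied.
struct Counted {
    static inline int copies = 0;
    Counted() = default;
    Counted(Counted const&) { ++copies; }
};

// The question's template, using the helper instead of a bare static_cast.
// consume stands in for some_other_function.
template <typename T, typename S = T>
struct foo {
    void bar(T val) { consume(cast_if_needed<S>(val)); }
    static void consume(S const&) {}
};
```

With this sketch, `foo<Counted>{}.bar(c)` copies `c` only once, for the by-value parameter, while `foo<int, double>` still performs the conversion. Before C++17, a pair of overloads or tag dispatch on `std::is_same` could play the same role without specializing all of `foo`.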



Answer 2:

To complement Yakk's answer, I've decided to post some assembly to confirm this. I've used std::string as the test type.

foo<std::string>.bar() - No casting

pushq   %rbp
movq    %rsp, %rbp
subq    $32, %rsp
movq    %rcx, 16(%rbp)
movq    %rdx, 24(%rbp)
movq    24(%rbp), %rax
movq    %rax, %rcx
call    _Z19some_other_functionRKSs
nop
addq    $32, %rsp
popq    %rbp
ret

foo<std::string>.bar() - static_cast<T>()

pushq   %rbp
pushq   %rbx
subq    $56, %rsp
leaq    128(%rsp), %rbp
movq    %rcx, -48(%rbp)
movq    %rdx, -40(%rbp)
movq    -40(%rbp), %rdx
leaq    -96(%rbp), %rax
movq    %rax, %rcx
call    _ZNSsC1ERKSs     // std::string copy constructor
leaq    -96(%rbp), %rax
movq    %rax, %rcx
call    _Z19some_other_functionRKSs
leaq    -96(%rbp), %rax
movq    %rax, %rcx
call    _ZNSsD1Ev    // std::string destructor
jmp .L12
movq    %rax, %rbx
leaq    -96(%rbp), %rax
movq    %rax, %rcx
call    _ZNSsD1Ev    // std::string destructor (exception cleanup path)
movq    %rbx, %rax
movq    %rax, %rcx
call    _Unwind_Resume
nop
.L12:
addq    $56, %rsp
popq    %rbx
popq    %rbp
ret


This code is only generated with -O0. Any optimization level whatsoever will even out the two cases.
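To check this with your own compiler, a translation unit along these lines could be compiled twice, for example with `g++ -S -O0` and `g++ -S -O2`, and the resulting assembly compared. This is only a sketch: the struct names `foo_cast` and `foo_plain` and the `total_length` counter are made up for the example, and the signature of `some_other_function` is assumed from the mangled name in the listings above. I make no claim here about what a given compiler will emit.

```cpp
#include <cstddef>
#include <string>

// Assumed signature, matching the mangled name _Z19some_other_functionRKSs.
// The body is illustrative. Defining it in a separate file would keep it
// opaque to the optimizer, which is closer to the setup in the listings.
std::size_t total_length = 0;
void some_other_function(const std::string& s) { total_length += s.size(); }

// Variant with the cast, as in the question.
template <typename T, typename S = T>
struct foo_cast {
    void bar(T val) { some_other_function(static_cast<S>(val)); }
};

// Variant without the cast.
template <typename T>
struct foo_plain {
    void bar(T val) { some_other_function(val); }
};

// Explicit instantiations so both bar() bodies appear in the assembly
// even if nothing in this file calls them.
template struct foo_cast<std::string>;
template struct foo_plain<std::string>;
```

Comparing the two `bar` bodies at each optimization level shows whether the extra copy constructor and destructor calls survive for your toolchain.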