I have a template base class with a get_p_pow method that is called by a foo function:
template <typename T_container>
class base {
public:
    int foo() {
        ...
        get_p_pow(p_pow, delta_p);
        ...
    }
    ...
protected:
    virtual T_container& get_p_pow(T_container &p_pow, double delta_p) const {
        p_pow(0) = 1.0;
        p_pow(1) = delta_p;
        // p_pow(i) = delta_p^i, built from the previous power
        for (difference_type i = 2; i <= order; ++i) {
            p_pow(i) = p_pow(i-1)*delta_p;
        }
        return p_pow;
    }
    int order;
};
For some derived classes, the value of order is set to a specific number, so I can unroll the loop, with the hope that foo calls and inlines the unrolled version:
template <typename T_container>
class child : public base<T_container> {
    ...
protected:
    T_container& get_p_pow(T_container &p_pow, double delta_p) const {
        p_pow(0) = 1.0;
        p_pow(1) = delta_p;
        p_pow(2) = p_pow(1)*delta_p;
        p_pow(3) = p_pow(2)*delta_p;
        p_pow(4) = p_pow(3)*delta_p;
        p_pow(5) = p_pow(4)*delta_p;
        return p_pow;
    }
    // order set to 5 in constructor
};
The problem is that, as far as I know, virtual functions usually cannot be inlined unless the compiler knows the dynamic type of the object, i.e. the call is made on the object itself rather than through a pointer or reference to it. However, since base and child are class templates, they are defined in a header file that is included in every translation unit that uses them. That means the compiler should have everything it needs to inline the call (to my knowledge, since no separate compilation is involved). I've tried this out: the function isn't inlined, and the unrolled version brings no real performance benefit (besides the function call overhead, I suspect pipelining suffers too). Is there a way to get inlining in this situation? Or is there any advice on how to implement this kind of thing?
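For comparison, here is a minimal sketch of one direction that avoids the virtual call altogether: static dispatch via the curiously recurring template pattern (CRTP). It is an illustration under assumptions, not a drop-in replacement. The names vec6, base_crtp, child_crtp and generic_crtp are hypothetical, the body of foo is a stand-in (it just sums the powers), and get_p_pow is made public so the base can reach the derived version. The call inside foo becomes a direct, non-virtual call that the compiler is free to inline. Whether it actually does is still up to the optimizer.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical container with operator() indexing, standing in for T_container.
struct vec6 {
    std::array<double, 6> data{};
    double& operator()(std::size_t i) { return data[i]; }
    double operator()(std::size_t i) const { return data[i]; }
};

// CRTP base: T_derived is the concrete class, so foo() can reach its
// get_p_pow through a static_cast instead of a virtual call.
template <typename T_container, typename T_derived>
class base_crtp {
public:
    explicit base_crtp(int order_) : order(order_) {
        assert(order_ >= 1);  // get_p_pow always writes p_pow(0) and p_pow(1)
    }

    // Stand-in for the real foo(): fills p_pow and returns the sum of the powers.
    double foo(T_container& p_pow, double delta_p) const {
        static_cast<const T_derived&>(*this).get_p_pow(p_pow, delta_p);
        double sum = 0.0;
        for (int i = 0; i <= order; ++i) sum += p_pow(i);
        return sum;
    }

    // Generic loop version, used unless the derived class hides it.
    T_container& get_p_pow(T_container& p_pow, double delta_p) const {
        p_pow(0) = 1.0;
        p_pow(1) = delta_p;
        for (int i = 2; i <= order; ++i) p_pow(i) = p_pow(i - 1) * delta_p;
        return p_pow;
    }

protected:
    int order;
};

// Derived class with order fixed to 5 and a hand-unrolled get_p_pow.
template <typename T_container>
class child_crtp : public base_crtp<T_container, child_crtp<T_container>> {
public:
    child_crtp() : base_crtp<T_container, child_crtp<T_container>>(5) {}

    // Hides the base version; selected at compile time by the static_cast in foo().
    T_container& get_p_pow(T_container& p_pow, double delta_p) const {
        p_pow(0) = 1.0;
        p_pow(1) = delta_p;
        p_pow(2) = p_pow(1) * delta_p;
        p_pow(3) = p_pow(2) * delta_p;
        p_pow(4) = p_pow(3) * delta_p;
        p_pow(5) = p_pow(4) * delta_p;
        return p_pow;
    }
};

// Derived class that keeps the generic loop from the base.
template <typename T_container>
class generic_crtp : public base_crtp<T_container, generic_crtp<T_container>> {
public:
    explicit generic_crtp(int order_)
        : base_crtp<T_container, generic_crtp<T_container>>(order_) {}
};
```

With this layout, `child_crtp<vec6> c; vec6 p; c.foo(p, 2.0);` runs the unrolled code, while `generic_crtp<vec6> g(5);` falls back to the loop. The trade-off is that there is no longer a common polymorphic base, so the different derived types cannot be stored behind a single base pointer.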