Say I have a virtual function call foo() on an abstract base class pointer, mypointer->foo(). When my app starts up, based on the contents of a file, it chooses to instantiate a particular concrete class and assigns mypointer to that instance. For the rest of the app's life, mypointer will always point to objects of that concrete type. I have no way to know what this concrete type is (it may be instantiated by a factory in a dynamically loaded library). I only know that the type will stay the same after the first time an instance of the concrete type is made. The pointer may not always point to the same object, but the object will always be of the same concrete type. Notice that the type is technically determined at 'runtime' because it's based on the contents of a file, but that after 'startup' (file is loaded) the type is fixed.
However, in C++ I pay the virtual function lookup cost every time foo is called for the entire duration of the app. The compiler can't optimize the lookup away because there's no way for it to know that the concrete type won't vary at runtime (even the most amazing compiler ever couldn't speculate on the behavior of dynamically loaded libraries). In a JIT-compiled language like Java or .NET, the JIT can detect that the same type is being used over and over and do inline caching. I'm basically looking for a way to manually do that for specific pointers in C++.
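To make "inline caching" concrete, here is a rough sketch of the guarded fast path a JIT effectively emits. It assumes the candidate concrete type is visible at compile time, which is exactly what is not available in the scenario described here, so it only illustrates the idea. The names Base, Derived, Other and call_foo_cached are hypothetical.

```cpp
#include <typeinfo>

// Hypothetical hierarchy, used only to illustrate the guard.
struct Base {
    virtual ~Base() {}
    virtual int foo(int x) = 0;
};

struct Derived : Base {
    int foo(int x) override { return x * 2; }
};

struct Other : Base {
    int foo(int x) override { return x * x; }
};

// Guarded call: take a direct (non-virtual) path when the dynamic type
// matches the cached guess, otherwise fall back to virtual dispatch.
inline int call_foo_cached(Base* p, int x) {
    if (typeid(*p) == typeid(Derived)) {
        // A qualified call bypasses the vtable and can be inlined.
        return static_cast<Derived*>(p)->Derived::foo(x);
    }
    return p->foo(x);  // slow path: ordinary virtual call
}
```

Note that typeid(*p) itself has to read type information through the object, so in C++ the guard is not free; whether this beats a plain virtual call would have to be measured.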
Is there any way in C++ to cache this lookup? I realize that solutions might be pretty hackish. I'm willing to accept ABI/compiler specific hacks if it's possible to write configure tests that discover the relevant aspects of the ABI/compiler so that it's "practically portable" even if not truly portable.
Update: To the naysayers: If this wasn't worth optimizing, then I doubt modern JITs would do it. Do you think Sun's and MS's engineers were wasting their time implementing inline caching, and didn't benchmark it to ensure there was an improvement?
So assuming that this is a fundamental issue you want to solve (to avoid premature optimization arguments), and ignoring platform and compiler specific hackery, you can do one of two things, at opposite ends of complexity:
I asked a very similar question recently, and got the answer that it's possible as a GCC extension, but not portably:
C++: Pointer to monomorphic version of virtual member function?
In particular, I also tried it with Clang and it doesn't support this extension (even though it supports many other GCC extensions).
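For reference, a sketch of what this can look like, assuming the extension meant here is GCC's "bound member functions" conversion (a cast from a bound pointer-to-member to a plain function pointer). The names Base, Derived, FooFn, resolve_foo and foo_trampoline are hypothetical, and the non-GCC branch is only a fallback that still makes a virtual call.

```cpp
#include <cassert>

struct Base {
    virtual ~Base() {}
    virtual int foo(int x) = 0;
};

struct Derived : Base {
    int foo(int x) override { return x + 1; }
};

// Plain function pointer that takes the object as an explicit first argument.
typedef int (*FooFn)(Base*, int);

#if defined(__GNUC__) && !defined(__clang__)
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wpmf-conversions"
// GCC only: resolve the virtual call once, for the dynamic type of *p.
inline FooFn resolve_foo(Base* p) {
    assert(p != nullptr);
    int (Base::*pmf)(int) = &Base::foo;
    return (FooFn)(p->*pmf);
}
#pragma GCC diagnostic pop
#else
// Fallback for other compilers: a trampoline that still dispatches virtually.
inline int foo_trampoline(Base* self, int x) { return self->foo(x); }
inline FooFn resolve_foo(Base* p) {
    assert(p != nullptr);
    return &foo_trampoline;
}
#endif
```

The returned pointer is only valid for objects of the same concrete type as the one it was resolved from, and this sketch assumes single inheritance so that no this-pointer adjustment is needed.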
So, what you basically want to do is convert runtime polymorphism into compile time polymorphism. Now you still need to build your app so that it can handle multiple "cases", but once it's decided which case is applicable to a run, that's it for the duration.
Here's a model of the runtime polymorphism case:
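The original listing is not reproduced here; the following is a minimal sketch of what such a model might look like. Foo, Bar and work are the names used in this answer, while the base class name Base, the method value() and the iterations parameter are assumptions made for illustration.

```cpp
#include <cstddef>

// Hypothetical base class; the name and the method are assumptions.
struct Base {
    virtual ~Base() {}
    virtual int value() const = 0;
};

struct Foo : Base {
    int value() const override { return 1; }
};

struct Bar : Base {
    int value() const override { return 2; }
};

// Polymorphic version: one function serves every concrete type,
// with a virtual call on each loop iteration.
long long work(const Base& b, std::size_t iterations) {
    long long total = 0;
    for (std::size_t i = 0; i < iterations; ++i) {
        total += b.value();
    }
    return total;
}
```

In this sketch main would create one Foo and one Bar and pass each to work with a large iteration count.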
This takes ~14s to execute on my Core2, compiled with gcc 4.3.2 (32 bit Debian) using the -O3 option.
Now suppose we replace the "work" version with a templated version (templated on the concrete type it's going to be working on):
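A sketch of what the templated variant might look like, under the same assumptions as before (the base class name Base, the method value() and the iterations parameter are hypothetical). The qualified call t.T::value() is my own choice to make the static binding explicit; the original may simply have relied on the compiler seeing the concrete type.

```cpp
#include <cstddef>

// Hypothetical base class; the name and the method are assumptions.
struct Base {
    virtual ~Base() {}
    virtual int value() const = 0;
};

struct Foo : Base {
    int value() const override { return 1; }
};

struct Bar : Base {
    int value() const override { return 2; }
};

// Templated version: one instantiation per concrete type T.
// The qualified call binds statically to T::value, skipping the vtable.
template <typename T>
long long work(const T& t, std::size_t iterations) {
    long long total = 0;
    for (std::size_t i = 0; i < iterations; ++i) {
        total += t.T::value();
    }
    return total;
}
```

Because the call is bound to T::value, an object of a class derived from Foo that is passed through a Foo reference would run Foo::value rather than its own override.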
main doesn't actually need to be updated, but note that the 2 calls to work now trigger instantiations of, and calls to, two different type-specific functions (cf. the one polymorphic function previously).
Hey presto, it runs in 0.001s. Not a bad speed-up factor for a 2 line change! However, note that the massive speed-up is entirely due to the compiler: once the possibility of runtime polymorphism in the work function is eliminated, it just optimizes away the loop and compiles the result directly into the code. But that actually makes an important point: in my experience the main gains from using this sort of trick come from the opportunities for improved inlining and optimisation they allow the compiler when a less-polymorphic, more specific function is generated, not from the mere removal of vtable indirection (which really is very cheap).
But I really don't recommend doing stuff like this unless profiling absolutely indicates runtime polymorphism is really hitting your performance. It'll also bite you as soon as someone subclasses Foo or Bar and tries to pass that into a function actually intended for its base.
You might find this related question interesting too.