Can you cache a virtual function lookup in C++?

2019-03-09 00:26发布

Say I have a virtual function call foo() on an abstract base class pointer, mypointer->foo(). When my app starts up, based on the contents of a file, it chooses to instantiate a particular concrete class and assigns mypointer to that instance. For the rest of the app's life, mypointer will always point to objects of that concrete type. I have no way to know what this concrete type is (it may be instantiated by a factory in a dynamically loaded library). I only know that the type will stay the same after the first time an instance of the concrete type is made. The pointer may not always point to the same object, but the object will always be of the same concrete type. Notice that the type is technically determined at 'runtime' because it's based on the contents of a file, but that after 'startup' (file is loaded) the type is fixed.

However, in C++ I pay the virtual function lookup cost every time foo is called for the entire duration of the app. The compiler can't optimize the look up away because there's no way for it to know that the concrete type won't vary at runtime (even if it was the most amazing compiler ever, it can't speculate on the behavior of dynamically loaded libraries). In a JIT compiled language like Java or .NET the JIT can detect that the same type is being used over and over and do inline cacheing. I'm basically looking for a way to manually do that for specific pointers in C++.

Is there any way in C++ to cache this lookup? I realize that solutions might be pretty hackish. I'm willing to accept ABI/compiler specific hacks if it's possible to write configure tests that discover the relevant aspects of the ABI/compiler so that it's "practically portable" even if not truly portable.

Update: To the naysayers: If this wasn't worth optimizing, then I doubt modern JITs would do it. Do you think Sun and MS's engineers were wasting their time implementing inline cacheing, and didn't benchmark it to ensure there was an improvement?

9条回答
Root(大扎)
2楼-- · 2019-03-09 01:03

So assuming that this is a fundamental issue you want to solve (to avoid premature optimization arguments), and ignoring platform and compiler specific hackery, you can do one of two things, at opposite ends of complexity:

  1. Provide a function as part of the .dll that internally simply calls the right member function directly. You pay the cost of an indirect jump, but at least you don't pay the cost of a vtable lookup. Your mileage may vary, but on certain platforms, you can optimize the indirect function call.
  2. Restructure your application such that instead of calling a member function per instance, you call a single function that takes a collection of instances. Mike Acton has a wonderful post (with a particular platform and application type bent) on why and how you should do this.
查看更多
贪生不怕死
3楼-- · 2019-03-09 01:07

I asked a very similar question recently, and got the answer that it's possible as a GCC extension, but not portably:

C++: Pointer to monomorphic version of virtual member function?

In particular, I also tried it with Clang and it doesn't support this extension (even though it supports many other GCC extensions).

查看更多
【Aperson】
4楼-- · 2019-03-09 01:07

So, what you basically want to do is convert runtime polymorphism into compile time polymorphism. Now you still need to build your app so that it can handle multiple "cases", but once it's decided which case is applicable to a run, that's it for the duration.

Here's a model of the runtime polymorphism case:

struct Base {
  virtual void doit(int&)=0;
};

struct Foo : public Base {
  virtual void doit(int& n) {--n;}
};

struct Bar : public Base {
  virtual void doit(int& n) {++n;}
};

void work(Base* it,int& n) {
  for (unsigned int i=0;i<4000000000u;i++) it->doit(n);
}

int main(int argc,char**) {
  int n=0;

  if (argc>1)
    work(new Foo,n);
  else
    work(new Bar,n);

  return n;
}

This takes ~14s to execute on my Core2, compiled with gcc 4.3.2 (32 bit Debian), -O3 option.

Now suppose we replace the "work" version with a templated version (templated on the concrete type it's going to be working on):

template <typename T> void work(T* it,int& n) {
  for (unsigned int i=0;i<4000000000u;i++) it->T::doit(n);
}

main doesn't actually need to be updated, but note that the 2 calls to work now trigger instantiations of and calls to two different and type-specific functions (c.f the one polymorphic function previously).

Hey presto runs in 0.001s. Not a bad speed up factor for a 2 line change! However, note that the massive speed up is entirely due to the compiler, once the possibility of runtime polymorphism in the work function is eliminated, just optimizing away the loop and compiling the result directly into the code. But that actually makes an important point: in my experience the main gains from using this sort of trick come from the opportunities for improved inlining and optimisation they allow the compiler when a less-polymorphic, more specific function is generated, not from the mere removal of vtable indirection (which really is very cheap).

But I really don't recommend doing stuff like this unless profiling absolutely indicates runtime polymorphism is really hitting your performance. It'll also bite you as soon as someone subclasses Foo or Bar and tries to pass that into a function actually intended for its base.

You might find this related question interesting too.

查看更多
登录 后发表回答