asm.js - How should function pointers be implement

2019-02-18 10:21发布

问题:

Note: This question is purely about asm.js not about C++ nor any other programming language.

As the title already says:

How should a function pointer be implemented in a efficient way?

I couldn't find anything on the web, so I figured asking it here.

Edit: I would like to implement virtual functions in the compiler I'm working on.

In C++ I would do something like this to generate a vtable:

#include <iostream>

class Base {
  public:
    virtual void doSomething() = 0;
};

class Derived : public Base {
  public:
    void doSomething() {
        std::cout << "I'm doing something..." << std::endl;
    }
};

int main()
{
    Base* instance = new Derived();
    instance->doSomething();
    return 0;
}

To be more precise; how can I generate a vtable in asm.js without the need of plain JavaScript? In any case, I would like the "near native" capabilities of asm.js while using function pointers.

The solution may be suitable for computer generated code only.

回答1:

Looking over how asm.js works, I believe your best bet would be to use the method the original CFront compiler used: compile the virtual methods down to functions that take a this pointer, and use thunks to correct the this pointer before passing it. I'll go through it step by step:

No Inheritance

Reduce methods to special functions:

void ExampleObject::foo( void );

would be transformed into

void exampleobject_foo( ExampleObject* this );

This works fine for non-inheritance based objects.

Single Inheritance

We can easily add support for arbitrary large amount of single inheritance through a simple trick: always store the object in memory base first:

class A : public B

would become, in memory:

[[ B ] A ]

Getting closer!

Multiple Inheritance Now, multiple inheritance makes this much harder to work with

class A : public B, public C

It's impossible for both B and C to be at the start of A; they simply cannot co-exist. There are two choices:

  1. Store an explicit offset (known as delta) for each call to base.
  2. Do not allow calls through A to B or C

The second choice is much preferable for a variety of reasons; if you are calling a base class member function, it's rare you would want to do it through a derived class. Instead, you could simply go C::conlyfunc, which could then do the adjustment to your pointer for you at no cost. Allowing A::conlyfunc removes important information that the compiler could have used, at very little benefit.

The first choice is used in C++; all multiple inheritance objects call a thunk before each call to a base class, which adjusts the this pointer to point to the subobject inside it. In a simple example:

class ExampleBaseClass
{
    void foo( void );
}

class ExampleDerivedClass : public ExampleBaseClass, private IrrelevantBaseClass
{
    void bar( void );
}

would then become

void examplebaseclass_foo( ExampleBaseClass* this );
void examplederivedclass_bar( ExampleDerivedClass* this);

void examplederivedclass_thunk_foo( ExampleDerivedClass* this)
{
    examplebaseclass_foo( this + delta );
}

This could be inlined in many situations, so it's not too big of overhead. However, if you could never refer to ExampleBaseClass::foo as ExampleDerivedClass::foo, these thunks wouldn't be needed, as the delta would be easily discernible from the call itself.

Virtual functions

Virtual functions adds a whole new layer of complexity. In the multiple inheritance example, the thunks had fixed addresses to call; we were just adjusting the this before passing it to an already known function. With virtual functions, the function we're calling is unknown; we could be overridden by a derived object we have no possibility of knowing about at compile time, due to it being in another translation unit or a library, etc.

This means we need some form of dynamic dispatch for each object that has a virtually overridable function; many methods are possible, but C++ implementations tend to use a simple array of function pointers, or a vtable. To each object that has virtual functions, we add a point to an array as a hidden member, usually at the front:

class A
{
hidden:
    void* vtable;
public:
    virtual void foo( void );
}

We add thunk functions which redirect to the vtable

void a_foo( A* this )
{
    int vindex = 0;
    this->vtable[vindex](this);
}

The vtable is then populated with pointers to the functions we actually want to call:

vtable[0] = &A::foo_default; // our baseclass implimentation of foo

In a derived class, if we wish to override this virtual function, all we need to do is change the vtable in our own object, to point to the new function, and it will override in the base class as well:

class B: public A
{
    virtual void foo( void );
}

will then do this in the constructor:

((A*)this)->vtable[0] = &B::foo;

Finally, we have support for all forms of inheritance! Almost.

Virtual Inheritance There is one final caveat with this implementation: if you continue to allow Derived::foo to be used when what you really mean is Base::foo, you get the diamond problem:

class A : public B, public C;
class B : public D;
class C : public D;

A::DFunc(); // Which D?

This problem can also occur when you use base classes as stateful classes, or when you put function that should be has-a rather than is-a; generally, it's a sign of a need for a restructure. But not always.

In C++, this has a solution that is not very elegant, but works:

class A : public B, public C;
class B : virtual D;
class C : virtual D;

This requires those who implement such classes and hierarchies to think ahead and intentionally make their classes a little slower, to support a possible future usage. But it does solve the problem.

How can we implement this solution?

[ [ D ] [ B ] [ Dptr ] [ C ] [ Dptr ] A ]

Rather than use the base class directly, as in normal inheritance, with virtual inheritance we push all usages of D through a pointer, adding an indirection, while stomping the multiple instantiations into a single one. Notice that both B and C have their own pointer, that both point to the same D; this is because B and C don't know if they are free floating copies or bound in derived objects. The same calls need to be used for both, or virtual functions won't work as expected.

Summary

  1. Transform method calls into function calls with a special this parameter in base classes
  2. Structure objects in memory so single inheritance is no different from no inheritance
  3. Add thunks to adjust this pointers then call base classes for multiple inheritance
  4. Add vtables to classes with virtual methods, and make all calls to methods go through vtable to method (thunk -> vtable -> method)
  5. Deal with virtual inheritance through a pointer-to-baseobject rather than derive object calls

All of this is straightforward in js.asm.



回答2:

Tim,

I'm by no means an asm.js expert but your question intrigues me. It goes to the hart of object oriented language design. It also seems ironic that you are recreating machine-level problems in the javascript domain.

The solution to your question it seems to me is that you will need to setup an accounting of defined types and functions. In Java this is typically done by decorating bytecode with identifiers that represent the correct class->function mapping of any given object. If you use a Int32 identifier for each class that is defined and an additionaly Int32 identifier for each function defined you can then store these in the object representations on the heap. Your vtable is then no more than the mapping of these combination to specific functions.

I hope this helps you.



回答3:

I am not very familiar with the exact syntax of asm.js, but here is how I implemented a vtable in a compiler of mine (for x86):

Each object are derived from a struct like this:

struct Object {
    VTable *vtable;
};

Then the other types I use will look something like this in c++-syntax:

struct MyInt : Vtable {
    int value;
};

which is (in this case) equivalent to:

struct MyInt {
    VTable *vtable;
    int value;
};

So the final layout of the objects are that at offset 0, I know I have a pointer to a vtable. The vtable I use is simply an array of function pointers, again in C-syntax the type VTable could be defined as follows:

typedef Function *VTable;

Where in C i would use void * instead of Function *, since the actual types of the function will vary. What is left for the compiler to do is:

1: For each type containing virtual functions, create a global vtable and populate it with function pointers to the overridden functions.

2: When an object is created, set the vtable member of the object (at offset 0) to point to the global vtable.

Then when you want to call virtual functions you can do something like this:

(*myObject->vtable[1])(1);

to call the function your compiler has assigned the ID 1 in the vtable (methodB in the example below).

A final example: Let's say we have the following two classes:

class A {
public:
    virtual int methodA(int) { ... }
    virtual int methodB(int) { ... }
    virtual int methodC(int) { ... }
};

class B : public A {
public:
    virtual int methodA(int) { ... }
    virtual int methodB(int) { ... }
};

The VTable for class A and B can look like this:

A:                   B:
0: &A::methodA       0: &B::methodA
1: &A::methodB       1: &B::methodB
2: &A::methodC       2: &A::methodC

By using this logic, we know that when we are calling methodB on any type derived from A, we shall call whatever function is located at index 1 in the vtable of that object.

Of course, this solution do not work right away if you want to allow for multiple inheritance, but I am fairly sure it can be extended to do so. After some debugging with Visual Studio 2008, it seems like this is more or less how the vtables are implemented there (of course there it is extended to handle multiple inheritance, I have not tried to figure that out yet).

I hope you get some ideas that can be applied in asm.js at least. Like I said, I do not know exactly how asm.js works, but I have managed to implement this system in x86 assembly and I do not see any issues with implementing it in JavaScript either, so I hope it can be used in asm.js as well.