Why does asm.js deteriorate performance?

2019-03-08 17:45发布

问题:

Just to see how it performs, I wrote a very short asm.js module by hand, which simulates the 2D wave equation using 32-bit integer math and typed arrays (Int32Array). I have three versions of it, all as similar as possible:

  1. Ordinary (i.e. legible, albeit C-style) JavaScript
  2. Same as 1, with asm.js annotations added so that it passes the validator, according to Firefox and other tools
  3. Same as 2, except with no "use asm"; directive at the top

I left a demo at http://jsfiddle.net/jtiscione/xj0x0qk3/ which lets you switch between modules to see the effects of using each one. All three work, but at different speeds. This is the hotspot (with asm.js annotations):

for (i = 0; ~~i < ~~h; i = (1 + i)|0) {
    for (j = 0; ~~j < ~~w; j = (1 + j)|0) {
        if (~~i == 0) {
            index = (1 + index) | 0;
            continue;
        }
        if (~~(i + 1) == ~~h) {
            index = (1 + index) | 0;
            continue;
        }
        if (~~j == 0) {
            index = (1 + index) | 0;
            continue;
        }
        if (~~(j + 1) == ~~w) {
            index = (1 + index) | 0;
            continue;
        }
        uCen = signedHeap  [((u0_offset + index) << 2) >> 2] | 0;
        uNorth = signedHeap[((u0_offset + index - w) << 2) >> 2] | 0;
        uSouth = signedHeap[((u0_offset + index + w) << 2) >> 2] | 0;
        uWest = signedHeap [((u0_offset + index - 1) << 2) >> 2] | 0;
        uEast = signedHeap [((u0_offset + index + 1) << 2) >> 2] | 0;
        uxx = (((uWest + uEast) >> 1) - uCen) | 0;
        uyy = (((uNorth + uSouth) >> 1) - uCen) | 0;
        vel = signedHeap[((vel_offset + index) << 2) >> 2] | 0;
        vel = vel + (uxx >> 1) | 0;
        vel = applyCap(vel) | 0;
        vel = vel + (uyy >> 1) | 0;
        vel = applyCap(vel) | 0;
        force = signedHeap[((force_offset + index) << 2) >> 2] | 0;
        signedHeap[((u1_offset + index) << 2) >> 2] = applyCap(((applyCap((uCen + vel) | 0) | 0) + force) | 0) | 0;
        force = force - (force >> forceDampingBitShift) | 0;
        signedHeap[((force_offset + index) << 2) >> 2] = force;
        vel = vel - (vel >> velocityDampingBitShift) | 0;
        signedHeap[((vel_offset + index) << 2) >> 2] = vel;
        index = (index + 1)|0;
    }
}

The "ordinary JavaScript" version is structured as above, but without the bitwise operators that asm.js requires (e.g. "x|0", "~~x", "arr[(x<<2)>>2]", etc.)

These are the results for all three modules on my machine, using Firefox (Developer Edition v. 41) and Chrome (version 44), in milliseconds per iteration:

  • FIREFOX (version 41): 20 ms, 35 ms, 60 ms.
  • CHROME (version 44): 25 ms, 150 ms, 75 ms.

So ordinary JavaScript wins in both browsers. The presence of asm.js-required annotations deteriorates performance by a factor of 3 in both. Furthermore, the presence of the "use asm"; directive has an obvious effect- it helps Firefox a bit, and brings Chrome to its knees!

It seems strange that merely adding bitwise operators should introduce a threefold performance degradation that can't be overcome by telling the browser to use asm.js. Also, why does telling the browser to use asm.js only help marginally in Firefox, and completely backfire in Chrome?

回答1:

Actually asm.js has not been created to write code by hand but only as result of a compilation from other languages. As far as I know there are no tools that validate the asm.js code. Have you tried to write the code in C lang and use Emscripten to generate the asm.js code? I strongly suspect that the result would be quite different and optimized for asm.js.

I think that mixing typed and untyped vars you only add complexity without any benefits. On the contrary the "asm.js" code is more complex: I tried to parse the asm.js and the plain functions on jointjs.com/demos/javascript-ast and the results are:

  • the plain js function has 137 nodes and 746 tokens
  • the asm.js function has 235 nodes and 1252 tokens

I would say that if you have more instructions to execute in each loop it easily will be slower.



回答2:

There is some fix cost to switching asm.js contexts. Ideally you do it once and run all of your code within your app as asm.js. Then you can control memory management using typed arrays and avoid lots of garbage collections. I'd suggest to rewrite the profiler and measure asm.js within asm.js - without context switching.