JavaScript Performance: Long-Running Tasks

Published 2019-01-09 06:14

Question:

I noticed a question on here the other day ( Reducing Javascript CPU Usage ) and I was intrigued.

Essentially the guy wanted to encrypt some files character by character. Obviously doing all this in one go is going to lock up the browser.

His first idea was to process it in chunks, roughly 1 KB of string at a time, then pause for X ms so that the user could keep interacting with the page between chunks. He also considered using Web Workers (the best idea), but they aren't supported in every browser.

Now I don't really want to go into why this probably isn't a good idea in JavaScript, but I wanted to see if I could come up with a solution.

I remembered watching a video by Douglas Crockford at JSConf. The video was about Node.js and the event loop, but I remembered him talking about breaking long-running functions down into individual chunks, so that each newly scheduled call goes to the end of the event queue instead of one long-running task clogging up the event loop and preventing anything else from happening.

I knew this was a solution worthy of investigation. As a front-end developer I have never really dealt with extremely long-running tasks in JS, and I was keen to find out how to break them up and how they perform.

I decided to try out a recursive function that calls itself from inside a setTimeout of 0ms. I figured this would provide breaks in the event loop for anything else that wanted to happen while it was running, while still giving maximum computation when nothing else is going on.

Here is what I came up with.

(I'm going to apologise for the code. I was experimenting in the console so this was quick and dirty.)

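function test(i, ar, callback, start){
    // First call: create the array and record the start time.
    if ( ar === undefined ){
        ar = [];
        start = new Date();
    }
    if ( ar.length < i ){
        // i - ( i - ar.length ) is just ar.length, i.e. the count so far.
        ar.push( i - ( i - ar.length ) );
        // Schedule the next step so the event loop can run other work first.
        setTimeout(function(){
            test( i, ar, callback, start );
        }, 0);
    }
    else {
        callback(ar, start);
    }
}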

( You can paste this code into the console and it will work )

Essentially, the function takes a number, creates an array, and keeps calling itself while array.length < number, pushing the count so far into the array each time. It passes the array created in the first call to all subsequent calls.

I tested it and it seems to work exactly as intended, only its performance is fairly poor. I tested it with:

( again this is not sexy code )

test(5000, undefined, function(ar, start ){ 
    var finish = new Date; 
    console.log(
        ar.length,
        'timeTaken: ', finish - start 
    ); 
});

Naturally I wanted to know how long it took to complete: the above code took around 20s. It seems to me that it should not take 20s for JS to count to 5000, even allowing for the calculation and processing needed to push items into the array. 20s is a bit steep.

So I decided to spawn several at the same time to see how that affected browser performance and calculation speed.

( the code isn't getting any sexier )

function foo(){ 
test(5000, undefined, function(ar, start ){ var finish = new Date; console.log(ar.length, 'timeTaken: ', finish - start, 'issue: 1'  ) });
test(5000, undefined, function(ar, start ){ var finish = new Date; console.log(ar.length, 'timeTaken: ', finish - start, 'issue: 2'  ) });
test(5000, undefined, function(ar, start ){ var finish = new Date; console.log(ar.length, 'timeTaken: ', finish - start, 'issue: 3'  ) });
test(5000, undefined, function(ar, start ){ var finish = new Date; console.log(ar.length, 'timeTaken: ', finish - start, 'issue: 4'  ) });
test(5000, undefined, function(ar, start ){ var finish = new Date; console.log(ar.length, 'timeTaken: ', finish - start, 'issue: 5'  ) });
};

So that's five in total, running at the same time and not causing any hanging of the browser.

After the processes ended, all the results returned at virtually the same time. It took around 21.5s for all of them to complete. That's just 1.5s slower than one on its own. But I was moving my mouse around the window over elements that had :hover effects, just to make sure that the browser was still responding, so that might account for some of the 1.5s overhead.

So, since these functions are apparently running in parallel, there must be more computational juice left in the browser.

Is anyone able to explain what's going on here performance-wise, and give details on how to improve functions like this?

Just to go crazy I did this..

function foo(){
    var count = 100000000000000000000000000000000000000;  
    test(count, undefined, function(ar, start ){ var finish = new Date; console.log(ar.length, 'timeTaken: ', finish - start, 'issue: 1'  ) });
    test(count, undefined, function(ar, start ){ var finish = new Date; console.log(ar.length, 'timeTaken: ', finish - start, 'issue: 2'  ) });
    test(count, undefined, function(ar, start ){ var finish = new Date; console.log(ar.length, 'timeTaken: ', finish - start, 'issue: 3'  ) });
    test(count, undefined, function(ar, start ){ var finish = new Date; console.log(ar.length, 'timeTaken: ', finish - start, 'issue: 4'  ) });
    test(count, undefined, function(ar, start ){ var finish = new Date; console.log(ar.length, 'timeTaken: ', finish - start, 'issue: 5'  ) });
};

It's been running the whole time I have been writing this post, and is still going for it. The browser is not complaining or hanging. I will add the completion time once it ends.

Answer 1:

setTimeout does not have a minimum delay of 0ms. The minimum delay is anywhere in the range of 5ms to 20ms, depending on the browser.

My own testing shows that setTimeout doesn't put your function back on the event queue immediately.

Live Example

There is an arbitrary minimum delay before it gets called again.

var s = new Date(),
    count = 10000,
    cb = after(count, function() {
        console.log(new Date() - s);    
    });

doo(count, function() {
    test(10, undefined, cb);
});
  • Running 10000 of these in parallel counting to 10 takes 500ms.
  • Running 100 counting to 10 takes 60ms.
  • Running 1 counting to 10 takes 40ms.
  • Running 1 counting to 100 takes 400ms.
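The snippets in this answer call two helpers, after and doo, that are not defined in the post. The sketch below is one plausible reading, assuming after(n, fn) returns a function that runs fn once it has been called n times and doo(n, fn) simply calls fn n times in a row. Both definitions are assumptions made for illustration, not the answerer's actual code.

```javascript
// Hypothetical helpers; the names match the calls above, but the
// behavior is assumed, not taken from the original post.

// Returns a function that invokes fn once, on its nth call.
function after(n, fn) {
    var calls = 0;
    return function () {
        calls++;
        if (calls === n) {
            return fn.apply(this, arguments);
        }
    };
}

// Calls fn n times in a row, synchronously.
function doo(n, fn) {
    for (var k = 0; k < n; k++) {
        fn();
    }
}
```

With these in place, doo(count, ...) starts count copies of test at once, and cb logs the elapsed time only after the last of them has reported back.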

Clearly, each individual setTimeout has to wait at least 4ms before it is called again. That is the bottleneck: the individual delay on setTimeout.

If you schedule 100 or more of these in parallel then it will just work.
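To check the per-call delay on a given browser, something like the following could be used. It is only a rough sketch: measureTimerGap is a made-up name, and the figure it reports will vary between browsers and runtimes.

```javascript
// Sketch: measure the average gap between chained setTimeout(fn, 0) calls.
// measureTimerGap is a hypothetical name, not part of the original post.
function measureTimerGap(rounds, report) {
    var start = Date.now();
    var remaining = rounds;

    function tick() {
        remaining--;
        if (remaining > 0) {
            setTimeout(tick, 0); // re-queue; the timer adds its own minimum delay
        } else {
            // rounds timers have fired in sequence by this point
            report((Date.now() - start) / rounds);
        }
    }
    setTimeout(tick, 0);
}

measureTimerGap(100, function (avgMs) {
    console.log('average delay per setTimeout(0): ' + avgMs.toFixed(2) + 'ms');
});
```

If the per-call figure comes out around 4ms, then 5000 chained calls would take about 20s, which matches the timing reported in the question.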

How do we optimise this?

var s = new Date(),
    count = 100,
    cb = after(count, function() {
        console.log(new Date() - s);    
    }),
    array = [];

doo(count, function() {
    test(10, array, cb);
});

Set up 100 running in parallel on the same array. This will avoid the main bottleneck which is the setTimeout delay.

The above completes in 2ms.

var s = new Date(),
    count = 1000,
    cb = after(count, function() {
        console.log(new Date() - s);    
    }),
    array = [];

doo(count, function() {
    test(1000, array, cb);
});

Completes in 7 milliseconds

var s = new Date(),
    count = 1000,
    cb = after(1, function() {
        console.log(new Date() - s);    
    }),
    array = [];

doo(count, function() {
    test(1000000, array, cb);
});

Running 1000 jobs in parallel is roughly optimal, but you will start hitting bottlenecks: counting to 1 million still takes 4500ms.



Answer 2:

Your issue is a matter of overhead versus unit of work (UoW). Your setTimeout overhead is very high, while your unit of work, ar.push, is very small. The solution is an old optimization technique known as block processing. Rather than processing one UoW per call, you need to process a block of UoWs. How large the "block" is depends on how much time each UoW takes and the maximum amount of time you can spend in each setTimeout/call/iteration (before the UI becomes unresponsive).

function test(i, ar, callback, start){
    if ( ar === undefined ){
        ar = [];
        start = new Date();
    }
    if ( ar.length < i ){
        // **** process a block **** //
        for ( var x = 0; x < 50 && ar.length < i; x++ ){
            ar.push( i - ( i - ar.length ) );
        }
        setTimeout(function(){
            test( i, ar, callback, start );
        }, 0);
    }
    else {
        callback(ar, start);
    }
}

You have to process the largest block you can without causing UI/performance issues for the user. The preceding runs ~50x faster (the size of the block).

It's the same reason we use a buffer for reading a file rather than reading it one byte at a time.
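Since the right block size depends on how long each unit of work takes, one variation is to size each block by elapsed time instead of a fixed count. The following is a minimal sketch of that idea. processInBlocks, its parameters, and the 10ms default budget are all assumptions chosen for illustration.

```javascript
// Hypothetical helper: the name, signature and 10ms default are assumptions.
// Runs work(item, index) over items in time-limited blocks, yielding to the
// event loop between blocks, then calls done().
function processInBlocks(items, work, done, budgetMs) {
    var index = 0;
    budgetMs = budgetMs || 10;

    function runBlock() {
        var blockStart = Date.now();
        // Do as many units of work as fit inside the time budget.
        while (index < items.length && Date.now() - blockStart < budgetMs) {
            work(items[index], index);
            index++;
        }
        if (index < items.length) {
            setTimeout(runBlock, 0); // yield, then continue with the next block
        } else {
            done(); // note: runs synchronously if everything fit in one block
        }
    }
    runBlock();
}

// Usage: fill an array with 0..4999, as in the question.
var out = [];
var input = new Array(5000);
processInBlocks(input, function (item, i) {
    out.push(i);
}, function () {
    console.log('processed', out.length, 'items');
});
```

A time budget adapts automatically when the unit of work is cheap or expensive, at the cost of one Date.now() call per unit.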



Answer 3:

Just a hypothesis: could it be that the code is so slow because you are building a recursion stack with 5000 recursion instances? Your call is not truly recursive, since it happens through the setTimeout function, but the function you pass to it is a closure, so it will have to store all of the closure contexts.

The performance problem could be related to the cost of managing that memory, and this could also explain why your last test seems to make things worse.

I have not tried anything out with the interpreter, but it would be interesting to see whether the computation time is linear in the number of recursions or not: say 100, 500, 1000, 5000 recursions.

The first thing I would try as a workaround is not using a closure:

setTimeout(test, 0, i, ar, callback, start);


Answer 4:

BE actually talked about this. What you're using is a recursive function, and JavaScript right now doesn't have "Tail End Recursive Calls", which means that the interpreter/engine has to keep the stack frame for EVERY call, which gets heavy.

To optimize the solution, I would try making it into an immediately executing function that's called in the global scope.