NodeJS - file upload with progress bar using Core

Posted 2019-02-05 04:16

Question:

Ryan Dahl has said he invented NodeJS to solve the file upload progress bar problem (https://youtu.be/SAc0vQCC6UQ). Using only the technology available in 2009, when Node was introduced (that is, before Express and the more advanced client-side JavaScript libraries that automagically give you progress updates), how exactly did NodeJS solve this problem?

Trying to use just Core NodeJS now, I understand that with the request stream I can look at the headers, get the total file size, and then add up the size of each chunk of data as it comes through to work out the percent complete. But I don't understand how to stream those progress updates back to the browser, since the browser doesn't seem to update until response.end() is called.

Once again, I want to wrap my head around how NodeJS originally solved this progress update problem. WebSockets weren't around yet, so you couldn't just open a WebSocket connection to the client and stream the progress updates back to the browser. Was there another client-side JavaScript technology that was used?

Here is my attempt so far. Progress updates are streamed to the server-side console, but the browser only updates once the response stream receives response.end().

var http = require('http');
var fs = require('fs');

var server = http.createServer(function(request, response){
    response.writeHead(200);
    if(request.method === 'GET'){
        fs.createReadStream('filechooser.html').pipe(response);     
    }
    else if(request.method === 'POST'){
        var outputFile = fs.createWriteStream('output');
        var total = request.headers['content-length'];
        var progress = 0;

        request.on('data', function(chunk){
            progress += chunk.length;
            var perc = parseInt((progress/total)*100);
            console.log('percent complete: '+perc+'%\n');
            response.write('percent complete: '+perc+'%\n');
        });

        request.pipe(outputFile);

        request.on('end', function(){
            response.end('\nArchived File\n\n');
        });
    }

});

server.listen(8080, function(){
    console.log('Server is listening on 8080');
});

filechooser.html:

<!DOCTYPE html>
<html>
<body>
<form id="uploadForm" enctype="multipart/form-data" action="/" method="post">
    <input type="file" id="upload" name="upload" />
    <input type="submit" value="Submit">
</form>
</body>
</html>

Here is an updated attempt. The browser now displays progress updates, but I'm pretty sure this isn't the actual solution Ryan Dahl originally came up with for a production scenario. Did he use long polling? What would that solution look like?

var http = require('http');
var fs = require('fs');

var server = http.createServer(function(request, response){
    response.setHeader('Content-Type', 'text/html; charset=UTF-8');
    response.writeHead(200);

    if(request.method === 'GET'){
        fs.createReadStream('filechooser.html').pipe(response);     
    }
    else if(request.method === 'POST'){
        var outputFile = fs.createWriteStream('UPLOADED_FILE');
        var total = request.headers['content-length'];
        var progress = 0;

        response.write('STARTING UPLOAD');
        console.log('\nSTARTING UPLOAD\n');

        request.on('data', function(chunk){
            fakeNetworkLatency(function() {
                outputFile.write(chunk);
                progress += chunk.length;
                var perc = parseInt((progress/total)*100);
                console.log('percent complete: '+perc+'%\n');
                response.write('<p>percent complete: '+perc+'%');
            });
        });

        request.on('end', function(){
            fakeNetworkLatency(function() {
                outputFile.end();
                response.end('<p>FILE UPLOADED!');
                console.log('FILE UPLOADED\n');
            });
        });
    }

});

server.listen(8080, function(){
    console.log('Server is listening on 8080');
});

var delay = 100; //delay of 100 ms per chunk
var count =0;
var fakeNetworkLatency = function(callback){
    setTimeout(function() {
        callback();
    }, delay*count++);
};

Answer 1:

Firstly, your code is indeed working; Node sends chunked responses, but the browser is simply waiting for more data before bothering to show it.

More info in Node Documentation:

The first time response.write() is called, it will send the buffered header information and the first body to the client. The second time response.write() is called, Node assumes you're going to be streaming data, and sends that separately. That is, the response is buffered up to the first chunk of body.

If you set the content type to HTML with response.setHeader('Content-Type', 'text/html; charset=UTF-8');, Chrome renders the content as it arrives, but that only did the trick when I used a series of setTimeout calls with response.write calls inside; it still didn't update the DOM when I tried with your code, so I dug deeper...
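The setTimeout experiment described above can be sketched roughly as follows. This is a minimal illustration rather than the exact code used at the time; writeInSteps and its parameters are hypothetical names, and whether a browser paints each fragment as it arrives still depends on the browser.

```javascript
// Hypothetical helper: writes one HTML fragment per timer tick, then ends
// the response. `response` is expected to be an http.ServerResponse.
function writeInSteps(response, steps, intervalMs) {
    response.setHeader('Content-Type', 'text/html; charset=UTF-8');
    response.writeHead(200);

    for (var i = 1; i <= steps; i++) {
        // The IIFE captures the current step for the delayed callback.
        (function (step) {
            setTimeout(function () {
                var perc = Math.round((step / steps) * 100);
                if (step < steps) {
                    // Each write() goes out as its own chunk.
                    response.write('<p>percent complete: ' + perc + '%');
                } else {
                    // The last step closes the response.
                    response.end('<p>percent complete: 100%');
                }
            }, step * intervalMs);
        })(i);
    }
}
```

To try it, call writeInSteps(response, 10, 500) from inside an http.createServer handler and load the page in a browser.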

The trouble is that it's really up to the browser to render content when it sees fit, so I set up code to send ajax requests to check status instead:

Firstly, I updated the server to simply store its status in a global variable and open a "checkstatus" endpoint to read it:

var http = require('http');
var fs = require('fs');
var status = 0;

var server = http.createServer(function (request, response) {
    response.writeHead(200);
    if (request.method === 'GET') {
        if (request.url === '/checkstatus') {
            response.end(status.toString());
            return;
        }
        fs.createReadStream('filechooser.html').pipe(response);
    }
    else if (request.method === 'POST') {
        status = 0;
        var outputFile = fs.createWriteStream('output');
        var total = request.headers['content-length'];
        var progress = 0;

        request.on('data', function (chunk) {
            progress += chunk.length;
            var perc = parseInt((progress / total) * 100);
            console.log('percent complete: ' + perc + '%\n');
            status = perc;
        });

        request.pipe(outputFile);

        request.on('end', function () {
            response.end('\nArchived File\n\n');
        });
    }

});

server.listen(8080, function () {
    console.log('Server is listening on 8080');
});

Then, I updated the filechooser.html to check the status with ajax requests:

<!DOCTYPE html>
<html>
<body>
<form id="uploadForm" enctype="multipart/form-data" action="/" method="post">
    <input type="file" id="upload" name="upload"/>
    <input type="submit" value="Submit">
</form>

Percent Complete: <span id="status">0</span>%

</body>
<script src="https://ajax.googleapis.com/ajax/libs/jquery/1.11.3/jquery.min.js"></script>
<script>
    var $status = $('#status');
    /**
     * When the form is submitted, begin checking status periodically.
     * Note that this is NOT long-polling; that is when the server waits to respond until something has changed.
     * In a prod env, I recommend using a websockets library with a long-polling fallback for older browsers (socket.io is a gentleman's choice).
     */
    $('form').on('submit', function() {
        var statusTimer = setInterval(function () {
            $.get('/checkstatus').then(function (status) {
                $status.text(status);

                //when it's done, stop annoying the server
                if (parseInt(status, 10) === 100) {
                    clearInterval(statusTimer);
                }
            });
        }, 500);
    });
</script>
</html>

Note that despite me not ending the response, the server is still able to handle incoming status requests.

So to answer your question, Dahl was fascinated by a Flickr app he saw that uploaded a file and long-polled to check its status. The reason he was fascinated was that the server was able to handle those ajax requests while it continued to work on the upload. It was multi-tasking. See him talk about it exactly 14 minutes into this video; he even says, "So here's how it works...". A few minutes later, he mentions an iframe technique and also differentiates long-polling from simple ajax requests. He states that he wanted to write a server that was optimized for these types of behavior.
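For contrast with the interval polling in filechooser.html, a long-polling status endpoint in core Node might look roughly like this. It is only a sketch of the general technique, not what Dahl or Flickr actually ran; createProgressNotifier, handleCheckStatus and the ?last= query parameter are hypothetical names invented for this example.

```javascript
// Hypothetical helper: parks status requests until the progress changes.
function createProgressNotifier() {
    var current = 0;   // last known percentage
    var waiters = [];  // callbacks of clients still waiting for a change

    return {
        // Called by the upload handler whenever the percentage changes.
        update: function (perc) {
            if (perc === current) { return; }
            current = perc;
            var pending = waiters;
            waiters = [];
            pending.forEach(function (cb) { cb(current); });
        },
        // Called by the status handler; lastSeen is the value the client
        // already has. Answers immediately only if there is news.
        wait: function (lastSeen, cb) {
            if (lastSeen !== current) { cb(current); return; }
            waiters.push(cb);
        }
    };
}

// Hypothetical handler for GET /checkstatus?last=NN
function handleCheckStatus(notifier, request, response) {
    var match = /[?&]last=(\d+)/.exec(request.url);
    var lastSeen = match ? parseInt(match[1], 10) : -1;

    // The response is simply left open until the notifier fires.
    notifier.wait(lastSeen, function (perc) {
        response.writeHead(200, { 'Content-Type': 'text/plain' });
        response.end(String(perc));
    });
}
```

In the server above, the POST handler would call notifier.update(perc) inside its 'data' listener, and the client would issue a new /checkstatus?last=<value> request as soon as the previous one returns. A production version would also need a timeout and cleanup for clients that disconnect while waiting.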

Anyway, this was uncommon in those days. Most web server software of the time would only handle one request at a time per process or thread. And if the handler went to a database, called out to a web service, interacted with the filesystem, or anything like that, that process or thread would just sit and wait for it to finish instead of handling other requests while it waited.

If you wanted to handle multiple requests concurrently, you'd have to fire up another thread or add more servers with a load balancer.

Node.js, on the other hand, makes very efficient use of the main process by doing non-blocking IO. Node wasn't the first to do this, but what sets it apart in the non-blocking IO realm is that its IO methods are asynchronous by default and you have to call a "sync" method to do the wrong thing. It kind of forces users to do the right thing.
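A small sketch of that difference, using fs.readFile and its "sync" counterpart fs.readFileSync. The scratch file name is a hypothetical one chosen only so the example has something to read.

```javascript
var fs = require('fs');
var os = require('os');
var path = require('path');

// Hypothetical scratch file so the example has something to read.
var samplePath = path.join(os.tmpdir(), 'nonblocking-demo.txt');
fs.writeFileSync(samplePath, 'hello');

var order = [];

// Default (asynchronous) form: returns immediately; the callback runs
// later, once the read has completed and the current code has finished.
fs.readFile(samplePath, 'utf8', function (err, text) {
    if (err) { throw err; }
    order.push('async read done');
    console.log(order.join(' -> '));
});
order.push('handled other work');

// "Sync" form: nothing else in the process runs until this returns.
var text = fs.readFileSync(samplePath, 'utf8');
order.push('sync read done');
```

The asynchronous callback is always the last entry in order, because it cannot run until the synchronous code around it has finished. In a server, that waiting time is when other requests get handled.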

Also, it should be noted that the reason JavaScript was chosen is that it is already a language that runs in an event loop; it was made to handle asynchronous code. You can have anonymous functions and closures, which makes async actions much easier to maintain.

I also want to mention that using a promise library makes writing async code much cleaner. For instance, check out bluebird. It has a nice "promisify" method for a single function and a "promisifyAll" method that will convert functions on an object that have the callback signature (function(error, result){}) to instead return a promise.
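To keep the example dependency-free, the sketch below uses util.promisify from Node core (available since Node 8) instead of bluebird; the idea is the same. getProgress and its return shape are hypothetical, made up purely for illustration.

```javascript
var util = require('util');

// Hypothetical callback-style function with the usual (error, result)
// signature; it stands in for any async lookup.
function getProgress(uploadId, callback) {
    setTimeout(function () {
        callback(null, { id: uploadId, percent: 42 });
    }, 10);
}

// util.promisify wraps an error-first callback function and returns a
// version that returns a promise instead.
var getProgressAsync = util.promisify(getProgress);

getProgressAsync('abc')
    .then(function (progress) {
        console.log(progress.id + ' is ' + progress.percent + '% complete');
    })
    .catch(function (err) {
        console.error('lookup failed:', err);
    });
```

Errors passed as the first callback argument become promise rejections, so a single .catch at the end of the chain replaces an error check in every callback.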