I'm having trouble understanding how Node operates regarding its parallel processing and returning values from function calls.
FYI: The gulp function below is merely created as an example for this question.
Is it possible that the function could return the stream before the "Read a large file" statement has finished processing (the large file has been fully read from the file system and the stream has been added), or is Node smart enough to complete all statements before returning?
function moveFiles(){
    var gulp = require('gulp'),
        stream = require('merge-stream')();
    // Read a large file
    stream.add(gulp.src('src/large-file.txt')
        .pipe(gulp.dest('dest/'))
    );
    // Read a small file
    stream.add(gulp.src('src/small-file.txt')
        .pipe(gulp.dest('dest/'))
    );
    return (stream.isEmpty() ? null : stream);
}
Could Node feasibly return a value from a function call before completing all operations within the function itself?
This is a tricky question. The answer is no, in the sense that returning a value means the function has finished executing: it is taken off the call stack and it will never do anything again - unless it's invoked another time of course, but the point is that this particular invocation is over.
But the tricky part is that only the function itself has finished executing. That doesn't mean it couldn't have scheduled something else to happen in the future. It will get more complicated in a minute, but first a very simple example.
function x() {
    setTimeout(function () {
        console.log('x1');
    }, 2000);
    console.log('x2');
    return;
    console.log('x3'); // never reached
}
Here, when you call x(), it will schedule another function to run after 2 seconds, then it will print x2, and then it will return - at which point this function cannot do anything else ever again for that invocation.
It means that x3 will never get printed, but x1 will eventually get printed - because it's another function that will be called when the timeout fires. The anonymous function will get called not because the x() function can do anything after it returns, but because it managed to schedule the timeout before it returned.
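To make the ordering visible from the caller's side, here is a minimal sketch of the same idea. The log array, the finished promise and the shortened delays are my own additions for illustration; they are not part of the example above.

```javascript
// Hypothetical helper: collects messages so the order can be inspected later.
const log = [];

function x() {
    setTimeout(function () {
        log.push('x1 (timer callback)');
    }, 50); // shortened from 2000 ms so the demo finishes quickly
    log.push('x2 (inside x)');
    return 'returned';
}

const result = x();
// This line runs right after x() returns, long before the timer fires.
log.push('caller continues, x() ' + result);

// Inspect the log once the 50 ms timer has had time to fire.
const finished = new Promise(function (resolve) {
    setTimeout(function () {
        console.log(log.join('\n'));
        resolve(log);
    }, 100);
});
```

The caller's line is recorded before x1, even though the timeout was scheduled first: the callback only runs after the currently executing code has finished and the delay has elapsed.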
Now, instead of just scheduling things to happen in the future, a function can return a promise that will get resolved some time later. For example:
function y() {
    console.log('y1');
    return new Promise(function (resolve, reject) {
        setTimeout(function () {
            resolve('message from y()');
        }, 2000);
    });
    console.log('y2');
}
Now, when you run:
var promise = y();
what will happen is that y1 will get printed, a new promise will get returned and y2 will never get printed because at that point y() has returned and cannot do anything else. But it managed to schedule a timeout that will resolve the promise after two seconds.
You can observe it with:
promise.then(function (value) {
console.log(value);
});
So with this example you can see that while the y() function itself returned and cannot do anything else, some other (anonymous in this case) function can be called in the future and finish the job that the y() function has initiated.
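The same promise can also be observed with async/await, which behaves the same way: the async function returns a promise as soon as it reaches the first await, and the rest of its body runs later. This is only a sketch; main, order and pending are hypothetical names, and the delay is shortened to keep the demo quick.

```javascript
// Same shape as y() above, with a shorter delay.
function y() {
    return new Promise(function (resolve) {
        setTimeout(function () {
            resolve('message from y()');
        }, 50);
    });
}

const order = [];

// Hypothetical caller, not part of the original example.
async function main() {
    order.push('main: calling y()');
    const value = await y(); // main is suspended here; other code keeps running
    order.push('main: got "' + value + '"');
    return order;
}

const pending = main();
// main() has already handed back a promise, although its body is not finished.
order.push('top level: main() returned a promise');

pending.then(function (entries) {
    console.log(entries.join('\n'));
});
```

The top-level line is recorded between the two lines from main, which shows that the code after await runs only once the promise from y() has been resolved.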
So I hope it's now clear why this is a tricky question. In a way, a function cannot do anything after returning. But it could have scheduled some other functions as timeouts, event handlers, etc. that can do something after the function returns. And if the thing that the function returns is a promise, then the caller can easily observe the value in the future when it's ready.
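Streams, as in the question, follow the same pattern: the function hands back the stream object right away and the data flows through it later via event handlers. Below is a minimal sketch using Node's built-in stream module instead of gulp, so makeStream, events and done are my own stand-ins and I am only assuming that gulp's streams behave analogously.

```javascript
const { Readable } = require('stream');

const events = [];

// Hypothetical stand-in for a function that returns a stream, like moveFiles().
function makeStream() {
    const stream = Readable.from(['chunk 1', 'chunk 2']);
    events.push('makeStream is about to return');
    return stream; // nothing has been read from the stream yet
}

const stream = makeStream();
events.push('caller received the stream');

// The data arrives later, through handlers that run after this code has finished.
stream.on('data', function (chunk) {
    events.push('data: ' + chunk);
});

const done = new Promise(function (resolve, reject) {
    stream.on('end', function () {
        events.push('end');
        resolve(events);
    });
    stream.on('error', reject);
});

done.then(function (all) {
    console.log(all.join('\n'));
});
```

So the caller gets the stream before any data has passed through it, and it finds out that the work is complete by listening for the end event (or the error event), not by waiting for the function to return.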
All of the examples could be simplified by using arrow functions, but I wanted to make it explicit that those are all separate functions; some of them are named, some are anonymous.
For more details, see some of these answers:
- A detailed explanation on how to use callbacks and promises
- Explanation on how to use promises in complex request handlers
- An explanation of what a promise really is, on the example of AJAX requests
- An explanation of callbacks, promises and how to access data returned asynchronously