I'm a node.js newbie and I'm trying to understand how I can organize some logic in the non-blocking way node likes it.
I have a set of environments ['stage','prod'], and another set of parameters called brands ['A','B','C'] and a set of devices ['phone','tablet'].
In node's callback-driven world I have this:
brands.forEach( function(brand) {
devices.forEach( function(device) {
var tapeS = getTape('stage',brand,device); // bad example...tapeS never set
var tapeP = getTape('prod' ,brand,device);
})
} )
// more stuff here
function getTape(env,brand,device) {
var req = http.request(someOptions,function(resp) {
// ok, so we handle the response here, but how do I sequence this with all the other
// responses, also happening asynchronously?
});
}
I'm trying to build a report with blocks for each environment:
A:
Stage -- report
Prod -- report
B: ...
My problem is that everything here is async, especially inside getTape, which calls node's http.request. How can I serialize everything at the end of all this async wonderment so I can create the report in the order I want?
I heard something about javascript Promises. Would that help, i.e. some way to collect all these Promises then wait for them all to complete, then get the data they collected?
Q is the dominant promise implementation in node.js. I also have my own super lightweight promises library, Promise. My library doesn't implement all the features I've used in these examples, but it could be made to work with minor adaptation. The underpinning specification for how promises work and interoperate is Promises/A+. It defines the behavior of a .then method and is pretty readable, so definitely give it a look at some point (not necessarily straight away).
The idea behind promises is that they encapsulate an asynchronous value. This makes it easier to reason about how to convert synchronous code into asynchronous code because there are usually nice parallels. As an introduction to these concepts I would recommend my talk on Promises and Generators or one of Domenic Denicola's talks (such as Promises, Promises or Callbacks, Promises, and Coroutines (oh my!)).
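To make that parallel concrete, here is a minimal sketch. It uses the Promise built into current versions of node rather than Q (that swap is my assumption; the Q version has the same shape), and readConfigSync / readConfigAsync are hypothetical names invented for the illustration.

```javascript
// Synchronous version: the value is available immediately.
function readConfigSync() {
  return { retries: 3 };
}

function retriesSync() {
  var config = readConfigSync();
  return config.retries + 1;
}

// Asynchronous version: the same value, wrapped in a promise.
function readConfigAsync() {
  return new Promise(function (resolve) {
    setTimeout(function () { resolve({ retries: 3 }); }, 10);
  });
}

function retriesAsync() {
  // `.then` plays the role of "the next line of code".
  return readConfigAsync().then(function (config) {
    return config.retries + 1;
  });
}

console.log(retriesSync()); // 4
retriesAsync().then(function (n) { console.log(n); }); // also logs 4, slightly later
```

The two functions have the same structure; the only difference is that the asynchronous one hands back a promise for the number instead of the number itself.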
The first thing to decide is whether you want to make your requests in parallel or one at a time, sequentially. From the question I'm going to guess that you want to do them in parallel. I'm also going to assume you're using Q, which means you'll have to install it with:
npm install q
and require it at the top of each file in which you use it:
var Q = require('q');
Thinking about the ideal data structure for printing out that report, I think you'd have an array of brands, each with an array of devices, which would be objects with properties stage and prod, something like:
[
{
brand: 'A',
devices: [
{
device: 'phone',
stage: TAPE,
prod: TAPE
},
{
device: 'tablet',
stage: TAPE,
prod: TAPE
}
...
]
},
{
brand: 'B',
devices: [
{
device: 'phone',
stage: TAPE,
prod: TAPE
},
{
device: 'tablet',
stage: TAPE,
prod: TAPE
}
...
]
}
...
]
I'm going to assume that if you had that then you would have no trouble printing out the desired report.
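For completeness, a rough sketch of that printing step might look like the following. formatReport is a hypothetical helper, and the exact layout (indentation, the "Stage -- " labels) is only my guess at what the question's report is meant to look like.

```javascript
// Hypothetical helper: turns the nested structure into report text.
function formatReport(data) {
  var lines = [];
  data.forEach(function (b) {
    lines.push(b.brand + ':');
    b.devices.forEach(function (d) {
      lines.push('  ' + d.device);
      lines.push('    Stage -- ' + d.stage);
      lines.push('    Prod -- ' + d.prod);
    });
  });
  return lines.join('\n');
}

// Sample data with placeholder strings standing in for real tapes.
var sample = [
  { brand: 'A', devices: [{ device: 'phone', stage: 'tape-1', prod: 'tape-2' }] }
];
console.log(formatReport(sample));
```

Because the arrays are walked in order, the report comes out in brand and device order no matter which request finished first.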
Promised HTTP Request
Let's start by looking at the getTape function. Are you expecting it to return a node.js stream or a buffer/string containing the entire downloaded file? Either way, you're going to find it a lot easier with the help of a library. If you're new to node.js I'd recommend request as a library that just does what you'd expect. If you're feeling more confident, substack's hyperquest is a much smaller library and arguably neater, but it requires you to handle things like redirects manually, which you probably don't want to get into.
Streaming (difficult)
The streaming approach is tricky. It can be done and will be needed if your tapes are 100s of MB long, but promises are then probably not the right way to go. I'm happy to look into this in more detail if it's an issue you actually have.
Buffering with request (easy)
Creating a function that does a buffering HTTP request using request and returns a promise is fairly simple.
var Q = require('q')
var request = Q.denodeify(require('request'))
Q.denodeify is just a shortcut for saying: "take this function that normally expects a callback and give me a function that returns a promise".
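To show roughly what that conversion involves, here is a simplified stand-in written against the native Promise. It is not Q's actual implementation, just a sketch of the idea, and addLater is a hypothetical callback-style function used only for the demo.

```javascript
// Simplified stand-in for Q.denodeify, built on the native Promise.
function denodeify(fn) {
  return function () {
    var args = Array.prototype.slice.call(arguments);
    var self = this;
    return new Promise(function (resolve, reject) {
      // Append a node-style callback as the final argument.
      args.push(function (err, value) {
        if (err) reject(err);
        else resolve(value);
      });
      fn.apply(self, args);
    });
  };
}

// Hypothetical callback-style function used only for the demo.
function addLater(a, b, callback) {
  setTimeout(function () {
    if (typeof a !== 'number') callback(new Error('not a number'));
    else callback(null, a + b);
  }, 10);
}

var addLaterPromised = denodeify(addLater);
addLaterPromised(1, 2).then(function (sum) {
  console.log(sum); // 3
});
```

An error passed to the callback becomes a rejection, and a value becomes the fulfillment value.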
To write getTape based off of that we do something like:
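function getTape(env, brand, device) {
  var response = request({
    uri: 'http://example.com/' + env + '/' + brand + '/' + device,
    method: 'GET'
  })
  return response.then(function (result) {
    // request's callback receives (err, response, body), so Q.denodeify
    // fulfills with the array [response, body] rather than the response alone
    var res = Array.isArray(result) ? result[0] : result
    if (res.statusCode >= 300) {
      throw new Error('Server responded with status code ' + res.statusCode)
    } else {
      return res.body.toString() // assuming tapes are strings and not binary data
    }
  })
}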
What's happening there is that request (via Q.denodeify) is returning a promise. We're calling .then(onFulfilled, onRejected) on that promise. This returns a new transformed promise. If the response promise was rejected (equivalent to throw in synchronous code) then so is the transformed promise (because we didn't attach an onRejected handler).
If you throw in one of the handlers, the transformed promise is rejected. If you return a value from one of the handlers then the transformed promise is "fulfilled" (also sometimes referred to as "resolved") with that value. We can then chain more .then calls on the end of our transformed promise.
We return the transformed promise as the result of our function.
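Those rules can be seen in isolation with a small sketch. It uses the native Promise rather than Q (my substitution; the .then behavior described here comes from Promises/A+, so it should match), and checkStatus is a hypothetical handler modelled on the status check in getTape.

```javascript
// Hypothetical validation step: returning fulfills, throwing rejects.
function checkStatus(res) {
  if (res.statusCode >= 300) {
    throw new Error('Server responded with status code ' + res.statusCode);
  }
  return res.body;
}

var ok = Promise.resolve({ statusCode: 200, body: 'tape' }).then(checkStatus);
var failed = Promise.resolve({ statusCode: 404, body: '' }).then(checkStatus);

ok.then(function (body) {
  console.log('fulfilled with', body);
});

failed.then(null, function (err) {
  // The second argument to .then is the onRejected handler.
  console.log('rejected with', err.message);
});
```

The same handler produces a fulfilled promise in one case and a rejected one in the other, depending only on whether it returned or threw.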
Making the requests
JavaScript arrays have a really helpful method called .map. It's like .forEach but returns a transformed array. I'm going to use that to stay as close as possible to the original synchronous code.
var data = brands.map(function (brand) {
  var b = {brand: brand}
  b.devices = devices.map(function (device) {
    var d = {device: device}
    d.tapeS = getTape('stage', brand, device); // a promise for the stage tape
    d.tapeP = getTape('prod', brand, device);  // a promise for the prod tape
    return d
  })
  return b // without this, .map would produce an array of undefined
})
Now we have code that gives us the data structure I proposed at the start, except we have Promise<TAPE> instead of TAPE.
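To see what that intermediate structure looks like before anything has resolved, here is a sketch with a stub in place of the real getTape. The stub, its fake tape strings, and the use of the native Promise instead of Q are all assumptions made so the example runs without a network.

```javascript
var brands = ['A', 'B'];
var devices = ['phone', 'tablet'];

// Stub standing in for the real getTape: resolves with a fake tape string.
function getTape(env, brand, device) {
  return Promise.resolve(env + '/' + brand + '/' + device);
}

var data = brands.map(function (brand) {
  return {
    brand: brand,
    devices: devices.map(function (device) {
      return {
        device: device,
        tapeS: getTape('stage', brand, device),
        tapeP: getTape('prod', brand, device)
      };
    })
  };
});

console.log(data.length); // 2, one entry per brand
console.log(data[0].devices[1].tapeS instanceof Promise); // true, not yet a tape
```

All the requests have been started by the time .map returns; what remains is waiting for them.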
Waiting for the requests
Q has a really helpful method called Q.all. It takes an array of promises and waits for them all to complete, so let's turn our data structure into an array of promises to pass to Q.all.
One way to do this is at the end: we go through each item and wait for its promises to resolve.
var updated = Q.all(data.map(function (brand) {
  return Q.all(brand.devices.map(function (device) {
    return Q.all([device.tapeS, device.tapeP])
      .spread(function (tapeS, tapeP) {
        //replace the promises with the tapes they resolved to
        device.tapeS = tapeS
        device.tapeP = tapeP
      })
  }))
}))
//if you add a line that reads `updated = updated.thenResolve(data)`,
//updated would become a promise for the data structure (after being resolved)
updated.then(function () {
// `data` structure now has no promises in it and is ready to be printed
})
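The property that matters for the report is that the combined result keeps the order of the input array, not the order in which the requests happened to finish. Here is a sketch of that using the native Promise.all, which I'm using in place of Q.all on the assumption that the two agree on ordering; delay and tracked are hypothetical helpers.

```javascript
// Hypothetical helper: resolves with `value` after `ms` milliseconds.
function delay(value, ms) {
  return new Promise(function (resolve) {
    setTimeout(function () { resolve(value); }, ms);
  });
}

var finished = [];
function tracked(value, ms) {
  return delay(value, ms).then(function (v) {
    finished.push(v); // record the order in which things actually complete
    return v;
  });
}

var all = Promise.all([tracked('stage', 30), tracked('prod', 10)]);

all.then(function (tapes) {
  console.log(finished); // completion order: prod first
  console.log(tapes);    // input order: stage first
});
```

So the serialization the question asks about falls out of the array positions; no extra sorting step is needed.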
Another approach would be to do it as we go, so that the "making the requests" code gets replaced with:
var data = Q.all(brands.map(function (brand) {
  var b = {brand: brand}
  // return the promise so the outer Q.all waits for this brand's devices
  return Q.all(devices.map(function (device) {
    var d = {device: device}
    var tapeSPromise = getTape('stage', brand, device);
    var tapePPromise = getTape('prod', brand, device);
    return Q.all([tapeSPromise, tapePPromise])
      .spread(function (tapeS, tapeP) { //now these are the actual tapes
        d.tapeS = tapeS
        d.tapeP = tapeP
        return d
      })
  }))
  .then(function (devices) {
    b.devices = devices
    return b
  })
}))
data.then(function (data) {
// `data` structure now has no promises in it and is ready to be printed
})
Still another approach would be to use a small utility library that does a recursive deep-resolve of an object. I haven't got round to publishing it, but this utility function (borrowed from work by Kris Kowal) does a deep resolve, which would let you use:
var data = deep(brands.map(function (brand) {
  var b = {brand: brand}
  b.devices = devices.map(function (device) {
    var d = {device: device}
    d.tapeS = getTape('stage', brand, device); // a promise for the stage tape
    d.tapeP = getTape('prod', brand, device);  // a promise for the prod tape
    return d
  })
  return b // without this, .map would produce an array of undefined
}))
data.then(function (data) {
// `data` structure now has no promises in it and is ready to be printed
})
That gives you a promise for the final data.
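The utility itself isn't reproduced here, so the following is only my own sketch of what a deep-resolve could look like, written with the native Promise. It is not the function referred to above, and it only handles plain objects and arrays.

```javascript
// Hypothetical deep-resolve: returns a promise for a copy of `value`
// in which every nested promise has been replaced by its result.
function deep(value) {
  if (value && typeof value.then === 'function') {
    // A promise (or thenable): wait for it, then resolve whatever it yields.
    return Promise.resolve(value).then(deep);
  }
  if (Array.isArray(value)) {
    return Promise.all(value.map(deep));
  }
  if (value && typeof value === 'object') {
    var keys = Object.keys(value);
    return Promise.all(keys.map(function (key) {
      return deep(value[key]);
    })).then(function (values) {
      var result = {};
      keys.forEach(function (key, i) { result[key] = values[i]; });
      return result;
    });
  }
  // Primitives are passed through unchanged.
  return Promise.resolve(value);
}

var nested = [{
  brand: 'A',
  devices: [{ device: 'phone', tapeS: Promise.resolve('s'), tapeP: Promise.resolve('p') }]
}];
deep(nested).then(function (resolved) {
  console.log(JSON.stringify(resolved));
});
```

Unlike the first approach, this builds a new structure instead of mutating the original one in place.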
I'm also rather new to node.js, and I recently discovered a few libraries that are especially effective at organizing asynchronous callbacks in a variety of ways. By far my favorite is async by caolan. It has a few useful patterns, but the ones that I have found most useful are async.series, async.parallel, and async.waterfall. The first one, async.series, just executes asynchronous functions in linear order:
async.series([
function(callback){
// do some stuff ...
callback(null, 'one');
},
function(callback){
// do some more stuff ...
callback(null, 'two');
}
],
// optional callback
function(err, results){
// results is now equal to ['one', 'two']
});
The second, async.parallel, simply executes functions simultaneously:
async.parallel([
function(callback){
setTimeout(function(){
callback(null, 'one');
}, 200);
},
function(callback){
setTimeout(function(){
callback(null, 'two');
}, 100);
}
],
// optional callback
function(err, results){
// the results array will equal ['one','two'] even though
// the second function had a shorter timeout.
});
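If you're wondering how the results can stay in order when the tasks finish out of order, here is a simplified stand-in for the array form of async.parallel. It is my own sketch rather than the library's code, but it shows the idea: each result is stored by the task's position.

```javascript
// Simplified stand-in for async.parallel (array form only).
function parallel(tasks, done) {
  var results = new Array(tasks.length);
  var remaining = tasks.length;
  var failed = false;
  if (remaining === 0) return done(null, results);
  tasks.forEach(function (task, index) {
    task(function (err, value) {
      if (failed) return; // an earlier task already reported an error
      if (err) {
        failed = true;
        return done(err);
      }
      results[index] = value; // slot is fixed by position, not finish time
      remaining -= 1;
      if (remaining === 0) done(null, results);
    });
  });
}

parallel([
  function (callback) { setTimeout(function () { callback(null, 'one'); }, 50); },
  function (callback) { setTimeout(function () { callback(null, 'two'); }, 10); }
], function (err, results) {
  console.log(results); // [ 'one', 'two' ]
});
```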
The last one, which is also my favorite, is like the previously mentioned async.series, but it also passes the results of the previous function to the next one:
async.waterfall([
function(callback){
callback(null, 'one', 'two');
},
function(arg1, arg2, callback){
callback(null, 'three');
},
function(arg1, callback){
// arg1 now equals 'three'
callback(null, 'done');
}
], function (err, result) {
// result now equals 'done'
});
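The hand-off between steps can be sketched in a few lines. Again, this is a simplified stand-in of my own and not the async library's implementation: each task's results become the next task's arguments, and an error skips straight to the final callback.

```javascript
// Simplified stand-in for async.waterfall.
function waterfall(tasks, done) {
  var index = 0;
  function next(err) {
    var args = Array.prototype.slice.call(arguments, 1);
    if (err || index === tasks.length) {
      // Either a task failed or every task has run: finish.
      return done.apply(null, [err || null].concat(args));
    }
    var task = tasks[index];
    index += 1;
    // Pass along the previous results, followed by the callback.
    task.apply(null, args.concat(next));
  }
  next(null);
}

waterfall([
  function (callback) { callback(null, 1, 2); },
  function (a, b, callback) { callback(null, a + b); },
  function (sum, callback) { callback(null, sum * 10); }
], function (err, result) {
  console.log(result); // 30
});
```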
Well, that's my piece. This is just the simplest way to format node's crazy non-blocking architecture in my opinion. If you need any more help, send me a PM. I know how daunting node.js can become with bigger, more complex codebases.
Cheers.
If you are interested in using promises, you could take a look at my Faithful library. It mimics the Async API for a lot of functions, and also features a "collect" function which you mentioned briefly.
Note that, as of now, faithful.parallel only accepts an array, not a hash. That's still to be implemented.
An alternative option to promises would be to use the async module:
async.map(brands, function(brand, brand_cb) {
  // `devices` is the shared array from the question
  async.map(devices, function(device, device_cb) {
    async.parallel({
      stage: function(cb) {
        // ...
        cb(null, stage_data)
      },
      prod: function(cb) {
        // ...
        cb(null, prod_data)
      }
    }, function(err, data) {
      // pass any error along instead of silently dropping it
      device_cb(err, {name: device, data: data});
    });
  }, function(err, data) {
    brand_cb(err, {name: brand, devices: data});
  });
}, function(err, all_the_results) {
  if (err) throw err;
  console.log(all_the_results[0].devices[0].data.prod);
});
As a beginner, you might want to stay with callbacks and simple flow control libraries for now. Look into promises after you have a good grasp of callbacks and the continuation-passing style.
Here is a simple approach using the queue-async library, for example:
var queue = require('queue-async')
var q = queue()
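brands.forEach(function(brand){
  // `devices` is the shared array from the question
  devices.forEach(function(device){
    // getTape must accept a node-style callback as its last argument here
    q.defer(getTape.bind(null, 'stage', brand, device))
    q.defer(getTape.bind(null, 'prod', brand, device))
  })
})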
q.awaitAll(function(error, results){
// use result pairs here
console.log(results)
})
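The results come back as one flat array, so to build the report you'd need to pair them up again. Here is a sketch of that step, assuming the results arrive in the order the tasks were deferred (which is my understanding of queue-async); groupResults and the sample strings are hypothetical.

```javascript
var brands = ['A', 'B'];
var devices = ['phone', 'tablet'];

// Hypothetical helper: rebuilds brand -> device -> {stage, prod} from the
// flat results array, relying on the order the tasks were deferred in.
function groupResults(brands, devices, results) {
  var i = 0;
  return brands.map(function (brand) {
    return {
      brand: brand,
      devices: devices.map(function (device) {
        var entry = { device: device, stage: results[i], prod: results[i + 1] };
        i += 2; // each device contributed a stage task and a prod task
        return entry;
      })
    };
  });
}

// Placeholder results in deferral order: stage, prod, stage, prod, ...
var flat = ['As-phone', 'Ap-phone', 'As-tablet', 'Ap-tablet',
            'Bs-phone', 'Bp-phone', 'Bs-tablet', 'Bp-tablet'];
var grouped = groupResults(brands, devices, flat);
console.log(grouped[1].devices[0].prod); // Bp-phone
```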