I have an export function that read the entire database and create a .xls file with all the records. Then the file is sent to the client.
Of course, the time of export the full database requires a lot of time and the request will soon end in a timeout error.
What is the best solution to handle this case?
I heard something about making a queue with Redis for example but this will require two requests: one for starting the job that will generate the file and the second to download the generated file.
Is this possible with a single request from the client?
Excel Export:
Use Streams. Following is a rough idea of what might be done:
Use exceljs module. Because it has a streaming API aimed towards this exact problem.
var Excel = require('exceljs')
Since we are trying to initiate a download. Write appropriate headers to response.
res.status(200);
res.setHeader('Content-disposition', 'attachment; filename=db_dump.xls');
res.setHeader('Content-type', 'application/vnd.ms-excel');
Create a workbook backed by Streaming Excel writer. The stream given to writer is server response.
var options = {
stream: res, // write to server response
useStyles: false,
useSharedStrings: false
};
var workbook = new Excel.stream.xlsx.WorkbookWriter(options);
Now, the output streaming flow is all set up. for the input streaming, prefer a DB driver that gives query results/cursor as a stream.
Define an async function that dumps 1 table to 1 worksheet.
var tableToSheet = function (name, done) {
var str = dbDriver.query('SELECT * FROM ' + name).stream();
var sheet = workbook.addWorksheet(name);
str.on('data', function (d) {
sheet.addRow(d).commit(); // format object if required
});
str.on('end', function () {
sheet.commit();
done();
});
str.on('error', function (err) {
done(err);
});
}
Now, lets export some db tables, using async module's mapSeries:
async.mapSeries(['cars','planes','trucks'],tableToSheet,function(err){
if(err){
// log error
}
res.end();
})
CSV Export:
For CSV export of a single table/collection module fast-csv can be used:
// response headers as usual
res.status(200);
res.setHeader('Content-disposition', 'attachment; filename=mytable_dump.csv');
res.setHeader('Content-type', 'text/csv');
// create csv stream
var csv = require('fast-csv');
var csvStr = csv.createWriteStream({headers: true});
// open database stream
var dbStr = dbDriver.query('SELECT * from mytable').stream();
// connect the streams
dbStr.pipe(csvStr).pipe(res);
You are now streaming data from DB to HTTP response, converting it into xls/csv format on the fly. No need to buffer or store the entire data in memory or in a file.
You do not have to send the whole file once, you can send this file by chunks (line by line for example), just use res.write(chunk)
and res.end()
at finish to mark it as completed.
You can either send the file information as a stream, sending each individual chunk as it gets created via res.write(chunk)
, or, if sending the file chunk by chunk is not an option, and you have to wait for the entire file before sending any information, you can always keep the connection open by setting the timeout duration to Infinity or any value you think will be high enough to allow the file to be created. Then set up a function that creates the .xls file and either:
1) Accepts a callback that receives the data output as an argument once ready, sends that data, and then closes the connection, or;
2) Returns a promise that resolves with the data output once its ready, allowing you to send the resolved value and close the connection just like with the callback version.
It would look something like this:
function xlsRouteHandler(req, res){
res.setTimeout(Infinity) || res.socket.setTimeout(Infinity)
//callback version
createXLSFile(...fileCreationArguments, function(finishedFile){
res.end(finishedFile)
})
//promise version
createXLSFile(...fileCreationArguments)
.then(finishedFile => res.end(finishedFile))
}
If you still find yourself concerned about timing out, you can always set an interval timer to dispatch an occasional res.write()
message to prevent a timeout on the server connection and then cancel that interval once the final file content is ready to be sent.
Refer to this link which uses jedis (redis java client)
The key to this is the LPOPRPUSH command
https://blog.logentries.com/2016/05/queuing-tasks-with-redis/