- Desired Behaviour
- Actual Behaviour
- What I've Tried
- Steps To Reproduce
- Research
Desired Behaviour
Pipe multiple readable streams, received from multiple api requests, to a single writeable stream.
The api responses are from ibm-watson's textToSpeech.synthesize() method.
Multiple requests are required because the service has a 5KB limit on text input.
Therefore a string of 18KB, for example, requires four requests to complete.
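For illustration, the chunking step could look roughly like the sketch below. splitTextByBytes is a hypothetical helper (it is not part of the ibm-watson package), the byte limit is passed in rather than assumed, and splitting on whitespace is just one possible strategy:

```javascript
// Hypothetical helper: split text on whitespace into chunks whose
// UTF-8 size stays within maxBytes.
function splitTextByBytes(text, maxBytes) {
  const chunks = [];
  let current = '';
  for (const word of text.split(/\s+/).filter(Boolean)) {
    const candidate = current ? `${current} ${word}` : word;
    if (Buffer.byteLength(candidate, 'utf8') <= maxBytes) {
      // The word still fits in the current chunk.
      current = candidate;
    } else {
      // Close the current chunk and start a new one with this word.
      // Note: a single word longer than maxBytes is kept whole here.
      if (current) chunks.push(current);
      current = word;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}
```

Each element of the returned array would then be sent as the text of one synthesize() request.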
Actual Behaviour
The writeable stream file is incomplete and garbled.
The application seems to 'hang'.
When I try to open the incomplete .mp3 file in an audio player, it says it is corrupted.
The process of opening and closing the file seems to increase its file size - as if opening the file somehow prompts more data to flow into it.
Undesirable behaviour is more apparent with larger inputs, eg four strings of 4000 bytes or less.
What I've Tried
I've tried several methods to pipe the readable streams to either a single writeable stream or multiple writeable streams using the npm packages combined-stream, combined-stream2, multistream and archiver, and they all result in incomplete files. My last attempt doesn't use any packages and is shown in the Steps To Reproduce section below.
I am therefore questioning each part of my application logic:
01. What is the response type of a watson text to speech api request?
The text to speech docs say the api response type is:
Response type: NodeJS.ReadableStream|FileObject|Buffer
I am confused that the response type is one of three possible things.
In all my attempts, I have been assuming it is a readable stream.
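One defensive way to handle the uncertainty is to normalise whatever comes back before piping it. This is only a sketch: toReadableStream is a hypothetical helper, it treats anything with a pipe() method as a stream, wraps a Buffer in a one-chunk stream, and does not attempt to handle the FileObject case because its shape is not described here:

```javascript
const { Readable } = require('stream');

// Hypothetical helper: return something that can be piped,
// whichever form the response took.
function toReadableStream(audio) {
  // Already a stream (anything exposing pipe()): use it as-is.
  if (audio && typeof audio.pipe === 'function') {
    return audio;
  }
  // A Buffer: wrap it in a readable stream that emits it as one chunk.
  if (Buffer.isBuffer(audio)) {
    return Readable.from([audio]);
  }
  // Anything else (including a FileObject) is not handled in this sketch.
  throw new TypeError('Unsupported response type: ' + typeof audio);
}
```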
02. Can I make multiple api requests in a map function?
03. Can I wrap each request within a promise() and resolve the response?
04. Can I assign the resulting array to a promises variable?
05. Can I declare var audio_files = await Promise.all(promises)?
06. After this declaration, are all responses 'finished'?
07. How do I correctly pipe each response to a writable stream?
08. How do I detect when all pipes have finished, so I can send file back to client?
For questions 2 - 6, I am assuming the answer is 'YES'.
I think my failures relate to questions 7 and 8.
Steps To Reproduce
You can test this code with an array of four randomly generated text strings with respective sizes of 3975, 3863, 3974 and 3629 bytes - here is a pastebin of that array.
// route handler
app.route("/api/:api_version/tts")
.get(api_tts_get);
// route handler middleware
const api_tts_get = async (req, res) => {
var query_parameters = req.query;
var file_name = query_parameters.file_name;
var text_string_array = text_string_array; // eg: https://pastebin.com/raw/JkK8ehwV
var absolute_path = path.join(__dirname, "/src/temp_audio/", file_name);
var relative_path = path.join("./src/temp_audio/", file_name); // path relative to server root
// for each string in an array, send it to the watson api
var promises = text_string_array.map(text_string => {
return new Promise((resolve, reject) => {
// credentials
var textToSpeech = new TextToSpeechV1({
iam_apikey: iam_apikey,
url: tts_service_url
});
// params
var synthesizeParams = {
text: text_string,
accept: 'audio/mp3',
voice: 'en-US_AllisonV3Voice'
};
// make request
textToSpeech.synthesize(synthesizeParams, (err, audio) => {
if (err) {
console.log("synthesize - an error occurred: ");
return reject(err);
}
resolve(audio);
});
});
});
try {
// wait for all responses
var audio_files = await Promise.all(promises);
var audio_files_length = audio_files.length;
var write_stream = fs.createWriteStream(`${relative_path}.mp3`);
audio_files.forEach((audio, index) => {
// if this is the last value in the array,
// pipe it to write_stream,
// when finished, the readable stream will emit 'end'
// then the .end() method will be called on write_stream
// which will trigger the 'finished' event on the write_stream
if (index == audio_files_length - 1) {
audio.pipe(write_stream);
}
// if not the last value in the array,
// pipe to write_stream and leave open
else {
audio.pipe(write_stream, { end: false });
}
});
write_stream.on('finish', function() {
// download the file (using absolute_path)
res.download(`${absolute_path}.mp3`, (err) => {
if (err) {
console.log(err);
}
// delete the file (using relative_path)
fs.unlink(`${relative_path}.mp3`, (err) => {
if (err) {
console.log(err);
}
});
});
});
} catch (err) {
console.log("there was an error getting tts");
console.log(err);
}
}
The official example shows:
textToSpeech.synthesize(synthesizeParams)
.then(audio => {
audio.pipe(fs.createWriteStream('hello_world.mp3'));
})
.catch(err => {
console.log('error:', err);
});
which seems to work fine for single requests, but not for multiple requests, as far as I can tell.
Research
concerning readable and writeable streams, readable stream modes (flowing and paused), 'data', 'end', 'drain' and 'finish' events, pipe(), fs.createReadStream() and fs.createWriteStream()
Almost all Node.js applications, no matter how simple, use streams in some manner...
const server = http.createServer((req, res) => {
// `req` is an http.IncomingMessage, which is a Readable Stream
// `res` is an http.ServerResponse, which is a Writable Stream
let body = '';
// get the data as utf8 strings.
// if an encoding is not set, Buffer objects will be received.
req.setEncoding('utf8');
// readable streams emit 'data' events once a listener is added
req.on('data', (chunk) => {
body += chunk;
});
// the 'end' event indicates that the entire body has been received
req.on('end', () => {
try {
const data = JSON.parse(body);
// write back something interesting to the user:
res.write(typeof data);
res.end();
} catch (er) {
// uh oh! bad json!
res.statusCode = 400;
return res.end(`error: ${er.message}`);
}
});
});
https://nodejs.org/api/stream.html#stream_api_for_stream_consumers
Readable streams have two main modes that affect the way we can consume them...they can be either in the paused mode or in the flowing mode. All readable streams start in the paused mode by default but they can be easily switched to flowing and back to paused when needed...just adding a data event handler switches a paused stream into flowing mode and removing the data event handler switches the stream back to paused mode.
https://www.freecodecamp.org/news/node-js-streams-everything-you-need-to-know-c9141306be93
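The mode switch can be observed through the readableFlowing property. The sketch below assumes a Node version that has Readable.from and readableFlowing; observeModes is a made-up name. It calls pause() explicitly because Node's own documentation notes that removing 'data' handlers does not automatically pause a stream:

```javascript
const { Readable } = require('stream');

// Record the value of readableFlowing at three points in a stream's life.
function observeModes() {
  const readable = Readable.from(['a', 'b']);
  const states = [];

  // No consumer attached yet.
  states.push(readable.readableFlowing);

  // Attaching a 'data' handler switches the stream to flowing mode.
  readable.on('data', () => {});
  states.push(readable.readableFlowing);

  // pause() switches it back to paused mode.
  readable.pause();
  states.push(readable.readableFlowing);

  return states;
}

console.log(observeModes());
```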
Here’s a list of the important events and functions that can be used with readable and writable streams
The most important events on a readable stream are:
The data event, which is emitted whenever the stream passes a chunk of data to the consumer
The end event, which is emitted when there is no more data to be consumed from the stream.
The most important events on a writable stream are:
The drain event, which is a signal that the writable stream can receive more data.
The finish event, which is emitted when all data has been flushed to the underlying system.
https://www.freecodecamp.org/news/node-js-streams-everything-you-need-to-know-c9141306be93
.pipe() takes care of listening for 'data' and 'end' events from the fs.createReadStream().
https://github.com/substack/stream-handbook#why-you-should-use-streams
.pipe() is just a function that takes a readable source stream src and hooks the output to a destination writable stream dst
https://github.com/substack/stream-handbook#pipe
The return value of the pipe() method is the destination stream
https://flaviocopes.com/nodejs-streams/#pipe
By default, stream.end() is called on the destination Writable stream when the source Readable stream emits 'end', so that the destination is no longer writable. To disable this default behavior, the end option can be passed as false, causing the destination stream to remain open:
https://nodejs.org/api/stream.html#stream_readable_pipe_destination_options
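To check this behaviour in isolation, a small sketch like the following can be used. pipeAndCheckEnded is a hypothetical helper, and it relies on writableEnded, which is only available in more recent Node versions:

```javascript
// Hypothetical helper: pipe source into target with the given pipe options
// and resolve with target.writableEnded once the source has emitted 'end'.
function pipeAndCheckEnded(source, target, options) {
  return new Promise((resolve, reject) => {
    source.on('error', reject);
    source.pipe(target, options);
    // Registered after pipe() so it runs after pipe's own 'end' handling.
    source.on('end', () => resolve(target.writableEnded));
  });
}
```

With { end: false } the promise should resolve to false (the target is still open and can accept another source); with the default options it should resolve to true.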
The 'finish' event is emitted after the stream.end() method has been called, and all data has been flushed to the underlying system.
const writer = getWritableStreamSomehow();
for (let i = 0; i < 100; i++) {
writer.write(`hello, #${i}!\n`);
}
writer.end('This is the end\n');
writer.on('finish', () => {
console.log('All writes are now complete.');
});
https://nodejs.org/api/stream.html#stream_event_finish
If you're trying to read multiple files and pipe them to a writable stream, you have to pipe each one to the writable stream and pass end: false when doing it, because by default, a readable stream ends the writable stream when there's no more data to be read. Here's an example:
var ws = fs.createWriteStream('output.pdf');
fs.createReadStream('pdf-sample1.pdf').pipe(ws, { end: false });
fs.createReadStream('pdf-sample2.pdf').pipe(ws, { end: false });
fs.createReadStream('pdf-sample3.pdf').pipe(ws);
https://stackoverflow.com/a/30916248
You want to add the second read into an eventlistener for the first read to finish...
var a = fs.createReadStream('a');
var b = fs.createReadStream('b');
var c = fs.createWriteStream('c');
a.pipe(c, {end:false});
a.on('end', function() {
b.pipe(c)
});
https://stackoverflow.com/a/28033554
A Brief History of Node Streams - part one and two.
Related Google search:
how to pipe multiple readable streams to a single writable stream? nodejs
Questions covering the same or similar topic, without authoritative answers (or might be 'outdated'):
How to pipe multiple ReadableStreams to a single WriteStream?
Piping to same Writable stream twice via different Readable stream
Pipe multiple files to one response
Creating a Node.js stream from two piped streams
The core problem to solve here is asynchronicity. You almost had it: the problem with the code you posted is that you are piping all source streams in parallel and unordered into the target stream. This means data chunks will flow randomly from different audio streams. Even the end event of your last stream can outrace the pipes that were set up with end: false, closing the target stream too early, which might explain why the file grows after you re-open it.
What you want is to pipe them sequentially - you even posted the solution when you quoted:
You want to add the second read into an eventlistener for the first read to finish...
or as code:
a.pipe(c, { end:false });
a.on('end', function() {
b.pipe(c);
});
This will pipe the source streams in sequential order into the target stream.
Taking your code, this would mean replacing the audio_files.forEach loop with:
await Bluebird.mapSeries(audio_files, async (audio, index) => {
const isLastIndex = index == audio_files_length - 1;
audio.pipe(write_stream, { end: isLastIndex });
return new Promise(resolve => audio.on('end', resolve));
});
Note the usage of bluebird.js mapSeries here.
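If you would rather not add a dependency, the same sequential idea can be sketched with a plain for...of loop. This is an illustration under assumptions, not a drop-in fix: pipeOne and pipeSequentially are made-up names, every source is piped with end: false and the target is ended explicitly afterwards, and the sketch only addresses ordering (it does not make concatenated mp3 data a well-formed audio file):

```javascript
// Pipe one source into the target without ending the target.
// Resolves when the source has emitted 'end'.
function pipeOne(source, target) {
  return new Promise((resolve, reject) => {
    const cleanup = () => {
      source.removeListener('error', onError);
      target.removeListener('error', onError);
      source.removeListener('end', onEnd);
    };
    const onError = (err) => { cleanup(); reject(err); };
    const onEnd = () => { cleanup(); resolve(); };
    source.on('error', onError);
    target.on('error', onError);
    source.on('end', onEnd);
    source.pipe(target, { end: false });
  });
}

// Pipe the sources one after another, then end the target
// and wait for its 'finish' event.
async function pipeSequentially(sources, target) {
  for (const source of sources) {
    await pipeOne(source, target);
  }
  await new Promise((resolve, reject) => {
    target.once('error', reject);
    target.once('finish', resolve);
    target.end();
  });
}
```

In the question's code this would be called as await pipeSequentially(audio_files, write_stream), with the download starting once the returned promise resolves.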
Further advice regarding your code:
- you should consider using lodash.js
- you should use const & let instead of var and consider using camelCase
- when you notice "it works with one event, but fails with multiple" always think: asynchronicity, permutations, race conditions.
Further reading, limitations of combining native node streams: https://github.com/nodejs/node/issues/93
I'll give my two cents here, since I looked at a similar question recently! From what I have tested and researched, you can combine the .mp3 / .wav streams into one, but this results in a file that has noticeable issues like the ones you've mentioned, such as truncation, glitches etc.
The only way I believe you can combine the Audio streams correctly will be with a module that is designed to concatenate sound files/data.
The best result I have obtained is to synthesize the audio into separate files, then combine like so:
function combineMp3Files(files, outputFile) {
const ffmpeg = require("fluent-ffmpeg");
const combiner = ffmpeg().on("error", err => {
console.error("An error occurred: " + err.message);
})
.on("end", () => {
console.log('Merge complete');
});
// Add in each .mp3 file.
files.forEach(file => {
combiner.input(file)
});
combiner.mergeToFile(outputFile);
}
This uses the node-fluent-ffmpeg library, which requires installing ffmpeg.
Other than that I'd suggest you ask IBM support (because as you say the docs don't seem to indicate this) how API callers should combine the synthesized audio, since your use case will be very common.
To create the audio files, I do the following:
// Switching to audio/webm and the V3 voices.. much better output
function synthesizeText(text) {
const synthesizeParams = {
text: text,
accept: 'audio/webm',
voice: 'en-US_LisaV3Voice'
};
return textToSpeech.synthesize(synthesizeParams);
}
async function synthesizeTextChunksSeparateFiles(text_chunks) {
const audioArray = await Promise.all(text_chunks.map(synthesizeText));
console.log(`synthesizeTextChunks: Received ${audioArray.length} result(s), writing to separate files...`);
audioArray.forEach((audio, index) => {
audio.pipe(fs.createWriteStream(`audio-${index}.mp3`));
});
}
And then combine like so:
combineMp3Files(['audio-0.mp3', 'audio-1.mp3', 'audio-2.mp3', 'audio-3.mp3', 'audio-4.mp3'], 'combined.mp3');
I should point out that I'm doing this in two separate steps (waiting a few hundred milliseconds would also work), but it should be easy enough to wait for the individual files to be written, then combine them.
Here's a function that will do this:
async function synthesizeTextChunksThenCombine(text_chunks, outputFile) {
const audioArray = await Promise.all(text_chunks.map(synthesizeText));
console.log(`synthesizeTextChunks: Received ${audioArray.length} result(s), writing to separate files...`);
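let writePromises = audioArray.map((audio, index) => {
return new Promise((resolve, reject) => {
const fileName = `audio-${index}.mp3`;
// reject if the response stream fails
audio.on('error', reject);
audio.pipe(fs.createWriteStream(fileName)
// reject if the file cannot be written
.on('error', reject)
// resolve with the file name once the file has been closed
.on('close', () => resolve(fileName)));
});
});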
let files = await Promise.all(writePromises);
console.log('synthesizeTextChunksThenCombine: Separate files: ', files);
combineMp3Files(files, outputFile);
}
Here are two solutions.
Solution 01
- uses Bluebird.mapSeries
- writes individual responses to temporary files
- puts them in a zip file (using archiver)
- sends zip file back to client to save
- deletes temporary files
It utilises Bluebird.mapSeries from BM's answer but instead of just mapping over the responses, requests and responses are handled within the map function. Also, it resolves promises on the writeable stream finish event, rather than the readable stream end event. Bluebird is helpful in that it pauses iteration within a map function until a response has been received and handled, and then moves on to the next iteration.
Given that the Bluebird map function produces clean audio files, rather than zipping the files, you could use a solution like in Terry Lennox's answer to combine multiple audio files into one audio file. My first attempt of that solution, using Bluebird and fluent-ffmpeg, produced a single file, but it was slightly lower quality - no doubt this could be tweaked in ffmpeg settings, but I didn't have time to do that.
// route handler
app.route("/api/:api_version/tts")
.get(api_tts_get);
// route handler middleware
const api_tts_get = async (req, res) => {
var query_parameters = req.query;
var file_name = query_parameters.file_name;
var text_chunk_array = text_string_array; // eg: https://pastebin.com/raw/JkK8ehwV
var absolute_path = path.join(__dirname, "/src/temp_audio/", file_name);
var relative_path = path.join("./src/temp_audio/", file_name); // path relative to server root
// set up archiver
var archive = archiver('zip', {
zlib: { level: 9 } // sets the compression level
});
var zip_write_stream = fs.createWriteStream(`${relative_path}.zip`);
archive.pipe(zip_write_stream);
await Bluebird.mapSeries(text_chunk_array, async function(text_chunk, index) {
// check if last value of array
const isLastIndex = index === text_chunk_array.length - 1;
return new Promise((resolve, reject) => {
var textToSpeech = new TextToSpeechV1({
iam_apikey: iam_apikey,
url: tts_service_url
});
var synthesizeParams = {
text: text_chunk,
accept: 'audio/mp3',
voice: 'en-US_AllisonV3Voice'
};
textToSpeech.synthesize(synthesizeParams, (err, audio) => {
if (err) {
console.log("synthesize - an error occurred: ");
return reject(err);
}
// write individual files to disk
var file_name = `${relative_path}_${index}.mp3`;
var write_stream = fs.createWriteStream(`${file_name}`);
audio.pipe(write_stream);
// on finish event of individual file write
write_stream.on('finish', function() {
// add file to archive
archive.file(file_name, { name: `audio_${index}.mp3` });
// if not the last value of the array
if (isLastIndex === false) {
resolve();
}
// if the last value of the array
else if (isLastIndex === true) {
resolve();
// when zip file has finished writing,
// send it back to client, and delete temp files from server
zip_write_stream.on('close', function() {
// download the zip file (using absolute_path)
res.download(`${absolute_path}.zip`, (err) => {
if (err) {
console.log(err);
}
// delete each audio file (using relative_path)
for (let i = 0; i < text_chunk_array.length; i++) {
fs.unlink(`${relative_path}_${i}.mp3`, (err) => {
if (err) {
console.log(err);
}
console.log(`AUDIO FILE ${i} REMOVED!`);
});
}
// delete the zip file
fs.unlink(`${relative_path}.zip`, (err) => {
if (err) {
console.log(err);
}
console.log(`ZIP FILE REMOVED!`);
});
});
});
// from archiver readme examples
archive.on('warning', function(err) {
if (err.code === 'ENOENT') {
// log warning
} else {
// throw error
throw err;
}
});
// from archiver readme examples
archive.on('error', function(err) {
throw err;
});
// from archiver readme examples
archive.finalize();
}
});
});
});
});
}
Solution 02
I was keen to find a solution that didn't use a library to "pause" within the map() iteration, so I:
- swapped the map() function for a for of loop
- used await before the api call, rather than wrapping it in a promise, and
- instead of using return new Promise() to contain the response handling, I used await new Promise() (gleaned from this answer)
This last change, magically, paused the loop until the archive.file() and audio.pipe(writestream) operations were completed - I'd like to better understand how that works.
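As far as I can tell, the difference is that await inside a for of loop suspends the enclosing async function until the promise settles, so the next iteration cannot start, whereas map() calls its callback for every item straight away and only collects the returned promises. A minimal sketch of that difference, using a hypothetical delay() helper in place of the api call and file write:

```javascript
// Hypothetical stand-in for an asynchronous operation.
const delay = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// map(): every callback starts immediately, so the work overlaps.
async function runWithMap(items, log) {
  await Promise.all(items.map(async (item) => {
    log.push(`start ${item}`);
    await delay(10);
    log.push(`end ${item}`);
  }));
}

// for...of with await: each iteration finishes before the next one starts.
async function runWithForOf(items, log) {
  for (const item of items) {
    log.push(`start ${item}`);
    await delay(10);
    log.push(`end ${item}`);
  }
}
```

With two items, the map version logs both "start" entries before any "end" entry, while the for of version logs start/end pairs one item at a time.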
// route handler
app.route("/api/:api_version/tts")
.get(api_tts_get);
// route handler middleware
const api_tts_get = async (req, res) => {
var query_parameters = req.query;
var file_name = query_parameters.file_name;
var text_chunk_array = text_string_array; // eg: https://pastebin.com/raw/JkK8ehwV
var absolute_path = path.join(__dirname, "/src/temp_audio/", file_name);
var relative_path = path.join("./src/temp_audio/", file_name); // path relative to server root
// set up archiver
var archive = archiver('zip', {
zlib: { level: 9 } // sets the compression level
});
var zip_write_stream = fs.createWriteStream(`${relative_path}.zip`);
archive.pipe(zip_write_stream);
for (const [index, text_chunk] of text_chunk_array.entries()) {
// check if last value of array
const isLastIndex = index === text_chunk_array.length - 1;
var textToSpeech = new TextToSpeechV1({
iam_apikey: iam_apikey,
url: tts_service_url
});
var synthesizeParams = {
text: text_chunk,
accept: 'audio/mp3',
voice: 'en-US_AllisonV3Voice'
};
try {
var audio_readable_stream = await textToSpeech.synthesize(synthesizeParams);
await new Promise(function(resolve, reject) {
// write individual files to disk
var file_name = `${relative_path}_${index}.mp3`;
var write_stream = fs.createWriteStream(`${file_name}`);
audio_readable_stream.pipe(write_stream);
// on finish event of individual file write
write_stream.on('finish', function() {
// add file to archive
archive.file(file_name, { name: `audio_${index}.mp3` });
// if not the last value of the array
if (isLastIndex === false) {
resolve();
}
// if the last value of the array
else if (isLastIndex === true) {
resolve();
// when zip file has finished writing,
// send it back to client, and delete temp files from server
zip_write_stream.on('close', function() {
// download the zip file (using absolute_path)
res.download(`${absolute_path}.zip`, (err) => {
if (err) {
console.log(err);
}
// delete each audio file (using relative_path)
for (let i = 0; i < text_chunk_array.length; i++) {
fs.unlink(`${relative_path}_${i}.mp3`, (err) => {
if (err) {
console.log(err);
}
console.log(`AUDIO FILE ${i} REMOVED!`);
});
}
// delete the zip file
fs.unlink(`${relative_path}.zip`, (err) => {
if (err) {
console.log(err);
}
console.log(`ZIP FILE REMOVED!`);
});
});
});
// from archiver readme examples
archive.on('warning', function(err) {
if (err.code === 'ENOENT') {
// log warning
} else {
// throw error
throw err;
}
});
// from archiver readme examples
archive.on('error', function(err) {
throw err;
});
// from archiver readme examples
archive.finalize();
}
});
});
} catch (err) {
console.log("oh dear, there was an error: ");
console.log(err);
}
}
}
Learning Experiences
Other issues that came up during this process are documented below:
Long requests time out when using node (and resend the request)...
// solution
req.socket.setTimeout( 1000 * 60 * 10 ); // ten minutes (req.connection is a deprecated alias of req.socket)
See: https://github.com/expressjs/express/issues/2512
400 errors caused by node max header size of 8KB (query string is included in header size)...
// solution (although probably not recommended - better to get text_string_array from server, rather than client)
node --max-http-header-size 80000 app.js
See: https://github.com/nodejs/node/issues/24692