Does .pipe() perform a memcpy in Node.js?

Posted 2019-03-26 13:59

Question:

This is a conceptual question about system-level optimisation. From reading the Node.js documentation, my understanding is that pipes are handy for performing flow control on streams.

Background: I have a microphone stream coming in and I wanted to avoid an extra copy operation to conserve overall system MIPS. I understand that for audio streams the MIPS spent is not a big deal even if there were a memcpy under the hood, but I also have an extension planned that streams in camera frames at 30 fps and UHD resolution. Making multiple copies of UHD-resolution pixel data at 30 fps is very inefficient, so I need some advice around this.

Example Code:

var spawn = require('child_process').spawn;
var PassThrough = require('stream').PassThrough;

var ps = null;
//var audioStream = new PassThrough;
//var infoStream = new PassThrough;

var start = function() {
    if(ps == null) {
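        // 'rec' comes from SoX: record 16-bit little-endian signed mono PCM
        // at 16 kHz and write the raw samples to stdout ('-')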
        ps = spawn('rec', ['-b', 16, '--endian', 'little', '-c', 1, '-r', 16000, '-e', 'signed-integer', '-t', 'raw', '-']);
        //ps.stdout.pipe(audioStream);
        //ps.stderr.pipe(infoStream);
        exports.audioStream = ps.stdout;
        exports.infoStream = ps.stderr;
    }
};

var stop = function() {
    if(ps) {
        ps.kill();
        ps = null;
    }
};

//exports.audioStream = audioStream;
//exports.infoStream = infoStream;
exports.startCapture = start;
exports.stopCapture = stop;

Here are the questions:

  1. To be able to perform flow control, does source.pipe(dest) perform a memcpy from the source memory to the destination memory under the hood, or does it pass a reference to the memory to the destination?
  2. The commented-out code contains a PassThrough instantiation. I am currently assuming that PassThrough causes memcpys as well, so am I saving one memcpy operation in the overall system by commenting it out as above?
  3. If I had to create a pipe between a process and a spawned child process (using child_process.spawn() as shown in How to transfer/stream big data from/to child processes in node.js without using the blocking stdio?), I presume that definitely results in a memcpy? Is there any way to make that a reference rather than a copy?
  4. Does this behaviour differ from OS to OS? I presume it should be OS-agnostic, but I am asking anyway.

Thanks in advance for your help. It will help my architecture a great deal.

Answer 1:

Some URLs for reference:
https://github.com/nodejs/node/
https://github.com/nodejs/node/blob/master/src/stream_wrap.cc
https://github.com/nodejs/node/blob/master/src/stream_base.cc
https://github.com/libuv/libuv/blob/v1.x/src/unix/stream.c
https://github.com/libuv/libuv/blob/v1.x/src/win/stream.c

I tried writing a long and complicated explanation based on these and some other files, but I came to the conclusion that it would be best to give you a summary of how, from my experience and reading, Node works internally:

pipe simply connects streams, making it behave as if every chunk emitted by the source's "data" event were handed straight to the destination's .write(…), without anything bloated in between.
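
To make this concrete, here is a deliberately naive sketch of roughly what source.pipe(dest) wires up (naivePipe is just an illustrative name; the real implementation in Node's stream code also handles errors, unpiping and more):

// A naive sketch of what source.pipe(dest) roughly does.
// Note that chunk is passed along by reference; no copy happens at this level.
function naivePipe(source, dest) {
    source.on('data', function(chunk) {
        if (!dest.write(chunk)) {
            // destination is backed up: stop the source until it drains
            source.pause();
        }
    });
    dest.on('drain', function() {
        source.resume();
    });
    source.on('end', function() {
        dest.end();
    });
}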

Now we need to separate the JS world from the C/C++ world.
When dealing with data in JS we use Buffers: https://github.com/nodejs/node/blob/master/src/node_buffer.cc
They simply represent allocated memory with some sugar on top for operating on it.
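
A quick way to see that Buffers are views over memory rather than owners of private copies (runnable as-is in Node):

var buf = Buffer.from('hello world');
var view = buf.slice(0, 5);   // a new view over the SAME memory, not a copy
view[0] = 0x48;               // write an 'H' through the view
console.log(buf.toString());  // prints "Hello world": the original changed too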

If you connect the stdout of a process to some .on("data", …) listener, each incoming chunk is copied into a Buffer object for further use inside the JS world.
Inside the JS world you have methods like .pause() (as you can see in Node's stream API documentation) to prevent the process from eating memory in case incoming data flows in faster than it is processed.
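
A small sketch of that flow control, assuming the 'rec' command from the question (arguments abbreviated) and a made-up processChunk consumer:

var spawn = require('child_process').spawn;
var ps = spawn('rec', ['-t', 'raw', '-']);   // abbreviated arguments for brevity

// stand-in for real work; just waits 100 ms before asking for more
function processChunk(chunk, done) {
    setTimeout(done, 100);
}

ps.stdout.on('data', function(chunk) {
    ps.stdout.pause();               // stop the flow while we process
    processChunk(chunk, function() {
        ps.stdout.resume();          // ask for more data when done
    });
});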

Connecting the stdout of a process to, for example, an outgoing TCP socket through pipe results in a connection similar to how nginx operates: it connects these streams as if they were talking directly to each other, passing each incoming chunk straight to the outgoing stream.
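
A minimal sketch of that nginx-style connection, again assuming the 'rec' command from the question and an arbitrary port 8080:

var net = require('net');
var spawn = require('child_process').spawn;

net.createServer(function(socket) {
    var ps = spawn('rec', ['-t', 'raw', '-']);  // one capture per client, for brevity
    ps.stdout.pipe(socket);    // chunks flow straight from child stdout to the socket
    socket.on('close', function() {
        ps.kill();             // stop capturing when the client disconnects
    });
}).listen(8080);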

As soon as you pause a stream, Node will fall back to internal buffering if it is unable to pause the incoming stream.

So for your scenario you should just do some testing.
Try to receive data through an incoming stream in Node, pause the stream, and see what happens.
I'm not sure whether Node will use internal buffering or whether the process you are running will simply halt until it can continue to send data.
I would expect the process to halt until you resume the stream.
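
One way to run that experiment (a sketch assuming a Unix-like system, using 'yes' as an endlessly chatty producer): stop consuming after roughly 1 MB and watch whether Node's memory keeps growing (internal buffering) or the child blocks once the OS pipe buffer fills:

var spawn = require('child_process').spawn;
var ps = spawn('yes');
var received = 0;
var paused = false;

ps.stdout.on('data', function(chunk) {
    received += chunk.length;
    if (!paused && received > 1024 * 1024) {
        paused = true;
        ps.stdout.pause();     // stop consuming after ~1 MB
        setInterval(function() {
            // if rss keeps climbing, node is buffering; if it stays flat,
            // 'yes' is blocked on a full OS pipe buffer
            console.log('rss:', process.memoryUsage().rss);
        }, 1000);
    }
});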

For transferring huge images I recommend transferring them in chunks or piping them directly to an outgoing socket.

The chunked approach would allow you to send the data to multiple clients at once and would keep the memory footprint pretty low.
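
A sketch of that chunked fan-out (frameSource is a placeholder for your camera stream, and port 9090 is arbitrary):

var net = require('net');
var clients = [];

net.createServer(function(socket) {
    clients.push(socket);
    socket.on('close', function() {
        clients.splice(clients.indexOf(socket), 1);
    });
}).listen(9090);

// frameSource is a placeholder for your camera stream
function broadcast(frameSource) {
    frameSource.on('data', function(chunk) {
        clients.forEach(function(socket) {
            socket.write(chunk);   // the same Buffer reference goes to each client
        });
    });
}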

PS: you should take a look at this gist that I just found: https://gist.github.com/joyrexus/10026630
It explains in depth how you can interact with streams.