I'm trying to pipe an input stream (created from a huge GeoJSON file) through JSONStream.parse() to break the stream into objects, then through event-stream.map() so I can transform each object, then through JSONStream.stringify() to turn it back into a string, and finally into a writable output stream. As the process runs, node's memory footprint keeps growing until it eventually exhausts the heap. Here's the simplest script (test.js) that reproduces the problem:
const fs = require("fs")
const es = require("event-stream")
const js = require("JSONStream")

const out = fs.createWriteStream("/dev/null")

process.stdin
    .pipe(js.parse("features.*"))                  // emit each feature as a separate object
    .pipe(es.map(function (data, cb) {             // asynchronous pass-through "transform"
        cb(null, data)
    }))
    .pipe(js.stringify("{\n\"type\": \"FeatureCollection\", \"features\": [\n\t", ",\n\t", "\n]\n}"))
    .pipe(out)
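For context, the identity map above is just the minimal reproduction; in the real script the es.map() stage actually modifies each feature. A hypothetical stand-in for that stage (the uppercasing is purely illustrative, not my actual code) would be:

const transform = es.map(function (feature, cb) {
    // made-up transform: uppercase the street name if present
    if (feature.properties && typeof feature.properties.name === "string") {
        feature.properties.name = feature.properties.name.toUpperCase()
    }
    cb(null, feature)   // hand the (possibly modified) feature downstream
})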
A little bash script (barf.sh) that spews an endless stream of JSON into node's process.stdin will cause node's heap to gradually grow:
#!/bin/bash
echo '{"type":"FeatureCollection","features":['
while :
do
    echo '{"type":"Feature","properties":{"name":"A Street"}, "geometry":{"type":"LineString"} },'
done
Run it like so:
barf.sh | node test.js
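If it helps anyone reproduce this faster, capping node's old-generation heap should make the crash show up much sooner (the 256 MB limit below is an arbitrary choice):

barf.sh | node --max-old-space-size=256 test.js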
There are a couple of curious ways to sidestep the issue:
- Remove the fs.createWriteStream(), change the last pipe stage from ".pipe(out)" to ".pipe(process.stdout)", and redirect node's stdout to /dev/null from the shell
- Change the asynchronous es.map() to the synchronous es.mapSync()
Either of these two changes (both are sketched below) allows the script to run forever with node's memory footprint low and unchanging. I'm using node v6.3.1, event-stream v3.3.4, and JSONStream v1.1.4 on an eight-core machine with 8 GB of RAM running Ubuntu 16.04.
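For reference, these are the two variants I mean; only the tail of test.js changes, everything else stays the same:

// Variant 1: drop the fs.createWriteStream("/dev/null") and write to
// process.stdout instead, then discard it from the shell:
//   barf.sh | node test.js > /dev/null
process.stdin
    .pipe(js.parse("features.*"))
    .pipe(es.map(function (data, cb) {
        cb(null, data)
    }))
    .pipe(js.stringify("{\n\"type\": \"FeatureCollection\", \"features\": [\n\t", ",\n\t", "\n]\n}"))
    .pipe(process.stdout)

// Variant 2: keep the /dev/null write stream but use the synchronous mapSync()
process.stdin
    .pipe(js.parse("features.*"))
    .pipe(es.mapSync(function (data) {
        return data             // synchronous: return the value instead of calling back
    }))
    .pipe(js.stringify("{\n\"type\": \"FeatureCollection\", \"features\": [\n\t", ",\n\t", "\n]\n}"))
    .pipe(out)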
I hope someone can help me correct what I'm sure is an obvious error on my part.