I have some (very simplified) Node.js code here:
var fs = require('fs');
// readFileSync already returns a string when an encoding is given
var derpfile = fs.readFileSync('./derp.txt', 'utf8');
var derps = derpfile.split('\n');
for (var i = 0; i < derps.length; ++i) {
    // do something with my derps here
}
The problem is that I cannot use Node in Pig UDFs (as far as I am aware; if I can, please let me know!). When I look up 'file I/O' in JavaScript, all the tutorials I find are about the browser sandbox. I need to read a file off the filesystem, like hdfs:///foo/bar/baz/jane/derps.txt, which I cannot guarantee will be in the CWD, but which I will have permission to read. All these tutorials also seem to involve asynchronous reads. I really need a blocking call here, as the Pig job cannot begin until this file is read. There are also lots of explanations of how to pull down a URL from another site, which is not what I need.
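To be concrete about the behaviour I'm after, here is a rough sketch in Node terms. The helper name readLinesSync is something I made up for illustration, and this only covers a local filesystem path; it does not address the hdfs:// case or the Pig UDF environment, which is the part I'm stuck on.

```javascript
// Hypothetical helper (not part of any Pig or Node API): read a whole file
// with a blocking call and return its lines.
var fs = require('fs');
var path = require('path');

function readLinesSync(filePath) {
  // Resolve to an absolute path so the result does not depend on the CWD.
  var absolute = path.resolve(filePath);
  // readFileSync blocks until the entire file has been read.
  var text = fs.readFileSync(absolute, 'utf8');
  // Accept both \n and \r\n line endings.
  var lines = text.split(/\r?\n/);
  // A trailing newline leaves one empty entry at the end; drop it.
  if (lines.length > 0 && lines[lines.length - 1] === '') {
    lines.pop();
  }
  return lines;
}
```

Whatever replaces this in the UDF needs the same two properties: it returns only after the file is fully read, and it works from an absolute path rather than the CWD.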
This is incredibly frustrating, as using Java for this task is horrific overkill and JavaScript is really The Right Tool For The Job (well, okay, Perl is, but I don't get to choose that…), and I'm hamstrung on something as simple as basic file I/O. :(