Writing to filesystem from within phantomjs sandbox

Posted 2019-07-20 06:15

Question:

I need to traverse forms on a site and save intermediate results to files. I'm using phantomjs' page.evaluate, but I'm having trouble accessing the filesystem from within page.evaluate's sandboxed environment. I have something like this:

for (var i = 0; i<option1.length; i++){
    for (var ii = 0; ii<option2.length; ii++){
        for (var iii = 0; iii<option3.length; iii++){
        ...
            //I found what I want to save
            fs.write("someFileName", someData);
        }
    }
}

Obviously, I don't have access to nodejs' fs from within page.evaluate, so the above does not work. I seem to have a few options:

  • Store everything I need to write to an array, and return that from the page.evaluate context into the outer, nodejs context, then save it from there. This would require memory I don't have.
  • Break up the above logic into smaller page.evaluate calls that each return a single piece of data to save to the filesystem.
  • Somehow pass a magic function into page.evaluate that writes to the filesystem. This does not seem to be possible: if I pass in a function that calls fs.writeFile, for example, I get that fs is undefined, even though fs is a free variable in the function I passed.
  • Return an iterator which, when pulled, yields the next piece of data to be written.
  • Set up a trivial web server on localhost that simply accepts POST requests and writes their contents to the filesystem. The page.evaluate code would then make those requests to localhost. I almost tried this, but I'm not sure whether I would be affected by the same-origin policy.

What are my options here?

Answer 1:

Your evaluation is sound, but you missed one option: onCallback. You can register a handler for this event in the phantom context and push your data from the page context to a file through it:

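// PhantomJS's own fs module (not Node's); needed in the phantom context.
var fs = require('fs');

// Runs in the phantom context each time the page calls window.callPhantom(data).
page.onCallback = function(data) {
    if (!data.file) {
        data.file = "defaultFilename.txt";
    }
    if (!data.mode) {
        data.mode = "w"; // overwrite unless the page asks for "a" (append)
    }
    fs.write(data.file, data.str, data.mode);
};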

...
page.evaluate(function(){
    for (var i = 0; i<option1.length; i++){
        for (var ii = 0; ii<option2.length; ii++){
            for (var iii = 0; iii<option3.length; iii++){
            ...
                // save data
                if (typeof window.callPhantom === 'function') {
                    window.callPhantom({ file: "someFileName", str: someData, mode: "a" }); // append
                }
            }
        }
    }
});
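
The handler logic above can also be pulled into a small factory so that the defaults live in one place and the logic can be exercised without a running PhantomJS. This is only a sketch: makeWriteHandler, fsLike and fakeFs are hypothetical names, and fakeFs merely stands in for PhantomJS's fs module so the snippet runs in any JavaScript engine.

```javascript
// Hypothetical helper: builds a function suitable for page.onCallback.
// `fsLike` is any object with a write(path, content, mode) method,
// for example the object returned by require('fs') inside PhantomJS.
function makeWriteHandler(fsLike, defaultFile) {
    return function (data) {
        // Ignore messages that are not write requests.
        if (!data || typeof data.str !== 'string') {
            return false;
        }
        var file = data.file || defaultFile || 'defaultFilename.txt';
        var mode = data.mode || 'w';
        fsLike.write(file, data.str, mode);
        return true;
    };
}

// Stand-in for the fs module (an assumption for this sketch only):
// it records each write instead of touching the disk.
var calls = [];
var fakeFs = {
    write: function (path, content, mode) {
        calls.push({ path: path, content: content, mode: mode });
    }
};

var handler = makeWriteHandler(fakeFs);
handler({ file: 'someFileName', str: 'row 1\n', mode: 'a' }); // explicit append
handler({ str: 'row 2\n' });   // falls back to the default file and mode "w"
handler({ file: 'ignored' });  // no string payload, so nothing is written
```

Inside a PhantomJS script the same factory would be wired up as page.onCallback = makeWriteHandler(require('fs')); the value the handler returns is what window.callPhantom hands back to the page.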

Note that PhantomJS does not run in Node.js, although there are bridges between Node.js and PhantomJS. See also my answer here.
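
On the question's third option: page.evaluate is generally understood to hand the function to the page as source text rather than as a live closure, which would explain why fs comes up undefined there. The effect can be sketched in plain JavaScript with no PhantomJS involved. The names phantomFs, outerScope and rebuilt are hypothetical, and new Function is only a rough stand-in for the sandbox boundary.

```javascript
// Stand-in for the outer (phantom) scope, where a filesystem object exists.
function outerScope() {
    var phantomFs = { write: function () { return 'written'; } };
    // `save` closes over `phantomFs`, like a function passed to page.evaluate.
    return function () { return phantomFs.write('someFileName', 'data'); };
}

var save = outerScope();
var direct = save(); // works: the closure still sees `phantomFs`

// Rough imitation of crossing the boundary: keep only the source text
// and rebuild the function in a scope that has no `phantomFs`.
var rebuilt = new Function('return (' + save.toString() + ');')();

var error = null;
try {
    rebuilt();
} catch (e) {
    error = e; // the free variable no longer resolves
}
```

Here direct holds the stub's return value while the rebuilt copy throws a ReferenceError. This is why data has to cross the boundary as plain values through window.callPhantom, not as functions.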