A little background: I'm fairly new to JavaScript and to PhantomJS, so I don't know whether this is a JavaScript or a PhantomJS bug (feature?).
The following completes successfully (sorry for the missing phantom.exit(); you'll just have to press Ctrl+C once you are done):
var page = require('webpage').create();
var comment = "Hello World";
page.viewportSize = { width: 800, height: 600 };
page.open("http://www.google.com", function (status) {
    if (status !== 'success') {
        console.log('Unable to load the address!');
        phantom.exit();
    } else {
        page.includeJs('http://code.jquery.com/jquery-latest.min.js', function() {
            console.log("1: ", comment);
        }, comment);
        var foo = page.evaluate(function() {
            return arguments[0];
        }, comment);
        console.log("2: ", foo);
    }
});
This works:
page.includeJs('http://code.jquery.com/jquery-latest.min.js', function() {
    console.log("1: ", comment);
}, comment);
Output: 1: Hello World
But not:
page.includeJs('http://code.jquery.com/jquery-latest.min.js', function(c) {
    console.log("1: ", c);
}, comment);
Output: 1: http://code.jquery.com/jquery-latest.min.js
And not:
page.includeJs('http://code.jquery.com/jquery-latest.min.js', function() {
    console.log("1: ", arguments[0]);
}, comment);
Output: 1: http://code.jquery.com/jquery-latest.min.js
Looking at the second piece, this works:
var foo = page.evaluate(function() {
    return arguments[0];
}, comment);
console.log("2: ", foo);
Output: 2: Hello World
And this:
var foo = page.evaluate(function(c) {
    return c;
}, comment);
console.log("2: ", foo);
Output: 2: Hello World
But not this:
var foo = page.evaluate(function() {
    return comment;
}, comment);
console.log("2: ", foo);
Output:
ReferenceError: Can't find variable: comment
phantomjs://webpage.evaluate():2
phantomjs://webpage.evaluate():3
phantomjs://webpage.evaluate():3
2: null
The good news is, I know what works and what doesn't, but how about a little consistency?
Why the difference between includeJs and evaluate?
Which is the proper way to pass arguments to an anonymous function?
The tricky thing to understand with PhantomJS is that there are two execution contexts: the Phantom context, which is local to your machine and has access to the phantom object and required modules, and the remote context, which exists within the window of the headless browser and only has access to things loaded in the web pages you load via page.open.
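If it helps to see the split outside PhantomJS, here is a rough analogy in plain Node.js (this is not PhantomJS code). It assumes Node's built-in `vm` module: the script's own scope stands in for the Phantom context, and a fresh `vm` context stands in for the page's `window`. The names `demoTwoContexts` and `remoteWindow` are made up for the illustration.

```javascript
// Node.js analogy only: NOT PhantomJS code.
// Local function scope  ~ the "Phantom context"
// A fresh vm context     ~ the sandboxed "remote context" (the page's window)
const vm = require("vm");

function demoTwoContexts() {
  const comment = "Hello World"; // exists only in the local scope

  // Hypothetical stand-in for the page's window: a brand-new global object
  // that happens to define a "title" global.
  const remoteWindow = vm.createContext({ title: "Google" });

  // Code run in the remote context sees that context's own globals...
  const seesTitle = vm.runInContext("typeof title", remoteWindow);
  // ...but none of the variables from the local scope.
  const seesComment = vm.runInContext("typeof comment", remoteWindow);

  return { seesTitle, seesComment };
}

console.log(demoTwoContexts()); // { seesTitle: 'string', seesComment: 'undefined' }
```

Code evaluated in the new context only sees that context's globals, which is the same reason `comment` is invisible from inside `page.evaluate`.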
Most of the script you write is executed in the Phantom context. The main exception is anything within page.evaluate(function() { ... }). The code in the { ... } is executed in the remote context, which is sandboxed and has no access to the variables and objects in your local context. You can move data between the two contexts by:
- Returning a value from the function passed to page.evaluate(), or
- Passing arguments into that function.
The values thus passed are essentially serialized in each direction: you can't pass a complex object with methods, only a data object like a string or an array. (I don't know the exact implementation, but the rule of thumb seems to be that anything you can serialize with JSON can be passed in either direction.) Inside the page.evaluate() function you do not have access to outside variables, as you would through a closure in standard JavaScript; you only have access to the variables you explicitly pass in as arguments.
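A quick way to check the JSON rule of thumb is to round-trip a value yourself. This is a minimal sketch that assumes the transfer behaves like `JSON.parse(JSON.stringify(...))`; that is a model of the behaviour described above, not PhantomJS's actual serializer, and `roundTrip` is a hypothetical helper.

```javascript
// Models the "if JSON can serialize it, it can be passed" rule of thumb.
// Assumption: the real transfer behaves roughly like a JSON round trip.
function roundTrip(value) {
  return JSON.parse(JSON.stringify(value));
}

const sent = {
  text: "Hello World",                 // plain data: survives
  list: [1, 2, 3],                     // arrays of data: survive
  greet: function () { return "hi"; }, // a method: JSON drops it
  when: new Date(0),                   // a Date: JSON turns it into a string
};

const received = roundTrip(sent);
console.log(received.text);         // Hello World
console.log(received.list);         // [ 1, 2, 3 ]
console.log(typeof received.greet); // undefined
console.log(typeof received.when);  // string
```

Plain strings, numbers, arrays and data-only objects come through intact; anything relying on methods or object identity does not.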
So, to your question: why the difference between includeJs and evaluate?
.includeJs(url, callback) takes a callback function that executes within the Phantom context, apparently receiving the URL as its first argument. In addition to its arguments, it has access (like any normal JavaScript function) to all variables in its enclosing scope, including comment in your example. It does not take an additional argument list after the callback function: when you reference comment within the callback, you're referencing an outside variable, not a function argument.
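To make the callback behaviour concrete, here is a hypothetical model of an `includeJs`-style function. `fakeIncludeJs` is not PhantomJS's implementation; it simply reproduces what the question observed (the callback receives the URL, and anything passed after the callback is ignored). It calls back synchronously so the sketch stays short.

```javascript
// Hypothetical model of an includeJs-style API (NOT PhantomJS's code).
// It mirrors the observed behaviour: the callback is handed the URL,
// and any extra arguments after the callback are never read.
function fakeIncludeJs(url, onLoaded) {
  // Called synchronously for simplicity; the real callback fires later,
  // once the script has finished loading.
  onLoaded(url);
}

function demoCallbackScope() {
  const comment = "Hello World";
  const seen = [];

  // Reading an outer variable through the closure: works.
  fakeIncludeJs("http://code.jquery.com/jquery-latest.min.js", function () {
    seen.push(comment);
  }, comment); // this trailing argument is ignored

  // Naming a parameter: it receives whatever the API passes first (the URL).
  fakeIncludeJs("http://code.jquery.com/jquery-latest.min.js", function (c) {
    seen.push(c);
  }, comment);

  return seen;
}

console.log(demoCallbackScope());
// [ 'Hello World', 'http://code.jquery.com/jquery-latest.min.js' ]
```

The callback's parameters are decided by the function that calls it, not by what you append after it, which is why the closure is the way to reach `comment`.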
var foo = "stuff";
page.includeJs('http://code.jquery.com/jquery-latest.min.js', function() {
    // this callback function executes in the Phantom context
    console.log("jQuery is loaded in the remote context.");
    // it has access to outer-scope variables, including "phantom"
    // (nowDoMoreStuff is a placeholder for your own function)
    nowDoMoreStuff(foo, page);
});
.evaluate(function, args*) takes a function to execute and zero or more arguments to pass to it (in some serialized form). You need to name the arguments in the function signature, e.g. function(a, b, c), or use the arguments object to access them; they won't automagically have the same names as the variables you pass in.
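The three `evaluate` cases from the question can be reproduced with a small model. The sketch below assumes the function is shipped as source text (via `toString()`), rebuilt with `new Function` in the global scope, and given JSON copies of the arguments; `fakeEvaluate` and `demoEvaluate` are hypothetical names, and this approximates the behaviour rather than reproducing PhantomJS's code.

```javascript
// Hypothetical model of an evaluate-style call (NOT PhantomJS's code):
// the function travels as source text and is rebuilt in another scope,
// and the arguments travel as JSON copies. Closures do not come along.
function fakeEvaluate(fn, ...args) {
  const source = fn.toString();
  // new Function compiles in the global scope, not the caller's scope.
  const rebuilt = new Function("return (" + source + ");")();
  const copiedArgs = JSON.parse(JSON.stringify(args));
  try {
    return rebuilt.apply(null, copiedArgs);
  } catch (err) {
    console.log(String(err)); // e.g. ReferenceError: comment is not defined
    return null;              // mirrors the "2: null" in the question's output
  }
}

function demoEvaluate() {
  const comment = "Hello World";
  return [
    fakeEvaluate(function () { return arguments[0]; }, comment), // positional
    fakeEvaluate(function (c) { return c; }, comment),           // named param
    fakeEvaluate(function () { return comment; }, comment),      // closure: fails
  ];
}

console.log(demoEvaluate()); // [ 'Hello World', 'Hello World', null ]
```

Because the rebuilt function never sees the caller's scope, naming the parameter or reading `arguments[0]` works, while referring to `comment` directly throws, just as in the question.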
var foo = "stuff";
var bar = "stuff for the remote page";
var result = page.evaluate(function(bar2) {
    // this function executes in the remote context
    // it has access to the DOM, remote libraries, and args you pass in
    // ($ assumes jQuery has already been loaded into the page)
    $('title').html(bar2);
    // but not to outer-scope vars
    return typeof foo + " " + typeof bar;
}, bar);
console.log(result); // "undefined undefined"
So the correct way to pass arguments in differs between these two methods. For includeJs, the callback will be called with a new set of arguments (including, at least, the URL), so any variables you want to access need to be in the callback's enclosing scope (i.e. you access them through the function's closure). For evaluate, the one straightforward way to pass in arguments is to include them in the arguments passed to evaluate itself (there are workarounds, too, but they're tricky and not worth discussing now that this feature is available in PhantomJS itself).