CasperJS - Access page's content while trying

Posted 2019-09-15 00:45

Question:

I'm running some tests with CasperJS. What I need to do is:

  • extract city names from a drop-down menu (already done);

  • then select each city (with casper.fill()), which loads new content and changes the page URL (this works when testing with a single city name, but fails when looping through the list of city names);

  • go one level deeper by following the links of the newly loaded items (new pages);

  • finally, grab the content from each of those pages.

I tried to write a loop that iterates through the list of cities and does all the work in each iteration. The problem is that CasperJS sets the <option> field value to each city one after another, immediately, without executing the rest of the code inside the loop:

casper.then(function() {

    var citiesLength = cities.length;

    for (var i = 0; i < citiesLength; i++) {

        this.fill('form.wpv-filter-form',{   //setting drop-down field value to the city names in order of the items in the array
            'city[]': cityNames[i]
        });

// Apparently the code below (to the end of the loop) doesn't get executed
        casper.thenEvaluate(function() {

// Here the url change is being checked to know when the new content is loaded:
            var regexString = '(\\?)(city)(\\[\\])(=)(' + cityNames[i] + ')&';
            var regex = new RegExp(regexString, "igm");

            this.waitForUrl(regex, function(){
                var name = this.getHTML('.kw-details-title');
                link = this.evaluate(getFirstItemLink); // for test, just getting the first item's link

                casper.open(link).then(function(){
                    this.echo("New Page is loaded......");
                    // Grab the single item contents
                });
            });

        });
    }
});
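The answers below point out the step-ordering and page-context problems. Separately, the for (var i ...) loop above has a classic JavaScript pitfall of its own. Code queued inside the loop only runs after the loop has finished, so a callback that reads cityNames[i] sees the final value of i.

The plain JavaScript sketch below (runnable in Node.js) reproduces this. It uses a hypothetical later()/runQueue() pair as a stand-in for CasperJS's step queue; it is an illustration, not CasperJS code.

```javascript
// Hypothetical stand-in for a step queue: later() only records a function,
// runQueue() calls the recorded functions afterwards and collects their results.
var queue = [];
function later(fn) {
  queue.push(fn);
}
function runQueue() {
  var results = queue.map(function (fn) { return fn(); });
  queue = [];
  return results;
}

var cityNames = ['city1', 'city2', 'city3'];

// Buggy: all three callbacks share one "i", which is 3 by the time they run.
for (var i = 0; i < cityNames.length; i++) {
  later(function () { return cityNames[i]; });
}
var buggy = runQueue(); // [undefined, undefined, undefined]

// Fixed: forEach passes each callback its own "cityName" parameter.
cityNames.forEach(function (cityName) {
  later(function () { return cityName; });
});
var fixed = runQueue(); // ['city1', 'city2', 'city3']

console.log(buggy, fixed);
```

Giving each iteration its own variable through forEach, as answer 1 does, avoids the problem in ES5-compatible code.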

This is the log (shortened to 3 cities):

[debug] [remote] Set "city[]" field value to city1
[info] [remote] attempting to fetch form element from selector: 'form.wpv-filter-form'
[debug] [remote] Set "city[]" field value to city2
[info] [remote] attempting to fetch form element from selector: 'form.wpv-filter-form'
[debug] [remote] Set "city[]" field value to city3
[info] [remote] attempting to fetch form element from selector: 'form.wpv-filter-form'
[info] [remote] attempting to fetch form element from selector: 'form.wpv-filter-form'
[info] [remote] attempting to fetch form element from selector: 'form.wpv-filter-form'
[info] [phantom] Step anonymous 5/5: done in 123069ms.
[info] [phantom] Step _step 6/79 https://domain.com/section/ (HTTP 200)
[info] [phantom] Step _step 6/79: done in 123078ms.

P.S.: Is casper.open() a good way to reach the second-level pages (item pages)? Do I need to close them somehow after grabbing their content?

Thanks

Answer 1:

You have several issues in your code. For one, you don't use steps (then* and wait* functions) consistently: you mix a direct invocation (casper.fill), which runs immediately, with a step (thenEvaluate), which is only queued to run later.
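The log in the question fits this. All three fill() calls ran immediately inside a single step, and the queued steps ran only afterwards.

The plain JavaScript sketch below (runnable in Node.js) models that ordering with a hypothetical, heavily simplified MiniCasper object. Its then(), fill() and run() only imitate the ordering of the CasperJS methods with the same names; they are not the real implementation. Like CasperJS, it inserts steps queued from inside a running step directly after that step.

```javascript
// Hypothetical, heavily simplified model of CasperJS's step ordering (not the real API).
function MiniCasper() {
  this.steps = [];
  this.log = [];
  this.running = false;
  this.insertAt = 0;
}

// Like then(): queue a step. While a step is running, new steps go right after it.
MiniCasper.prototype.then = function (fn) {
  if (this.running) {
    this.steps.splice(this.insertAt, 0, fn);
    this.insertAt++;
  } else {
    this.steps.push(fn);
  }
  return this;
};

// Like fill(): a direct call, executed immediately.
MiniCasper.prototype.fill = function (city) {
  this.log.push('fill ' + city);
};

// Like run(): execute the queued steps in order, with "this" bound to the instance.
MiniCasper.prototype.run = function () {
  this.running = true;
  for (var s = 0; s < this.steps.length; s++) {
    this.insertAt = s + 1;
    this.steps[s].call(this);
  }
  this.running = false;
  return this.log;
};

var cities = ['city1', 'city2', 'city3'];

// Question's pattern: fill() runs now, inside one step; the follow-up is only queued.
var mixed = new MiniCasper();
mixed.then(function () {
  var self = this;
  cities.forEach(function (city) {
    self.fill(city);
    self.then(function () { this.log.push('check URL for ' + city); });
  });
});
var mixedLog = mixed.run(); // three fills first, then the three checks

// Answer 1's pattern: every action is its own step, so fills and checks alternate.
var stepped = new MiniCasper();
cities.forEach(function (city) {
  stepped.then(function () { this.fill(city); });
  stepped.then(function () { this.log.push('check URL for ' + city); });
});
var steppedLog = stepped.run(); // fill city1, check URL for city1, fill city2, ...

console.log(mixedLog.join('\n'));
console.log(steppedLog.join('\n'));
```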

Another issue is that "this" doesn't refer to casper inside the page context (inside evaluate and thenEvaluate).
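Roughly speaking, the function passed to evaluate() or thenEvaluate() is handed to PhantomJS and executed inside the page. So besides "this" being the page's window, the function also cannot see variables from your script, such as cityNames or i. Values it needs can be passed as extra arguments, for example this.evaluate(function (name) { ... }, cityName).

The Node.js sketch below models this with a hypothetical fakeEvaluate(). It rebuilds the function from its source text and calls it on a fake window object. This is a model of the behaviour, not PhantomJS's actual implementation.

```javascript
// Hypothetical model of evaluate() (not CasperJS/PhantomJS code): the function is
// rebuilt from its source text, so it cannot see the caller's local variables,
// and it runs with "this" set to a stand-in for the page's window.
function fakeEvaluate(fn, fakeWindow) {
  var extraArgs = Array.prototype.slice.call(arguments, 2);
  var rebuilt = new Function('return (' + fn.toString() + ');')();
  // Arguments cross the boundary as plain JSON copies.
  var copies = extraArgs.map(function (arg) { return JSON.parse(JSON.stringify(arg)); });
  return rebuilt.apply(fakeWindow, copies);
}

function demo() {
  var casperSideCities = ['city1', 'city2']; // exists only in the script's scope
  var page = { location: { href: 'https://domain.com/section/' } };

  // Reading a script variable from "inside the page" fails.
  var closureResult;
  try {
    fakeEvaluate(function () { return casperSideCities[0]; }, page);
    closureResult = 'no error';
  } catch (e) {
    closureResult = e.name; // 'ReferenceError'
  }

  // Passing the value as an argument works.
  var argResult = fakeEvaluate(function (name) { return name; }, page, casperSideCities[0]);

  // "this" is the page's window stand-in, not the caller's object.
  var thisResult = fakeEvaluate(function () { return this.location.href; }, page);

  return { closureResult: closureResult, argResult: argResult, thisResult: thisResult };
}

console.log(demo());
```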

This should work:

cityNames.forEach(function(cityName){
    casper.then(function(){
        this.fill('form.wpv-filter-form', {   //setting drop-down field value to the city names in order of the items in the array
            'city[]': cityName
        });
    });

    casper.then(function(){
        var regexString = '(\\?)(city)(\\[\\])(=)(' + cityName + ')&';
        var regex = new RegExp(regexString, "igm");
        this.waitForUrl(regex, function(){
            var name = this.getHTML('.kw-details-title');
            var link = this.evaluate(getFirstItemLink); // for test, just getting the first item's link

            this.thenOpen(link).then(function(){
                this.echo("New Page is loaded......");
                // Grab the single item contents
            });
        });
    });
});
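Answer 1 keeps the question's URL regex, which pastes the raw city name into the pattern. If a name ever contains characters such as ".", "(" or "+", they are read as regex syntax rather than text, so escaping the name first may be safer. The "g" flag is also unnecessary for a single yes/no check, and it makes RegExp.prototype.test() stateful across calls on the same object.

The sketch below is plain JavaScript written in ES5 style. A few things in it are assumptions:
- escapeRegExp and cityUrlRegex are hypothetical helper names.
- "St. Louis (MO)" is a made-up city name.
- Like the original regex, it assumes the city name appears unencoded in the URL.

```javascript
// Hypothetical helper: escape RegExp metacharacters so the text matches literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Same idea as the question's pattern ("?city[]=<name>&"), with the name escaped
// and without the "g"/"m" flags, which a single yes/no URL test does not need.
function cityUrlRegex(cityName) {
  return new RegExp('\\?city\\[\\]=' + escapeRegExp(cityName) + '&', 'i');
}

var url = 'https://domain.com/section/?city[]=St. Louis (MO)&page=1';

var escapedMatches = cityUrlRegex('St. Louis (MO)').test(url); // true
// Unescaped, "(MO)" becomes a capture group, so the literal parentheses never match.
var rawMatches = new RegExp('\\?city\\[\\]=' + 'St. Louis (MO)' + '&', 'i').test(url); // false

// With "g", test() remembers lastIndex between calls on the same RegExp object.
var withG = /city1/g;
var firstTest = withG.test('?city[]=city1&');  // true
var secondTest = withG.test('?city[]=city1&'); // false, search resumed after the first match

console.log(escapedMatches, rawMatches, firstTest, secondTest);
```

If it fits the site, this.waitForUrl(cityUrlRegex(cityName), ...) could then replace the two lines that build regexString and regex.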


Answer 2:

It is hard to give you a precise answer because your problem can't be reproduced. However, I noticed several problems in your script:

1. Avoid "nesting hell"

CasperJS is organized around steps. With this library, a script generally looks like this:

casper.start('http://www.website.com/');

casper.then(function () {
  // Step 1
});

casper.then(function () {
  // Step 2
});

casper.then(function () {
  // Step 3
});

casper.run();

The then() methods are not promises, but they serve the same purpose: flattening the code. So once you find yourself several levels deep in nesting, you are most likely doing something wrong.

2. Be careful with evaluate

From the documentation:

The concept behind this method is probably the most difficult to understand when discovering CasperJS. As a reminder, think of the evaluate() method as a gate between the CasperJS environment and the one of the page you have opened; everytime you pass a closure to evaluate(), you’re entering the page and execute code as if you were using the browser console.

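In your case, you are calling casper methods such as this.waitForUrl() and this.evaluate() inside the function passed to thenEvaluate(), i.e. inside the page, where "this" is not casper. I am sure that is not what you want to do.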

3. "this" is not always what you expect

Taking the first two points together (nesting and evaluate), it appears that you are not using "this" the right way. In the PhantomJS/CasperJS environment, for example inside a then() callback, "this" is your casper instance. Inside evaluate, however, you are in the page's DOM environment, so "this" becomes window. If that is still not clear, here is an example script:

var casper = require('casper').create();

casper.start('http://casperjs.org/');

casper.then(function () {
  // "this" is "casper"
  console.log(this.getCurrentUrl()); // http://casperjs.org/
});

casper.then(function () {
  // "this" is "casper"
  this.echo(this.evaluate(function () {
    // "this" is "window"
    return this.location.href; // http://casperjs.org/
  }));
});

casper.run();