How to use NodeJS / PhantomJS / CasperJS on Windows

Posted 2020-07-27 06:03

Question:

I need to scrape a website form on the fly, and the site uses AJAX and sessions. I did a lot of research and came across several possible solutions, one being Python::Mechanize. I don't know Python, and from my understanding cURL alone in PHP cannot handle AJAX or submit forms.

I found what I believe is a possible stack that can lead me to grace :). The problem is that I do not know how to use these packages at all.

  1. I downloaded and installed Node.js and I can call it from CMD. (great)

  2. I downloaded and installed PhantomJS. I am not sure how to set up the PATH so that it is available everywhere, so I have to manually cd to its directory in CMD to get it to load. How can I set this up in Windows 7? I am not sure where to point the path.

  3. I downloaded CasperJS and put it in the directory.

So with PhantomJS I was able to run a test file which echoes 'hello world' in the CMD prompt. And now I have no clue how to proceed. Ultimately I need this to run on the fly from my web server, so it needs to be integrated into my web page. For now I would just like to run it from CMD and get it to go to a page, submit a form, scrape the results, and write them to a file.

Can someone please explain a workflow for how I can accomplish this?

CasperJS shows this form example. I would like to use it with my own variables, run the script, and save the result.

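// Create the casper instance; the example needs it before casper.start().
var casper = require('casper').create();

casper.start('http://some.tld/contact.form', function() {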
    this.fill('form#contact-form', {
        'subject':    'I am watching you',
        'content':    'So be careful.',
        'civility':   'Mr',
        'name':       'Chuck Norris',
        'email':      'chuck@norris.com',
        'cc':         true,
        'attachment': '/Users/chuck/roundhousekick.doc'
    }, true);
});

casper.then(function() {
    this.evaluateOrDie(function() {
        return /message sent/.test(document.body.innerText);
    }, 'sending message failed');
});

casper.run(function() {
    this.echo('message sent').exit();
});

Answer 1:

After you install PhantomJS, do the following:

  1. From the Desktop, right-click My Computer and click Properties.
  2. Click Advanced System Settings link in the left column.
  3. In the System Properties window click the Environment Variables button.
  4. Find the PATH variable and click Edit.
  5. Add the PhantomJS path at the end of the variable value (don't forget the ; before it).

Now you can use phantomjs from CMD (open a new CMD window so the updated PATH is picked up). Ex.: phantomjs c:\mywebsite\with\ajax\dopescript.js

After these steps, download CasperJS and put it in the PhantomJS folder.

Ex.: c:\phantomjs\casperjs

Repeat the PATH steps above for CasperJS (with \bin at the end).

Ex.: c:\phantomjs\casperjs\bin

Try casperjs from CMD.

If it's not working, go to the batchbin directory in the casperjs folder and launch casperjs.bat.

Now try to call CasperJS from this folder. (Works for me)

You should now have PhantomJS + CasperJS working.

About saving results:

Put var fs = require('fs'); at the beginning of your script and call

fs.write('result.html', myData); where myData is the data that you need to save.

More information about fs is in the PhantomJS File System module documentation.
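Putting the pieces together, the workflow from the question (open a page, submit a form, wait for the result, write it to a file) could look roughly like the sketch below. The URL, form selector, and field names come from the CasperJS example in the question. The '#result' selector, the helper name scrapeForm, and the file name scrape-form.js are placeholders I made up. It also assumes the AJAX response adds an element matching the result selector that is not present before the form is submitted.

```javascript
// scrape-form.js (hypothetical file name): run with `casperjs scrape-form.js`
function scrapeForm(casper, fs, options) {
    // Step 1: open the page and fill the form; the third argument (true) submits it.
    casper.start(options.url, function () {
        this.fill(options.formSelector, options.fields, true);
    });

    // Step 2: wait until the result element exists, then save its markup.
    casper.waitForSelector(options.resultSelector, function () {
        var html = this.getHTML(options.resultSelector);
        fs.write(options.outputFile, html, 'w');
    });

    return casper;
}

// Only start a real run when the script is executed by PhantomJS/CasperJS.
if (typeof phantom !== 'undefined') {
    scrapeForm(require('casper').create(), require('fs'), {
        url: 'http://some.tld/contact.form',
        formSelector: 'form#contact-form',
        fields: {
            'subject': 'I am watching you',
            'content': 'So be careful.'
        },
        resultSelector: '#result',   // placeholder: use a selector from your page
        outputFile: 'result.html'
    }).run();
}
```

Passing casper and fs in as arguments is only a convenience so the steps can be exercised with stand-in objects; in a one-off script you can inline the three calls directly.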