I need to scrape a website form (on the fly) that uses AJAX and sessions. After a lot of research I came across several possible solutions, one being Python's mechanize, but I don't know Python, and from my understanding cURL alone in PHP cannot handle AJAX or submit forms.
I found what I believe is a stack that can lead me to grace :). The problem is that I do not know how to use these packages at all.
1. I downloaded and installed Node.js, and I can call it from CMD (great).
2. I downloaded and installed PhantomJS. I am not sure how to set up the PATH so that it works from any directory, so right now I have to manually cd into its directory in CMD to get it to load. How can I set this up in Windows 7? I am not sure where to point the path.
3. I downloaded CasperJS and put it in that directory.
With PhantomJS I was able to run a test file that echoes 'hello world' in the CMD prompt, and now I have no clue how to proceed. Ultimately I need this to run on the fly from my web server, so it has to be integrated into my web page. For now I would just like to run it from CMD and have it go to a page, submit a form, scrape the results, and write them to a file.
Can someone please explain a workflow for accomplishing this?
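For the web-server part, one option is to launch CasperJS as an external process, pass the form values as command-line options (CasperJS scripts can read --key=value options through casper.cli.get), and capture what the script prints. Below is a minimal Node.js sketch of that idea; the script name scrape.js, the option names, and the helpers buildCasperArgs and runCasper are hypothetical. The same command line could also be started from PHP with exec().

```javascript
const { execFile } = require('child_process');

// Turn { name: 'Chuck Norris', email: 'chuck@norris.com' } into
// ['scrape.js', '--name=Chuck Norris', '--email=chuck@norris.com'].
// execFile passes each array element as one argument (no shell involved),
// so spaces in the values do not need extra quoting.
function buildCasperArgs(script, vars) {
  return [script].concat(
    Object.keys(vars).map((key) => '--' + key + '=' + vars[key])
  );
}

// Run `casperjs <script> --key=value ...` and hand its stdout to `callback`.
// Assumes casperjs is reachable on PATH as an executable. If it is a
// .bat/.cmd launcher on Windows, execFile cannot start it directly; use
// exec() or spawn() with { shell: true } instead (and quote the values).
function runCasper(script, vars, callback) {
  execFile('casperjs', buildCasperArgs(script, vars), (err, stdout) => {
    if (err) {
      callback(err);
      return;
    }
    callback(null, stdout);
  });
}
```

A call would look like runCasper('scrape.js', { name: 'Chuck Norris' }, (err, output) => { ... }), where output is whatever the CasperJS script echoed.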
The CasperJS documentation shows this form example, and I would like to adapt it to use my own variables, run the script, and save the result:
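```javascript
// Create the CasperJS instance the rest of the example relies on
var casper = require('casper').create();

casper.start('http://some.tld/contact.form', function() {
    // Fill fields by their name attributes; `true` submits the form
    this.fill('form#contact-form', {
        'subject':    'I am watching you',
        'content':    'So be careful.',
        'civility':   'Mr',
        'name':       'Chuck Norris',
        'email':      'chuck@norris.com',
        'cc':         true,
        'attachment': '/Users/chuck/roundhousekick.doc'
    }, true);
});

casper.then(function() {
    // Abort with an error unless the resulting page says "message sent"
    this.evaluateOrDie(function() {
        return /message sent/.test(document.body.innerText);
    }, 'sending message failed');
});

casper.run(function() {
    this.echo('message sent').exit();
});
```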
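Adapting that example to the workflow described above (open the page, fill and submit the form, wait for the AJAX result, scrape it, write it to a file) could look like the sketch below. It uses CasperJS's fill, waitForSelector, fetchText and cli.get, plus PhantomJS's fs.write. The URL, the selectors (in particular #results), the field names, and the output file name are placeholders to replace with the real page's. Because every step runs in the same headless browser, cookies, and therefore the session, persist between steps. The steps are wrapped in a function that receives the casper instance, and the typeof phantom guard only starts a real run under CasperJS, so the logic can also be exercised without a browser.

```javascript
// scrape.js: sketch of "open page, submit form, scrape, save".
// URL, selectors, field names and output file are hypothetical placeholders.
// Plain ES5 (var, function) because PhantomJS's engine predates ES2015.

// Register the steps on a CasperJS instance; `fs` is PhantomJS's fs module.
function scrapeToFile(casper, fs, opts) {
    casper.start(opts.url, function() {
        // Fill inputs by their name attribute and submit (third argument true)
        this.fill(opts.formSelector, opts.fields, true);
    });

    // AJAX results arrive after submission, so wait until they appear
    casper.waitForSelector(opts.resultSelector, function() {
        var text = this.fetchText(opts.resultSelector);
        fs.write(opts.outFile, text, 'w'); // 'w' overwrites the file
        this.echo('Saved results to ' + opts.outFile);
    }, function() {
        this.die('Timed out waiting for ' + opts.resultSelector, 1);
    }, opts.timeout);

    casper.run(function() {
        this.exit();
    });
}

// Only start a real browser when running under CasperJS/PhantomJS
if (typeof phantom !== 'undefined') {
    var casper = require('casper').create();
    scrapeToFile(casper, require('fs'), {
        url: 'http://some.tld/contact.form',
        formSelector: 'form#contact-form',
        // e.g. casperjs scrape.js --name="Chuck Norris"
        fields: { 'name': casper.cli.get('name') || 'Chuck Norris' },
        resultSelector: '#results',
        outFile: 'results.txt',
        timeout: 10000
    });
}
```

Run from CMD with casperjs scrape.js --name="Chuck Norris"; if the selector matches, the text of the result element ends up in results.txt.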