I'm trying to use X-Ray to do the following. I'm not familiar with web scraping, and I'm looking for a technology that fits my use case.
Browse to a page, locate a specific form in it, set some variables, and submit it.
Then get the next page, and so on...
What's the best Node.js-based solution, with examples and documentation, to get this done?
Thanks.
There are many Node modules created for web scraping.
Some of them are:
- cheerio
- osmosis
- x-ray
- noodlejs
- yakuza
- ineed
See Node.js Scraping Libraries - a very nice comparison by Moritz Klack on Webkid Blog.
There are some nice articles online on how to use some of them, mostly about Cheerio:
- Web Scraping With Node.js by Elliot Bonneville (Smashing Magazine) about Cheerio
- Scraping the Web With Node.js by Adnan Kukic (Scotch.io) about Cheerio
- Easy Web Scraping With Node.js by Miguel Grinberg about Cheerio
- Simple web scraping with Node.js / JavaScript by Stephen ('Net Instructions) about Cheerio
It's worth mentioning that the x-ray module was written by the author of Cheerio - see: X-Ray: A Scraper by the Author of Cheerio on DailyJS.
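For the workflow in the question (load a page, locate a form, set some fields, submit it), here is a minimal sketch of how that could look with Cheerio. It is only an illustration under a few assumptions: Cheerio is installed (`npm install cheerio`), Node 18+ is used so that `fetch` is available globally, and the helper names (`extractForm`, `buildFormRequest`, `submitForm`) are made up for this example rather than part of any library. Cookies and sessions are not handled here.

```javascript
// Sketch only: helper names are hypothetical; extractForm() assumes cheerio is installed.

// Parse the HTML and read the form's target plus its current field values.
function extractForm(html, selector) {
  const cheerio = require('cheerio'); // loaded lazily so the other helpers run without it
  const $ = cheerio.load(html);
  const $form = $(selector).first();
  return {
    action: $form.attr('action') || '',
    method: ($form.attr('method') || 'GET').toUpperCase(),
    fields: $form.serializeArray(), // [{ name, value }, ...]
  };
}

// Merge our own values into the form fields and describe the request to send.
// Note: set() keeps one value per name, so repeated names are collapsed.
function buildFormRequest(pageUrl, form, overrides) {
  const params = new URLSearchParams();
  for (const { name, value } of form.fields) params.set(name, value);
  for (const [name, value] of Object.entries(overrides)) params.set(name, value);
  const url = new URL(form.action, pageUrl); // resolves relative or empty actions
  if (form.method === 'GET') {
    url.search = params.toString();
    return { url: url.href, method: 'GET', body: null };
  }
  return { url: url.href, method: form.method, body: params.toString() };
}

// Fetch the page, fill in the form, submit it, and return the next page's HTML.
async function submitForm(pageUrl, selector, overrides) {
  const html = await (await fetch(pageUrl)).text();
  const req = buildFormRequest(pageUrl, extractForm(html, selector), overrides);
  const res = await fetch(req.url, {
    method: req.method,
    body: req.body ?? undefined,
    headers: req.body ? { 'Content-Type': 'application/x-www-form-urlencoded' } : {},
  });
  return res.text();
}
```

A call might look like `submitForm('https://example.com/login', 'form#login', { user: 'alice' })`, where the URL, selector, and field name are placeholders. The returned HTML can be passed to `extractForm` again to continue with the next page. Pages that build their forms with client-side JavaScript would need a headless browser instead.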