I need to code a bot that does the following:
Go to a JSP page and
search for something by:
- 1: typing something into a search box
- 2: clicking the search button (a submit button)
- 3: clicking one of the resulting buttons/links (same JSP page with different output)
- 4: getting the entire HTML of the new page (same JSP page with different output)
Step 4 can be done with screen scraping, and I do not think I need help with it. But I need some guidance on steps 1 to 3. Any links, or even just some keywords that will help me search and learn about this, would be appreciated. I plan to do this in Java.
Maybe it's not what you want, but you can try Selenium: http://seleniumhq.org/
It's a web application testing system that drives a real browser, so it can type into fields and click buttons and links for you.
All you need is HtmlUnit.
This is an extract from its description:
HtmlUnit is a "GUI-Less browser for Java programs". It models HTML documents and provides an API that allows you to invoke pages, fill out forms, click links, etc... just like you do in your "normal" browser.
P.S.: I have used it to build a web scraping project ;)
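To give an idea of what steps 1 to 4 could look like with HtmlUnit, here is a minimal sketch. It assumes HtmlUnit 2.x (package com.gargoylesoftware.htmlunit; 3.x moved to org.htmlunit). The URL, the form name "searchForm", the input names "query" and "search", and the link text are all hypothetical placeholders, not taken from your page, so inspect the JSP output and substitute the real ones.

```java
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlAnchor;
import com.gargoylesoftware.htmlunit.html.HtmlForm;
import com.gargoylesoftware.htmlunit.html.HtmlPage;
import com.gargoylesoftware.htmlunit.html.HtmlSubmitInput;
import com.gargoylesoftware.htmlunit.html.HtmlTextInput;

import java.io.IOException;

public class HtmlUnitSearchBot {

    // Runs the search flow and returns the markup of the final page.
    // "searchForm", "query" and "search" are assumed names (hypothetical).
    static String searchAndFetch(WebClient client, String url,
                                 String keyword, String linkText) throws IOException {
        HtmlPage searchPage = client.getPage(url);

        // Step 1: type the keyword into the search box.
        HtmlForm form = searchPage.getFormByName("searchForm");
        HtmlTextInput box = form.getInputByName("query");
        box.type(keyword);

        // Step 2: click the submit button; click() returns the page that loads.
        HtmlSubmitInput button = form.getInputByName("search");
        HtmlPage results = button.click();

        // Step 3: click one of the resulting links, found by its visible text.
        HtmlAnchor link = results.getAnchorByText(linkText);
        HtmlPage detail = link.click();

        // Step 4: serialize the resulting page's DOM as markup.
        return detail.asXml();
    }

    public static void main(String[] args) throws IOException {
        try (WebClient client = new WebClient()) {
            // Placeholder URL and link text (hypothetical).
            String html = searchAndFetch(client, "http://example.com/search.jsp",
                                         "keyword", "First result");
            System.out.println(html);
        }
    }
}
```

Note that asXml() serializes the parsed DOM, so the output may differ slightly from the raw bytes the server sent.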
You can use python-mechanize for this.
Prerequisites:
- Selenium API.
- Mozilla Firefox (with the Firebug extension installed)
You can launch a browser, go to a particular web page, search for a keyword, and analyse the results as follows:
- Launch the web browser (WebDriver driver = new FirefoxDriver()) (Selenium)
- Go to the particular web page (driver.get("your web page")) (Selenium)
- Identify the search box (get its identifier with Firebug: id, XPath, etc.)
- Go to that box and type your search keyword (webElement.sendKeys("your keyword")), then click the search button (webElement.click()) (Selenium)
- Click the desired result, located by its identifier, and wait for the next page to load (Selenium)
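The steps above can be sketched in Java roughly as follows. This is only an illustration under assumptions: the element ids "searchBox" and "searchButton", the URL, and the link text are hypothetical placeholders (look up the real identifiers with Firebug), and it assumes the Selenium Java bindings and a Firefox driver are set up on your machine.

```java
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.firefox.FirefoxDriver;

public class SeleniumSearchBot {

    // Hypothetical locators; replace with the ids or XPaths of the real page.
    static final By SEARCH_BOX = By.id("searchBox");
    static final By SEARCH_BUTTON = By.id("searchButton");

    // Performs the search, clicks one result, and returns the page source.
    static String searchAndFetch(WebDriver driver, String url,
                                 String keyword, By resultLink) {
        // Go to the page.
        driver.get(url);

        // Locate the search box and type the keyword.
        WebElement box = driver.findElement(SEARCH_BOX);
        box.sendKeys(keyword);

        // Click the search (submit) button.
        driver.findElement(SEARCH_BUTTON).click();

        // Click the desired result.
        driver.findElement(resultLink).click();

        // HTML of the page that is now loaded in the browser.
        return driver.getPageSource();
    }

    public static void main(String[] args) {
        WebDriver driver = new FirefoxDriver();
        try {
            // Placeholder URL and link text (hypothetical).
            String html = searchAndFetch(driver, "http://example.com/search.jsp",
                                         "keyword", By.linkText("First result"));
            System.out.println(html);
        } finally {
            // Always close the browser, even if a lookup fails.
            driver.quit();
        }
    }
}
```

If the results are filled in by JavaScript after the page loads, findElement may run too early; in that case an explicit wait before the lookup is probably needed.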
I used Selenium with Chrome. To use Selenium, download the latest version from http://www.seleniumhq.org/download/ and add the jar files (Selenium Client & WebDriver Language Bindings, Selenium Standalone Server) to your project in NetBeans or Eclipse. After that, download the latest ChromeDriver from https://sites.google.com/a/chromium.org/chromedriver/, extract the file, and save it on your PC.
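Once the jars are on the classpath and ChromeDriver is extracted, wiring them together might look like the sketch below. The path passed in is a hypothetical placeholder for wherever you saved the extracted file; "webdriver.chrome.driver" is the system property the Chrome bindings read to find it.

```java
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;

public class ChromeSetup {

    // System property that tells Selenium where the ChromeDriver executable is.
    static final String DRIVER_PROPERTY = "webdriver.chrome.driver";

    // Records the location of the extracted ChromeDriver executable.
    static void configureDriverPath(String chromeDriverPath) {
        System.setProperty(DRIVER_PROPERTY, chromeDriverPath);
    }

    public static void main(String[] args) {
        // Placeholder path (hypothetical); point it at your extracted file.
        configureDriverPath("C:/tools/chromedriver.exe");

        WebDriver driver = new ChromeDriver();
        try {
            driver.get("http://example.com/search.jsp"); // placeholder URL
            System.out.println(driver.getTitle());
        } finally {
            driver.quit();
        }
    }
}
```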