I am working on a web crawler that fetches data from websites using crawler4j. Everything works well, but the main problem is AJAX-based events. I found that the crawljax library handles this, but I couldn't figure out where and when to use it.
When should I use it (I mean, at which point in the work sequence)?
- before fetching the page using crawler4j,
Or
- after fetching the page using crawler4j,
Or
- should I take the URLs that crawler4j discovers and fetch the AJAX data (page) for them using crawljax?
The crawljax library is essentially a standalone crawler in its own right. Integrating it into `crawler4j` requires a lot of manual effort on your side. I recommend using a combination of Selenium and/or CasperJS and/or PhantomJS in front of `crawler4j`, i.e. you could run the JavaScript engine as a proxy in front of `crawler4j`. However, this will slow down your web crawler.
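
To illustrate the ordering (render the JavaScript first, then hand the resulting HTML to your normal parsing step), here is a minimal sketch in plain Java. It does not use the real crawler4j or Selenium APIs: `RenderedCrawlSketch`, `process`, and `extractLinks` are hypothetical names, and the `renderer` function is a stand-in for whatever headless browser you put in front of the crawler (with Selenium, for example, it could wrap `driver.get(url)` followed by `driver.getPageSource()`).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: not part of crawler4j, crawljax, or Selenium.
final class RenderedCrawlSketch {

    // Matches href="..." or href='...' and captures the link target.
    private static final Pattern HREF =
            Pattern.compile("href\\s*=\\s*[\"']([^\"']+)[\"']");

    // Pulls link targets out of HTML that has already been rendered.
    static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            links.add(m.group(1));
        }
        return links;
    }

    // Step 1: the renderer (assumed to be a headless browser) executes the
    //         page's JavaScript and returns the final HTML.
    // Step 2: the ordinary parsing step works on that rendered HTML.
    static List<String> process(String url, Function<String, String> renderer) {
        String renderedHtml = renderer.apply(url);
        return extractLinks(renderedHtml);
    }

    public static void main(String[] args) {
        // Fake renderer standing in for a real browser, for demonstration only.
        Function<String, String> fakeRenderer =
                url -> "<a href=\"/page2\">next</a>";
        System.out.println(process("http://example.com/", fakeRenderer));
    }
}
```

The point of passing the renderer in as a function is that the crawling logic stays the same whether pages are fetched directly or rendered through a browser, so you can enable rendering only for the sites that actually need it and limit the slowdown.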