I am starting a new project that crawls websites to retrieve data and store it internally through a web service. While researching, I came across the Scrapy framework and the Beevolve web crawling service.
My question is: is it best to build my own crawler with no prior experience, or to rent a web crawling service?
One issue I came across is that some of the websites require a login before any data can be retrieved.
If you want to create your own web crawler in Java, two libraries worth a look are jSpider (a configurable open-source crawler engine) and jsoup (an HTML parser whose connection API also handles cookies, which helps with your login problem).
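To show what that looks like, here is a minimal jsoup sketch: it posts a login form, keeps the session cookies, and reuses them to fetch a page behind the login. The URLs and the form field names ("username", "password") are placeholders; check the target site's actual login form for the real ones.

```java
import java.util.Map;

import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class JsoupLoginFetch {
    public static void main(String[] args) throws Exception {
        // Submit the login form and keep the session cookies.
        // URL and field names are placeholders for this sketch.
        Connection.Response login = Jsoup.connect("https://example.com/login")
                .data("username", "myUser")
                .data("password", "myPassword")
                .method(Connection.Method.POST)
                .execute();
        Map<String, String> cookies = login.cookies();

        // Reuse the cookies to fetch a page that sits behind the login.
        Document doc = Jsoup.connect("https://example.com/protected/page")
                .cookies(cookies)
                .get();

        // Extract whatever you need with CSS selectors.
        for (Element link : doc.select("a[href]")) {
            System.out.println(link.attr("abs:href") + " -> " + link.text());
        }
    }
}
```

Note that jsoup only fetches and parses single pages; for following links across a whole site you would still write the crawling loop yourself (or combine it with one of the crawlers above).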
Edit: crawler4j could work too.
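To give an idea of crawler4j in practice, here is a minimal sketch against the crawler4j 4.x API (the seed URL, URL filter, and storage folder are placeholders): you subclass WebCrawler, decide which URLs to follow in shouldVisit, and handle each fetched page in visit.

```java
import java.util.Set;

import edu.uci.ics.crawler4j.crawler.CrawlConfig;
import edu.uci.ics.crawler4j.crawler.CrawlController;
import edu.uci.ics.crawler4j.crawler.Page;
import edu.uci.ics.crawler4j.crawler.WebCrawler;
import edu.uci.ics.crawler4j.fetcher.PageFetcher;
import edu.uci.ics.crawler4j.parser.HtmlParseData;
import edu.uci.ics.crawler4j.robotstxt.RobotstxtConfig;
import edu.uci.ics.crawler4j.robotstxt.RobotstxtServer;
import edu.uci.ics.crawler4j.url.WebURL;

public class MyCrawler extends WebCrawler {

    // Stay on one host; adjust the prefix to the site you are crawling.
    @Override
    public boolean shouldVisit(Page referringPage, WebURL url) {
        return url.getURL().toLowerCase().startsWith("https://example.com/");
    }

    // Called for every fetched page; store the text/links however you like.
    @Override
    public void visit(Page page) {
        if (page.getParseData() instanceof HtmlParseData) {
            HtmlParseData data = (HtmlParseData) page.getParseData();
            Set<WebURL> links = data.getOutgoingUrls();
            System.out.println(page.getWebURL().getURL()
                    + " (" + links.size() + " outgoing links)");
        }
    }

    public static void main(String[] args) throws Exception {
        CrawlConfig config = new CrawlConfig();
        config.setCrawlStorageFolder("/tmp/crawl"); // intermediate crawl data

        PageFetcher fetcher = new PageFetcher(config);
        RobotstxtServer robots = new RobotstxtServer(new RobotstxtConfig(), fetcher);
        CrawlController controller = new CrawlController(config, fetcher, robots);

        controller.addSeed("https://example.com/");
        controller.start(MyCrawler.class, 2); // 2 concurrent crawler threads
    }
}
```

The appeal of crawler4j here is that the queueing, politeness (robots.txt), and multithreading are handled for you, so you mostly write the two overridden methods.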