I would like to be able to fetch a web page's HTML and save it to a String, so I can do some processing on it. Also, how could I handle various types of compression?
How would I go about doing that using Java?
Here's some tested code using Java's URL class. I'd recommend doing a better job than I do here of handling the exceptions, or passing them up the call stack, though.
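Along those lines, here is a minimal sketch using `HttpURLConnection` from the standard library. The class name `PageFetcher` is my own; I assume UTF-8 for brevity, though a real client should honor the charset from the `Content-Type` header. Advertising gzip/deflate in `Accept-Encoding` and wrapping the response stream accordingly also covers the compression part of the question:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.InflaterInputStream;

public class PageFetcher {

    // Wrap the raw stream according to the Content-Encoding header,
    // so gzip- and deflate-compressed responses are decompressed transparently.
    public static InputStream decode(InputStream raw, String encoding) throws IOException {
        if ("gzip".equalsIgnoreCase(encoding)) {
            return new GZIPInputStream(raw);
        }
        if ("deflate".equalsIgnoreCase(encoding)) {
            return new InflaterInputStream(raw);
        }
        return raw; // no (or unknown) compression: pass through untouched
    }

    // Read the whole stream into a String (UTF-8 assumed for brevity).
    public static String slurp(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    // Fetch the page at the given address and return its HTML as a String.
    public static String fetch(String address) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(address).openConnection();
        // Tell the server which compressed encodings we can handle.
        conn.setRequestProperty("Accept-Encoding", "gzip, deflate");
        try (InputStream in = decode(conn.getInputStream(), conn.getContentEncoding())) {
            return slurp(in);
        }
    }
}
```

The decompression is split out into `decode` so it can be reused (and tested) without touching the network; `fetch` itself just wires the connection's reported `Content-Encoding` into it.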
All the above-mentioned approaches download only the raw HTML; they do not capture the page text as it looks in the browser. These days a lot of content is loaded into the page by scripts after the HTML arrives, and none of the techniques above executes scripts. HtmlUnit does support JavaScript, so if you want to download the page text as it appears in the browser, you should use HtmlUnit.
This class may help: it fetches the page source and filters out some of the information.
Try using the jsoup library.
You can download jsoup here.
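To illustrate what that looks like, here is a small sketch (the URL is a placeholder, and the jsoup jar must be on the classpath; `connect(...).get()` requires network access):

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class JsoupExample {
    public static void main(String[] args) throws Exception {
        // Fetch and parse the page in one call; jsoup handles gzip transparently.
        Document doc = Jsoup.connect("https://example.com/").get();
        System.out.println(doc.title());

        // The parsed DOM can be queried with CSS selectors.
        for (Element link : doc.select("a[href]")) {
            System.out.println(link.attr("abs:href"));
        }

        // Or keep the whole HTML as a String for further processing.
        String html = doc.outerHtml();
    }
}
```

The advantage over raw `URLConnection` is that you get a parsed, queryable DOM rather than a flat String, which simplifies the "do some processing on it" part.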
Well, you could go with the built-in libraries such as URL and URLConnection, but they don't give very much control.
Personally, I'd go with the Apache HttpClient library. Edit: HttpClient has been declared end-of-life by Apache. The replacement is HttpComponents.