I want to determine the size of a web page so that, if it is greater than some limit (e.g. 5 MB), I can decide whether or not to download it. Can I get this information?
Answer 1:
You can do a decent approximation with:
HttpURLConnection conn = (HttpURLConnection) new URL("http://www.example.com").openConnection();
System.out.println(conn.getContentLength());
However, this will only tell you the length of the specific resource you're requesting (e.g. the HTML at the base of the URL). You will also need to go through the HTML in the page, look at all the resources that it references (scripts from other sites, images, video, etc.) and total them all up.
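To illustrate the second step, here is a minimal sketch (class and method names are mine, not from the answer) that pulls `src`/`href` attribute values out of an HTML string with a regex. This is deliberately crude: a real crawler should use a proper HTML parser such as jsoup, and would still need to resolve relative URLs against the page's base URL.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ResourceLister {
    // Crude extraction of src/href attribute values from double-quoted
    // attributes; a proper HTML parser (e.g. jsoup) is the right tool here.
    static List<String> listResources(String html) {
        List<String> urls = new ArrayList<>();
        Matcher m = Pattern.compile("(?:src|href)\\s*=\\s*\"([^\"]+)\"").matcher(html);
        while (m.find()) {
            urls.add(m.group(1));
        }
        return urls;
    }

    public static void main(String[] args) {
        String html = "<img src=\"/a.png\"><script src=\"app.js\"></script>"
                + "<link href=\"style.css\">";
        // Prints: [/a.png, app.js, style.css]
        System.out.println(listResources(html));
    }
}
```

Each extracted URL would then get its own `getContentLength()` check, and the results summed, to approximate the total page weight.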
That will get you fairly close to a total size, but even then you won't get a perfect count, because (a) not all URLs return this information, and you have no control over that, and (b) depending on how the content is loaded (e.g. through AJAX calls that hide the specifics), you can't know the complete list of resources ahead of time.
Alternatively, if a URL doesn't return a content length, I think Giacomo was suggesting the use of a CountingInputStream (from Apache Commons IO). Not a bad idea. You could combine the suggestion above with a CountingInputStream to keep a running total of the bytes actually transferred, and stop the transfer once it exceeds a specified maximum. That way you'd have a predicted size (say a site tells you it's going to send 3.3 MB), but if while downloading you find it has reached 6 MB and hasn't stopped yet, you can decide not to download any more.
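The counting-with-a-cap idea can be sketched as follows. This is a minimal illustration (the class and method names are mine, not from the answer) that counts bytes as they stream in and bails out once a cap is exceeded, in the same spirit as Commons IO's CountingInputStream; the demo reads from an in-memory stream, but the same method works on the `InputStream` from an `HttpURLConnection`.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class CappedDownload {
    // Reads the stream, counting bytes as they arrive.
    // Returns the total byte count, or -1 as soon as maxBytes would be
    // exceeded, at which point the caller can close the connection.
    static long readWithCap(InputStream in, long maxBytes) throws IOException {
        byte[] buf = new byte[8192];
        long total = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            total += n;
            if (total > maxBytes) {
                return -1; // over the limit: stop downloading
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // A 1 KB payload under a 5 MB cap: fully read, prints 1024.
        System.out.println(readWithCap(
                new ByteArrayInputStream(new byte[1024]), 5L * 1024 * 1024));
        // The same payload against a 100-byte cap: aborted, prints -1.
        System.out.println(readWithCap(
                new ByteArrayInputStream(new byte[1024]), 100));
    }
}
```

This catches the AJAX case too: even when the up-front estimate is wrong, the transfer is bounded by whatever cap you choose.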
Answer 2:
I may be wrong, but can't you just use
HttpURLConnection conn = (HttpURLConnection) new URL("http://www.google.com").openConnection();
System.out.println(conn.getContentLength());
?