Get page content from URL?

Posted 2019-07-21 03:31

Question:

I want to get the content of a page from a URL with this code:

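import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public static String getContentResult(URL url) throws IOException {
    StringBuilder sb = new StringBuilder();
    // Decode as UTF-8 (an assumption about the page) instead of casting
    // each byte to char, and close the stream when done.
    try (Reader in = new InputStreamReader(url.openStream(), StandardCharsets.UTF_8)) {
        char[] buffer = new char[256];
        int charsRead;
        while ((charsRead = in.read(buffer)) != -1) {
            sb.append(buffer, 0, charsRead);
        }
    }
    return sb.toString();
}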

But with this URL: http://portal.acm.org/citation.cfm?id=152610.152611&coll=DL&dl=GUIDE&CFID=114782066&CFTOKEN=85539315 I can't get the abstract: Database management systems will continue to manage.....

Can you suggest a solution to this problem? Thanks in advance.

Answer 1:

Printing the response headers of the GET request gives:

HTTP/1.1 302 Moved Temporarily
Connection: close
Date: Thu, 18 Nov 2010 15:35:24 GMT
Server: Microsoft-IIS/6.0
location: http://portal.acm.org/citation.cfm?id=152610.152611&coll=DL&dl=GUIDE
Content-Type: text/html; charset=UTF-8

This means that the server wants you to fetch the address given in the location header instead. So either you read the header directly from the URLConnection and follow that link yourself, or you use HttpClient, which follows redirects automatically. The code below is based on HttpClient:

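import java.io.InputStreamReader;
import java.io.Reader;
import java.net.URL;
import java.nio.charset.StandardCharsets;

import org.apache.http.HttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.DefaultHttpClient;

// Requires Apache HttpClient 4.x on the classpath.
public class HttpTest {
    public static void main(String... args) throws Exception {

        System.out.println(readPage(new URL("http://portal.acm.org/citation.cfm?id=152610.152611&coll=DL&dl=GUIDE&CFID=114782066&CFTOKEN=85539315")));
    }

    private static String readPage(URL url) throws Exception {

        // HttpClient follows the 302 redirect on its own.
        DefaultHttpClient client = new DefaultHttpClient();
        HttpGet request = new HttpGet(url.toURI());
        HttpResponse response = client.execute(request);

        // The response headers above declare charset=UTF-8, so decode accordingly.
        // try-with-resources closes the reader even if reading fails.
        try (Reader reader = new InputStreamReader(
                response.getEntity().getContent(), StandardCharsets.UTF_8)) {

            StringBuilder sb = new StringBuilder();
            char[] cbuf = new char[1024];
            int read;
            while ((read = reader.read(cbuf)) != -1) {
                sb.append(cbuf, 0, read);
            }

            return sb.toString();
        }
    }
}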
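
For comparison, the first option mentioned above (reading the location header yourself and following it) could look roughly like the sketch below, using only the JDK. The class name RedirectSketch, the helper names, and the MAX_REDIRECTS limit are made up for illustration, and UTF-8 is assumed because the response shown above declares it. HttpURLConnection normally follows same-protocol redirects by itself, so that behaviour is switched off here only to make the step visible.

```java
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

class RedirectSketch {

    // Hypothetical limit, chosen for illustration, to guard against redirect loops.
    static final int MAX_REDIRECTS = 5;

    // True for the HTTP status codes that carry a Location header to follow.
    static boolean isRedirect(int status) {
        return status == 301 || status == 302 || status == 303
                || status == 307 || status == 308;
    }

    // The Location value may be relative, so resolve it against the current URL.
    static URL resolve(URL base, String location) throws IOException {
        return new URL(base, location);
    }

    static String readPage(URL url) throws IOException {
        for (int hop = 0; hop <= MAX_REDIRECTS; hop++) {
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            // Disable automatic redirects so the header can be handled by hand.
            conn.setInstanceFollowRedirects(false);

            int status = conn.getResponseCode();
            if (isRedirect(status)) {
                String location = conn.getHeaderField("Location");
                conn.disconnect();
                if (location == null) {
                    throw new IOException("Redirect without a Location header");
                }
                url = resolve(url, location);
                continue;
            }

            // Not a redirect: read the body, assuming UTF-8.
            try (Reader reader = new InputStreamReader(
                    conn.getInputStream(), StandardCharsets.UTF_8)) {
                StringBuilder sb = new StringBuilder();
                char[] cbuf = new char[1024];
                int read;
                while ((read = reader.read(cbuf)) != -1) {
                    sb.append(cbuf, 0, read);
                }
                return sb.toString();
            }
        }
        throw new IOException("Too many redirects");
    }
}
```

Calling RedirectSketch.readPage(new URL(...)) with the URL from the question would then request the page, follow the 302 to the address in the location header, and return the body of the final response. Cookies set along the way are not carried over in this sketch, which may matter for some sites.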


Answer 2:

There's no "Database management..." text at the given URL. Perhaps it is loaded dynamically by JavaScript. You would need a more sophisticated application to download such content ;)



Answer 3:

The content you're looking for is not included in the page at this URL. Open the page in your browser and view the source code: instead of the abstract, many JavaScript files are loaded. I think the content is fetched later by AJAX calls. You would need to find out how the content is loaded.

The Firefox plugin Firebug could be helpful for a more detailed analysis.
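
The same check can be done roughly in code once the HTML has been downloaded: test whether the expected phrase is in the static source, and list the scripts the page pulls in. This is only a sketch. The class name StaticHtmlCheck and its methods are hypothetical, and the regex is a rough approximation rather than a real HTML parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class StaticHtmlCheck {

    // Rough pattern for the src attribute of <script> tags (not a full HTML parser).
    private static final Pattern SCRIPT_SRC = Pattern.compile(
            "<script\\b[^>]*\\bsrc\\s*=\\s*[\"']([^\"']+)[\"']",
            Pattern.CASE_INSENSITIVE);

    // Case-insensitive check for a phrase in the downloaded HTML.
    static boolean containsText(String html, String phrase) {
        return html.toLowerCase(Locale.ROOT).contains(phrase.toLowerCase(Locale.ROOT));
    }

    // Collects the external scripts referenced by the page, in document order.
    static List<String> scriptSources(String html) {
        List<String> sources = new ArrayList<>();
        Matcher m = SCRIPT_SRC.matcher(html);
        while (m.find()) {
            sources.add(m.group(1));
        }
        return sources;
    }
}
```

If containsText returns false for "Database management" while scriptSources returns several entries, that is a hint (not proof) that the abstract is filled in later by one of those scripts.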



Answer 4:

The url that you should be using is:

http://portal.acm.org/citation.cfm?id=152610.152611&coll=DL&dl=GUIDE

The original URL you posted sends a redirect to it (as mentioned by dacwe).



Tags: java, url