Get page content from URL?

Posted 2019-07-21 02:58

I want to get the content of a page from a URL with this code:

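import java.io.IOException;
import java.io.InputStream;
import java.net.URL;

public static String getContentResult(URL url) throws IOException {
    StringBuilder sb = new StringBuilder();
    byte[] buffer = new byte[256];

    // try-with-resources closes the stream even if a read fails
    try (InputStream in = url.openStream()) {
        int bytesRead;
        while ((bytesRead = in.read(buffer)) != -1) {
            for (int i = 0; i < bytesRead; i++) {
                // Treats each byte as one ISO-8859-1 character;
                // multi-byte encodings such as UTF-8 will be garbled
                sb.append((char) (buffer[i] & 0xFF));
            }
        }
    }
    return sb.toString();
}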

But with this URL: http://portal.acm.org/citation.cfm?id=152610.152611&coll=DL&dl=GUIDE&CFID=114782066&CFTOKEN=85539315 I can't get the abstract: "Database management systems will continue to manage....."

Can you suggest a solution to this problem? Thanks in advance.

Tags: java url
4 answers
对你真心纯属浪费
Reply #2 · 2019-07-21 03:13

The URL you should be using is:

http://portal.acm.org/citation.cfm?id=152610.152611&coll=DL&dl=GUIDE

The original URL you posted responds with a redirect to this one (as dacwe mentioned).
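
If you have many such links, the two parameters that the shorter URL leaves out (CFID and CFTOKEN) can be removed in code. This is only a minimal sketch: the class name SessionParams and the method strip are made up for illustration, and it does not handle URL fragments or encoded parameter names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of any library.
final class SessionParams {

    // Removes the named query parameters from a URL string and
    // leaves the remaining parameters in their original order.
    static String strip(String url, String... names) {
        int q = url.indexOf('?');
        if (q < 0) {
            return url; // no query string, nothing to remove
        }
        List<String> drop = Arrays.asList(names);
        List<String> kept = new ArrayList<>();
        for (String pair : url.substring(q + 1).split("&")) {
            int eq = pair.indexOf('=');
            String key = eq < 0 ? pair : pair.substring(0, eq);
            if (!drop.contains(key)) {
                kept.add(pair);
            }
        }
        String base = url.substring(0, q);
        return kept.isEmpty() ? base : base + "?" + String.join("&", kept);
    }
}
```

Calling SessionParams.strip(originalUrl, "CFID", "CFTOKEN") on the URL from the question gives the shorter URL shown above.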

神经病院院长
Reply #3 · 2019-07-21 03:16

There's no "Database management..." text at the given URL. Perhaps it's loaded dynamically by JavaScript. You'll need a more sophisticated application to download such content ;)

爱情/是我丢掉的垃圾
Reply #4 · 2019-07-21 03:19

Printing the response headers of the GET request gives:

HTTP/1.1 302 Moved Temporarily
Connection: close
Date: Thu, 18 Nov 2010 15:35:24 GMT
Server: Microsoft-IIS/6.0
location: http://portal.acm.org/citation.cfm?id=152610.152611&coll=DL&dl=GUIDE
Content-Type: text/html; charset=UTF-8

This means the server wants you to request the address given in the location header instead. You can either read that header from the URLConnection and follow the link yourself, or use HttpClient, which follows redirects automatically. The code below is based on HttpClient:

import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.URL;

import org.apache.http.HttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.DefaultHttpClient;

// Requires Apache HttpClient 4.x on the classpath
public class HttpTest {
    public static void main(String... args) throws Exception {

        System.out.println(readPage(new URL("http://portal.acm.org/citation.cfm?id=152610.152611&coll=DL&dl=GUIDE&CFID=114782066&CFTOKEN=85539315")));
    }

    private static String readPage(URL url) throws Exception {

        // HttpClient follows the 302 redirect on its own
        DefaultHttpClient client = new DefaultHttpClient();
        HttpGet request = new HttpGet(url.toURI());
        HttpResponse response = client.execute(request);

        Reader reader = null;
        try {
            // Decodes with the platform default charset
            reader = new InputStreamReader(response.getEntity().getContent());

            StringBuilder sb = new StringBuilder();
            int read;
            char[] cbuf = new char[1024];
            while ((read = reader.read(cbuf)) != -1) {
                sb.append(cbuf, 0, read);
            }

            return sb.toString();

        } finally {
            if (reader != null) {
                try {
                    reader.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }
    }
}
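
For the other option mentioned above, reading the location header yourself, here is a rough sketch using only the JDK. The class name RedirectFetcher, the helper names, and the limit of 5 hops are my own choices for illustration. Note that HttpURLConnection normally follows same-protocol redirects by itself; the sketch switches that off so each hop is visible. It also assumes the page is UTF-8, as the Content-Type header above reports.

```java
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.HttpURLConnection;
import java.net.MalformedURLException;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not from any library.
final class RedirectFetcher {

    // Arbitrary limit so a redirect loop cannot run forever.
    private static final int MAX_REDIRECTS = 5;

    // True for the status codes that point to another address via Location.
    static boolean isRedirect(int status) {
        return status == 301 || status == 302 || status == 303
                || status == 307 || status == 308;
    }

    // Resolves a Location value, which may be relative, against the current URL.
    static URL resolve(URL current, String location) throws MalformedURLException {
        return new URL(current, location);
    }

    static String readPage(URL url) throws IOException {
        URL current = url;
        for (int hop = 0; hop <= MAX_REDIRECTS; hop++) {
            HttpURLConnection conn = (HttpURLConnection) current.openConnection();
            conn.setInstanceFollowRedirects(false); // handle redirects ourselves
            try {
                int status = conn.getResponseCode();
                if (isRedirect(status)) {
                    String location = conn.getHeaderField("Location");
                    if (location == null) {
                        throw new IOException("Redirect without Location header: " + status);
                    }
                    current = resolve(current, location);
                    continue; // request the new address on the next pass
                }
                // Assumption: the body is UTF-8.
                try (Reader reader = new InputStreamReader(
                        conn.getInputStream(), StandardCharsets.UTF_8)) {
                    StringBuilder sb = new StringBuilder();
                    char[] buf = new char[1024];
                    int read;
                    while ((read = reader.read(buf)) != -1) {
                        sb.append(buf, 0, read);
                    }
                    return sb.toString();
                }
            } finally {
                conn.disconnect();
            }
        }
        throw new IOException("Too many redirects starting from " + url);
    }
}
```

Usage would be RedirectFetcher.readPage(new URL("http://...")). If the server also expects cookies from the first response to be sent back, this sketch will not be enough, and a client such as HttpClient is the simpler route.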
聊天终结者
Reply #5 · 2019-07-21 03:23

The content you're looking for is not included in the page at this URL. Open it in your browser and view the source: instead of the abstract, you will see many JavaScript files being loaded. I think the content is fetched later by AJAX calls, so you would need to find out how it is loaded.

The Firefox plugin Firebug could be helpful for a more detailed analysis.
