Saving a web page to a file in Java [closed]

Posted 2019-09-22 05:30

I am trying to read an HTML page using the code below, but the program hangs. Any hints, please?

package com.test;

import java.io.BufferedWriter;   
import java.io.FileWriter;   
import java.net.Socket;  
import javax.net.SocketFactory;  
import java.net.InetAddress;

public class writingFile {

    public static void main(String a[]) throws Exception {

        SocketFactory factory=SocketFactory.getDefault();
        Socket socket=new Socket(InetAddress.getByName("java.sun.com"), 80);
        BufferedWriter out=new BufferedWriter(new FileWriter("C://test.html"));
        int data;

        while((data=socket.getInputStream().read()) != -1) {
            out.write((char)data);
            out.flush();
        }
    }
}

Regards, Raj

2 Answers
Bombasti
#2 · 2019-09-22 06:09

First of all, you need to realize that web pages are served over HTTP, not raw TCP. If you really want to use a Socket, you're going to have to implement an HTTP GET request yourself. I'll leave it up to you to figure that out if you so desire.

Alternatively, you could use Java's built-in URLConnection. Please note that the code below is far from production ready, but it should give you a general idea of how to use URLConnection.

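import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URL;
import java.net.URLConnection;

public class WebPageSaver {
    public static void main(String[] args) throws Exception {
        URL url = new URL("http://www.oracle.com/technetwork/java/index.html");
        URLConnection conn = url.openConnection();
        conn.connect();

        // try-with-resources closes both streams even if the copy fails
        try (InputStream is = conn.getInputStream();
             OutputStream out = new FileOutputStream("c:/temp/test.html")) {
            copy(is, out);
        }
    }

    // Copies every byte from one stream to the other in 4 KB chunks
    private static void copy(InputStream from, OutputStream to) throws IOException {
        byte[] buffer = new byte[4096];
        while (true) {
            int numBytes = from.read(buffer);
            if (numBytes == -1) {
                break; // end of stream
            }
            to.write(buffer, 0, numBytes);
        }
    }
}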
Ridiculous、
#3 · 2019-09-22 06:21

This is HTTP. You can't just open a socket and start reading something. You have to be polite to the server and send a request first:

// HTTP request lines end with CRLF, and a blank line ends the request
socket.getOutputStream().write("GET /index.html HTTP/1.0\r\n\r\n".getBytes());
socket.getOutputStream().flush();

Then read the HTTP response, parse it, and get your HTML page back.
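For what it's worth, here is a minimal sketch of what "send a request, read the response, parse it" could look like with a plain Socket. It is only an illustration under some assumptions: the class name RawHttpGet and its helper methods are made up for this example, the host, path, and output file come from the command line, and it only handles a simple HTTP/1.0 exchange where the server closes the connection after the response (no redirects, no chunked encoding, no HTTPS).

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper class, not part of the original question or answers.
public class RawHttpGet {

    // Builds a minimal HTTP/1.0 GET request; every line ends with CRLF
    // and an empty line terminates the header section.
    public static String buildRequest(String host, String path) {
        return "GET " + path + " HTTP/1.0\r\n"
             + "Host: " + host + "\r\n"
             + "\r\n";
    }

    // Returns the index of the first body byte (just past CRLF CRLF), or -1 if not found.
    public static int bodyOffset(byte[] response) {
        for (int i = 0; i + 3 < response.length; i++) {
            if (response[i] == '\r' && response[i + 1] == '\n'
                    && response[i + 2] == '\r' && response[i + 3] == '\n') {
                return i + 4;
            }
        }
        return -1;
    }

    // Extracts the numeric status code from a status line such as "HTTP/1.0 200 OK".
    public static int statusCode(byte[] response) {
        String text = new String(response, StandardCharsets.ISO_8859_1);
        int eol = text.indexOf("\r\n");
        String statusLine = eol >= 0 ? text.substring(0, eol) : text;
        return Integer.parseInt(statusLine.split(" ")[1]);
    }

    // Sends the request first, then reads until the server closes the connection.
    public static byte[] fetch(String host, String path) throws IOException {
        try (Socket socket = new Socket(host, 80)) {
            OutputStream out = socket.getOutputStream();
            out.write(buildRequest(host, path).getBytes(StandardCharsets.US_ASCII));
            out.flush();

            InputStream in = socket.getInputStream();
            ByteArrayOutputStream raw = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            int n;
            while ((n = in.read(buffer)) != -1) {
                raw.write(buffer, 0, n);
            }
            return raw.toByteArray();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 3) {
            System.out.println("usage: java RawHttpGet <host> <path> <outfile>");
            return;
        }
        byte[] response = fetch(args[0], args[1]);
        int offset = bodyOffset(response);
        if (offset < 0 || statusCode(response) != 200) {
            System.err.println("Unexpected response, nothing saved");
            return;
        }
        // Everything after the blank line is the page itself.
        Files.write(Paths.get(args[2]), Arrays.copyOfRange(response, offset, response.length));
    }
}
```

You would run it with something like `java RawHttpGet example.com / test.html` (placeholder arguments). The original program hangs because it never sends a request, so the server just waits and read() blocks; here the request goes out before anything is read.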

EDIT: I wrote what to do with sockets only because that was the OP's immediate problem. Using URLConnection is the correct way, as answered by @Mike Deck.
