Question:
Currently I'm working on a class that can be used to read the contents of the website specified by the URL. I'm just beginning my adventures with java.io and java.net, so I need to consult my design.
Usage:
TextURL url = new TextURL(urlString);
String contents = url.read();
My code:
package pl.maciejziarko.util;

import java.io.*;
import java.net.*;

public final class TextURL
{
    private static final int BUFFER_SIZE = 1024 * 10;
    private static final int ZERO = 0;
    private final byte[] dataBuffer = new byte[BUFFER_SIZE];
    private final URL urlObject;

    public TextURL(String urlString) throws MalformedURLException
    {
        this.urlObject = new URL(urlString);
    }

    public String read()
    {
        final StringBuilder sb = new StringBuilder();
        try
        {
            final BufferedInputStream in =
                    new BufferedInputStream(urlObject.openStream());
            int bytesRead = ZERO;
            while ((bytesRead = in.read(dataBuffer, ZERO, BUFFER_SIZE)) >= ZERO)
            {
                sb.append(new String(dataBuffer, ZERO, bytesRead));
            }
        }
        catch (UnknownHostException e)
        {
            return null;
        }
        catch (IOException e)
        {
            return null;
        }
        return sb.toString();
    }

    //Usage:
    public static void main(String[] args)
    {
        try
        {
            TextURL url = new TextURL("http://www.flickr.com/explore/interesting/7days/");
            String contents = url.read();
            if (contents != null)
                System.out.println(contents);
            else
                System.out.println("ERROR!");
        }
        catch (MalformedURLException e)
        {
            System.out.println("Check you the url!");
        }
    }
}
My question is:
Is it a good way to achieve what I want? Are there any better solutions?
I particularly didn't like sb.append(new String(dataBuffer, ZERO, bytesRead)); but I wasn't able to express it in a different way. Is it good to create a new String on every iteration? I suppose not.
Any other weak points?
Thanks in advance!
Answer 1:
Consider using URLConnection instead. Furthermore, you might want to leverage IOUtils from Apache Commons IO to make the string reading easier too. For example:
URL url = new URL("http://www.example.com/");
URLConnection con = url.openConnection();
InputStream in = con.getInputStream();
// WRONG: getContentEncoding() returns the Content-Encoding header (e.g. "gzip"), not the charset.
// con.getContentType() should be used instead, but it returns something like
// "text/html; charset=UTF-8", so the value must be parsed to extract the actual encoding.
String encoding = con.getContentEncoding();
encoding = encoding == null ? "UTF-8" : encoding;
String body = IOUtils.toString(in, encoding);
System.out.println(body);
If you don't want to use IOUtils, I'd probably rewrite the IOUtils.toString line above as something like:
ByteArrayOutputStream baos = new ByteArrayOutputStream();
byte[] buf = new byte[8192];
int len = 0;
while ((len = in.read(buf)) != -1) {
    baos.write(buf, 0, len);
}
String body = new String(baos.toByteArray(), encoding);
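As the comment in the first snippet says, the charset has to be pulled out of the Content-Type header. Here is a minimal sketch of that parsing step, assuming a header value shaped like "text/html; charset=UTF-8". The class name ContentTypeCharset and the method charsetOf are made up for illustration, and falling back to a caller-supplied default when no usable charset is found is my assumption, not something the header guarantees.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of any library.
public class ContentTypeCharset {

    // Returns the charset named in a Content-Type value such as
    // "text/html; charset=UTF-8", or the fallback when the header is
    // missing, has no charset parameter, or names an unknown charset.
    public static Charset charsetOf(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String param : contentType.split(";")) {
            String p = param.trim();
            // Case-insensitive check for a leading "charset=".
            if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                String name = p.substring(8).trim().replace("\"", "");
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // Covers both illegal and unsupported charset names.
                    return fallback;
                }
            }
        }
        return fallback;
    }

    // Possible use with the snippets above (con and baos as defined there):
    //   Charset cs = charsetOf(con.getContentType(), StandardCharsets.UTF_8);
    //   String body = new String(baos.toByteArray(), cs);
}
```

Even with this in place, a server may send no charset at all, or one that disagrees with the page's own meta tag, so treat the result as a best guess.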
Answer 2:
I highly recommend using a dedicated library, like HtmlParser:
Parser parser = new Parser (url);
NodeList list = parser.parse (null);
System.out.println (list.toHtml ());
Writing your own HTML parser is such a waste of time. The library is available as a Maven dependency; look at its JavaDoc to dig into its features.
Looking at the following sample should be convincing:
Parser parser = new Parser(url);
NodeList movies = parser.extractAllNodesThatMatch(
        new AndFilter(new TagNameFilter("div"),
                new HasAttributeFilter("class", "movie")));
Answer 3:
Unless this is some sort of exercise that you want to code for the sake of learning ... I would not reinvent the wheel and I would use HttpURLConnection.
HttpURLConnection provides good encapsulation mechanisms to deal with the HTTP protocol. For instance, your code doesn't work with HTTP redirections; HttpURLConnection would fix that for you.
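For illustration, here is a rough sketch of what that could look like. The class HttpFetchSketch, the fetch method, the 5-second timeouts and the UTF-8 decoding are all assumptions of mine, not part of the question's code. Also note that, as far as I know, HttpURLConnection follows redirects within the same protocol but does not switch between http and https on its own.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch; names and settings are illustrative only.
public class HttpFetchSketch {

    // Fetches the body of an http:// or https:// URL as text.
    // Assumes UTF-8; a real client should honour the charset
    // announced in the Content-Type header.
    public static String fetch(String urlString) throws IOException {
        // The cast only holds for http/https URLs.
        HttpURLConnection con =
                (HttpURLConnection) new URL(urlString).openConnection();
        con.setInstanceFollowRedirects(true); // same-protocol redirects
        con.setConnectTimeout(5000);          // milliseconds, arbitrary choice
        con.setReadTimeout(5000);
        try {
            int status = con.getResponseCode();
            if (status != HttpURLConnection.HTTP_OK) {
                throw new IOException(
                        "Unexpected HTTP status " + status + " for " + urlString);
            }
            try (InputStream in = con.getInputStream()) {
                ByteArrayOutputStream baos = new ByteArrayOutputStream();
                byte[] buf = new byte[8192];
                int len;
                while ((len = in.read(buf)) != -1) {
                    baos.write(buf, 0, len);
                }
                return new String(baos.toByteArray(), StandardCharsets.UTF_8);
            }
        } finally {
            con.disconnect();
        }
    }
}
```

Throwing an IOException on a non-200 status, instead of returning null as the question's read() does, lets the caller see why the fetch failed.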
Answer 4:
You can wrap your InputStream in an InputStreamReader and use its read() method to read character data directly (note that you should specify the encoding when creating the Reader, but finding out the encoding of arbitrary URLs is non-trivial). Then simply call sb.append() with the char[] you just read (and the correct offset and length).
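A minimal sketch of that loop, assuming the caller already knows the charset. The class ReaderSketch and the method readAll are names invented for this example. Besides avoiding a new String per iteration, letting the Reader do the decoding means a multi-byte character that straddles two buffer reads is still decoded correctly, which is not the case when each byte chunk is turned into a String on its own.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;

// Hypothetical helper illustrating the Reader-based approach.
public class ReaderSketch {

    // Reads the whole stream as text using the given charset
    // and closes the stream when done.
    public static String readAll(InputStream in, Charset charset)
            throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[8192];
        try (Reader reader = new InputStreamReader(in, charset)) {
            int n;
            while ((n = reader.read(buf, 0, buf.length)) != -1) {
                // Append only the chars actually read; no intermediate String.
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }
}
```

With the question's class, this could be called as readAll(urlObject.openStream(), StandardCharsets.UTF_8), with UTF-8 being a guess unless the response headers say otherwise.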
Answer 5:
Please use these lines of code; they should help:
<%@ page import="java.io.*, java.net.*" %>
<!DOCTYPE html>
<html>
<head>
<script src="http://ajax.googleapis.com/ajax/libs/jquery/1.10.2/jquery.min.js"></script>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>JSP Page</title>
</head>
<body>
<h1>Hello World!</h1>
<%
    // Scriptlet: fetch the remote page and print it into this JSP's output.
    URL uri = new URL("Your url");
    URLConnection ec = uri.openConnection();
    BufferedReader in = new BufferedReader(new InputStreamReader(
            ec.getInputStream(), "UTF-8"));
    String inputLine;
    StringBuilder a = new StringBuilder();
    // Note: readLine() strips line terminators, so the lines are concatenated.
    while ((inputLine = in.readLine()) != null)
        a.append(inputLine);
    in.close();
    out.println(a.toString());
%>
</body>
</html>
Answer 6:
I know this is an old question, but I'm sure other people will find it too.
If you don't mind an additional dependency, here's a very simple way:
Jsoup.connect("http://example.com/").get().toString()
You'll need the Jsoup library, but you can quickly add it with Maven/Gradle, and it also lets you manipulate the contents of the page and find specific nodes.