Reading entire html file to String?

2019-01-14 00:35发布

Are there better ways to read an entire html file to a single string variable than:

    String content = "";
    try {
        BufferedReader in = new BufferedReader(new FileReader("mypage.html"));
        String str;
        while ((str = in.readLine()) != null) {
            content +=str;
        }
        in.close();
    } catch (IOException e) {
    }

标签: java file-io
7条回答
倾城 Initia
2楼-- · 2019-01-14 01:15

You should use a StringBuilder:

StringBuilder contentBuilder = new StringBuilder();
try {
    BufferedReader in = new BufferedReader(new FileReader("mypage.html"));
    String str;
    while ((str = in.readLine()) != null) {
        contentBuilder.append(str);
    }
    in.close();
} catch (IOException e) {
}
String content = contentBuilder.toString();
查看更多
smile是对你的礼貌
3楼-- · 2019-01-14 01:18

There's the IOUtils.toString(..) utility from Apache Commons.

If you're using Guava there's also Files.readLines(..) and Files.toString(..).

查看更多
可以哭但决不认输i
4楼-- · 2019-01-14 01:24

For string operations use StringBuilder or StringBuffer classes for accumulating string data blocks. Do not use += operations for string objects. String class is immutable and you will produce a large amount of string objects upon runtime and it will affect on performance.

Use .append() method of StringBuilder/StringBuffer class instance instead.

查看更多
欢心
5楼-- · 2019-01-14 01:24

As Jean mentioned, using a StringBuilder instead of += would be better. But if you're looking for something simpler, Guava, IOUtils, and Jsoup are all good options.

Example with Guava:

String content = Files.asCharSource(new File("/path/to/mypage.html"), StandardCharsets.UTF_8).read();

Example with IOUtils:

InputStream in = new URL("/path/to/mypage.html").openStream();
String content;

try {
   content = IOUtils.toString(in, StandardCharsets.UTF_8);
 } finally {
   IOUtils.closeQuietly(in);
 }

Example with Jsoup:

String content = Jsoup.parse(new File("/path/to/mypage.html"), "UTF-8").toString();

or

String content = Jsoup.parse(new File("/path/to/mypage.html"), "UTF-8").outerHtml();

NOTES:

Files.readLines() and Files.toString()

These are now deprecated as of Guava release version 22.0 (May 22, 2017). Files.asCharSource() should be used instead as seen in the example above. (version 22.0 release diffs)

IOUtils.toString(InputStream) and Charsets.UTF_8

Deprecated as of Apache Commons-IO version 2.5 (May 6, 2016). IOUtils.toString should now be passed the InputStream and the Charset as seen in the example above. Java 7's StandardCharsets should be used instead of Charsets as seen in the example above. (deprecated Charsets.UTF_8)

查看更多
仙女界的扛把子
6楼-- · 2019-01-14 01:29

You can use JSoup.
It's a very strong HTML parser for java

查看更多
Melony?
7楼-- · 2019-01-14 01:32

I prefers using Guava :


import com.google.common.base.Charsets;
import com.google.common.io.Files;
String content = Files.toString(new File("/path/to/file", Charsets.UTF_8)

查看更多
登录 后发表回答