Are there better ways to read an entire HTML file into a single String variable than:
    String content = "";
    try {
        BufferedReader in = new BufferedReader(new FileReader("mypage.html"));
        String str;
        while ((str = in.readLine()) != null) {
            content += str;
        }
        in.close();
    } catch (IOException e) {
    }
You should use a StringBuilder:
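
A minimal sketch of the StringBuilder approach, using only the JDK. The class name StringBuilderFileReader and the helper readFile are hypothetical, and UTF-8 is an assumed encoding. Note that readLine() strips line terminators, so this sketch appends '\n' after each line; the loop in the question drops them instead.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical class name, used for illustration only.
final class StringBuilderFileReader {

    // Reads the whole file into one String, accumulating lines in a StringBuilder.
    static String readFile(String path) throws IOException {
        StringBuilder contentBuilder = new StringBuilder();
        // try-with-resources closes the reader even if reading fails.
        // UTF-8 is an assumption; use the encoding your file actually has.
        try (BufferedReader in = Files.newBufferedReader(Paths.get(path), StandardCharsets.UTF_8)) {
            String str;
            while ((str = in.readLine()) != null) {
                // readLine() removes the line terminator, so add one back.
                contentBuilder.append(str).append('\n');
            }
        }
        return contentBuilder.toString();
    }

    public static void main(String[] args) throws IOException {
        // Defaults to the file name from the question when no argument is given.
        String path = args.length > 0 ? args[0] : "mypage.html";
        System.out.println(readFile(path));
    }
}
```

Unlike the empty catch block in the question, this version lets the IOException propagate so a missing file is not silently ignored.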
There's the IOUtils.toString(..) utility from Apache Commons. If you're using Guava, there's also Files.readLines(..) and Files.toString(..).

For string operations, use the StringBuilder or StringBuffer classes to accumulate blocks of string data. Do not use += on String objects. The String class is immutable, so every += creates a new String object at runtime, and in a loop this hurts performance. Use the .append() method of a StringBuilder/StringBuffer instance instead.

As Jean mentioned, using a StringBuilder instead of += would be better. But if you're looking for something simpler, Guava, IOUtils, and Jsoup are all good options.

Example with Guava:
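
A minimal sketch, assuming Guava 22.0 or later is on the classpath. The class name GuavaFileReader and the helper readFile are hypothetical, and UTF-8 is an assumed encoding.

```java
import com.google.common.io.Files;

import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical class name, used for illustration only.
final class GuavaFileReader {

    // Reads the whole file into one String via Guava's CharSource.
    static String readFile(String path) throws IOException {
        // asCharSource opens and closes the file itself when read() is called.
        return Files.asCharSource(new File(path), StandardCharsets.UTF_8).read();
    }

    public static void main(String[] args) throws IOException {
        // Defaults to the file name from the question when no argument is given.
        String path = args.length > 0 ? args[0] : "mypage.html";
        System.out.println(readFile(path));
    }
}
```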
Example with IOUtils:
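
A minimal sketch, assuming Apache Commons IO 2.5 or later is on the classpath. The class name CommonsIoFileReader and the helper readFile are hypothetical, and UTF-8 is an assumed encoding.

```java
import org.apache.commons.io.IOUtils;

import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical class name, used for illustration only.
final class CommonsIoFileReader {

    // Reads the whole file into one String via IOUtils.toString(InputStream, Charset).
    static String readFile(String path) throws IOException {
        // IOUtils.toString does not close the stream, so try-with-resources does it.
        try (InputStream in = new FileInputStream(path)) {
            return IOUtils.toString(in, StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        // Defaults to the file name from the question when no argument is given.
        String path = args.length > 0 ? args[0] : "mypage.html";
        System.out.println(readFile(path));
    }
}
```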
Example with Jsoup:
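
A minimal sketch, assuming Jsoup is on the classpath. The class name JsoupFileReader and the helper readFile are hypothetical, and UTF-8 is an assumed encoding. Keep in mind that Jsoup parses the file and serializes the resulting document, so the returned string is normalized HTML and may not match the file byte for byte.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

import java.io.File;
import java.io.IOException;

// Hypothetical class name, used for illustration only.
final class JsoupFileReader {

    // Parses the file with Jsoup and returns the document as a String.
    static String readFile(String path) throws IOException {
        Document doc = Jsoup.parse(new File(path), "UTF-8");
        return doc.toString();
    }

    public static void main(String[] args) throws IOException {
        // Defaults to the file name from the question when no argument is given.
        String path = args.length > 0 ? args[0] : "mypage.html";
        System.out.println(readFile(path));
    }
}
```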
or
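
A sketch of a Jsoup variant that calls outerHtml() on the parsed document, again assuming Jsoup is on the classpath. The class name JsoupOuterHtmlReader and the helper readFile are hypothetical, and UTF-8 is an assumed encoding.

```java
import org.jsoup.Jsoup;

import java.io.File;
import java.io.IOException;

// Hypothetical class name, used for illustration only.
final class JsoupOuterHtmlReader {

    // Parses the file with Jsoup and returns the serialized document HTML.
    static String readFile(String path) throws IOException {
        return Jsoup.parse(new File(path), "UTF-8").outerHtml();
    }

    public static void main(String[] args) throws IOException {
        // Defaults to the file name from the question when no argument is given.
        String path = args.length > 0 ? args[0] : "mypage.html";
        System.out.println(readFile(path));
    }
}
```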
NOTES:
These are now deprecated as of Guava release version 22.0 (May 22, 2017). Files.asCharSource() should be used instead, as seen in the example above. (version 22.0 release diffs)
Calling IOUtils.toString without a Charset is deprecated as of Apache Commons-IO version 2.5 (May 6, 2016). IOUtils.toString should now be passed the InputStream and the Charset, as seen in the example above.
Java 7's StandardCharsets should be used instead of Charsets, as seen in the example above. (deprecated Charsets.UTF_8)

You can use JSoup.
It's a very strong HTML parser for Java.

I prefer using Guava.