I would like to create an HTML file for my report. The content in the report can be created either by using BufferedWriter#write(String)
File f = new File("source.htm");
BufferedWriter bw = new BufferedWriter(new FileWriter(f));
bw.write("Content");
or by using DataOutputStream#writeBytes(String)
File f = new File("source.htm");
DataOutputStream dosReport = new DataOutputStream(new FileOutputStream(f));
dosReport.writeBytes("Content");
Is one of them better than the other? Why is it so?
If you're writing out text then you should use a Writer, which handles the conversion from Unicode characters (Java's internal representation of strings) into an appropriate character encoding such as UTF-8. DataOutputStream.writeBytes simply outputs the low-order eight bits of each char in the string and ignores the high-order eight bits entirely - this is equivalent to UTF-8 for ASCII characters with codes below 128 (U+007F and below) but almost certainly wrong for anything beyond ASCII.
Rather than a FileWriter, you should use an OutputStreamWriter so you can select a specific encoding (FileWriter uses the platform's default encoding, which varies from platform to platform, unless you use the charset-taking constructors added in Java 11):
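To make the difference concrete, here is a minimal sketch that pushes the same string through both routes. It writes to an in-memory ByteArrayOutputStream so the bytes can be inspected; for the report you would wrap a FileOutputStream instead. The class and method names (EncodingDemo, viaWriter, viaWriteBytes) are made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the names are illustrative only.
class EncodingDemo {

    // Encodes the text as UTF-8 by going through an OutputStreamWriter.
    // For a report file, the sink would be new FileOutputStream("source.htm") instead.
    static byte[] viaWriter(String text) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(sink, StandardCharsets.UTF_8)) {
            w.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    // Uses DataOutputStream.writeBytes, which keeps only the low 8 bits of each char.
    static byte[] viaWriteBytes(String text) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (DataOutputStream dos = new DataOutputStream(sink)) {
            dos.writeBytes(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        String euro = "\u20AC"; // EURO SIGN, U+20AC
        System.out.println(viaWriter(euro).length);     // 3 (UTF-8: E2 82 AC)
        System.out.println(viaWriteBytes(euro).length); // 1 (only AC survives)
    }
}
```

For pure ASCII input such as "Content" the two routes produce identical bytes, which is why the problem tends to go unnoticed until the first accented character or currency symbol shows up in the report.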
OutputStream:
This abstract class is the superclass of all classes representing an output stream of bytes. An output stream accepts output bytes and sends them to some sink.
Applications that need to define a subclass of OutputStream must always provide at least a method that writes one byte of output.
For example:
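A minimal sketch of such a subclass, assuming a toy sink that only counts the bytes it receives. CountingOutputStream, getCount and countBytes are hypothetical names, not JDK classes; the only method that has to be implemented is write(int).

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

// Hypothetical example class: counts the bytes written to it and discards them.
class CountingOutputStream extends OutputStream {
    private long count;

    // The one method every OutputStream subclass must provide.
    @Override
    public void write(int b) {
        count++;
    }

    long getCount() {
        return count;
    }

    // Convenience helper for the demo: feeds an array through the stream.
    // The inherited write(byte[]) ends up calling write(int) once per byte.
    static long countBytes(byte[] data) {
        CountingOutputStream out = new CountingOutputStream();
        try {
            out.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.getCount();
    }

    public static void main(String[] args) {
        System.out.println(countBytes(new byte[] {1, 2, 3})); // 3
    }
}
```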
BufferedWriter:
Writes text to a character-output stream, buffering characters so as to provide for the efficient writing of single characters, arrays, and strings. The buffer size may be specified, or the default size may be accepted. The default is large enough for most purposes.
A newLine() method is provided, which uses the platform's own notion of line separator as defined by the system property line.separator. Not all platforms use the newline character ('\n') to terminate lines. Calling this method to terminate each output line is therefore preferred to writing a newline character directly.
In general, a Writer sends its output immediately to the underlying character or byte stream. Unless prompt output is required, it is advisable to wrap a BufferedWriter around any Writer whose write() operations may be costly, such as FileWriters and OutputStreamWriters.
For example:
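A small sketch of the wrapping pattern, assuming an in-memory StringWriter as a stand-in for a costly Writer such as a FileWriter. BufferedLines and joinLines are hypothetical names used only for this illustration.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class; the names are illustrative only.
class BufferedLines {

    // Writes each line through a BufferedWriter and terminates it with newLine(),
    // so the platform's own line separator is used.
    static String joinLines(List<String> lines) {
        StringWriter sink = new StringWriter(); // stand-in for e.g. new FileWriter("source.htm")
        try (BufferedWriter bw = new BufferedWriter(sink)) {
            for (String line : lines) {
                bw.write(line);
                bw.newLine();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Closing the BufferedWriter has flushed the buffer into the sink.
        return sink.toString();
    }

    public static void main(String[] args) {
        System.out.print(joinLines(Arrays.asList("<html>", "</html>")));
    }
}
```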
Firstly, the DataOutputStream in your 2nd example serves no useful purpose (see note 1 below). Indeed, if your Strings contain characters that don't fit into 8 bits, the writeBytes(String) method is going to mangle the text. Get rid of it. The Data streams are designed for reading and writing fine-grained binary data. For plain bytes, use a plain (or buffered) Input or Output stream.
Secondly, in this specific use-case, where you are writing the entire output in a single write operation, the BufferedWriter doesn't add any value either.
So in this case, you should be comparing a plain (unbuffered) Writer versus a plain (unbuffered) OutputStream.
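The two variants being compared might look roughly like this. It is only a sketch: it assumes the first variant is a plain FileWriter and the second a plain FileOutputStream fed with String.getBytes(), and the names PlainWriteComparison, writeWithWriter, writeWithStream and sameBytes are made up for illustration.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.FileWriter;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.file.Files;
import java.util.Arrays;

// Hypothetical demo class; the names are illustrative only.
class PlainWriteComparison {

    // Variant 1: character-oriented; the Writer does the encoding.
    static void writeWithWriter(File f, String content) throws IOException {
        try (Writer w = new FileWriter(f)) {
            w.write(content);
        }
    }

    // Variant 2: byte-oriented; the caller converts the String to bytes.
    static void writeWithStream(File f, String content) throws IOException {
        try (OutputStream os = new FileOutputStream(f)) {
            os.write(content.getBytes()); // platform default charset
        }
    }

    // Writes the same content both ways into temp files and compares the bytes.
    static boolean sameBytes(String content) {
        try {
            File a = File.createTempFile("writer", ".htm");
            File b = File.createTempFile("stream", ".htm");
            try {
                writeWithWriter(a, content);
                writeWithStream(b, content);
                return Arrays.equals(Files.readAllBytes(a.toPath()),
                                     Files.readAllBytes(b.toPath()));
            } finally {
                a.delete();
                b.delete();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sameBytes("Content")); // true
    }
}
```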
To my mind, the first version looks simpler and cleaner. And it is best to use the Reader and Writer stacks for text I/O ... because that's what they were designed for. (They take care of the encoding and decoding issues, cleanly and transparently.)
You could benchmark them if you really need to know which is faster (on your system!) but I suspect there is not a lot of difference ... and that the first version is faster.
1 - DataOutputStream does not buffer by itself (it passes writes straight through to the stream it wraps), but for this use-case buffering would not help performance anyway.
In use-cases where you are performing multiple (small) writes instead of one big one, there is a significant performance advantage in using a BufferedWriter (or a BufferedOutputStream) instead of an unbuffered writer or stream.
The other point is that both versions of your code are using the platform's default character encoding to encode the output file. It may be more appropriate to use a specific encoding independently of the default, or make this a configuration or command line parameter.
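As a rough sketch of both points together: many small writes go through a BufferedWriter, and the encoding is passed in by name instead of relying on the platform default. Where the charset name comes from (a config file, a command line argument) is left open here; ConfigurableEncoding and render are hypothetical names, and the in-memory sink stands in for a FileOutputStream.

```java
import java.io.BufferedWriter;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class; the names are illustrative only.
class ConfigurableEncoding {

    // charsetName is assumed to come from configuration or the command line.
    static byte[] render(List<String> lines, String charsetName) {
        Charset charset = Charset.forName(charsetName);
        ByteArrayOutputStream sink = new ByteArrayOutputStream(); // stand-in for a FileOutputStream
        try (BufferedWriter bw = new BufferedWriter(new OutputStreamWriter(sink, charset))) {
            for (String line : lines) {
                // Many small writes: the buffer collects them before encoding.
                bw.write(line);
                bw.write('\n');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        String name = args.length > 0 ? args[0] : "UTF-8";
        byte[] out = render(Arrays.asList("caf\u00E9"), name);
        System.out.println(out.length); // 6 with UTF-8, 5 with ISO-8859-1
    }
}
```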