While reading about file I/O in Java, I learned that there is more than one alternative for input and output operations.
These are:
- BufferedReader and BufferedWriter
- FileReader and FileWriter
- FileInputStream and FileOutputStream
- InputStreamReader and OutputStreamWriter
- the Scanner class
Which of these is the best alternative for handling text files? Which is the best alternative for serialization? What does Java NIO say about it?
Two kinds of data
Generally speaking there are two "worlds": binary data and text data.
When it's a file (or a socket, or a BLOB in a DB, or ...), then it's always binary data first.
Some of that binary data can be treated as text data (which involves something called an "encoding" or "character encoding").
Binary Data
Whenever you want to handle binary data, you need to use the InputStream/OutputStream classes (generally, everything that contains Stream in its name).
That's why there's a FileInputStream and a FileOutputStream: those read from and write to files and they handle binary data.
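As a minimal sketch of the binary side, the following copies a file byte for byte with FileInputStream and FileOutputStream. The class name BinaryCopy, the copy helper and the temp-file names are made up for illustration; no character decoding happens anywhere.

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

public class BinaryCopy {

    // Hypothetical helper: copies raw bytes from source to target and
    // returns the number of bytes copied. No encoding is involved.
    static long copy(String source, String target) throws IOException {
        long total = 0;
        try (InputStream in = new FileInputStream(source);
             OutputStream out = new FileOutputStream(target)) {
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read);
                total += read;
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Temp files are used only to keep the example self-contained.
        Path source = Files.createTempFile("binary-copy", ".src");
        Path target = Files.createTempFile("binary-copy", ".dst");
        byte[] payload = {0x00, (byte) 0xFF, 0x10, 0x7F};
        Files.write(source, payload);

        long copied = copy(source.toString(), target.toString());
        boolean identical = Arrays.equals(payload, Files.readAllBytes(target));
        System.out.println(copied + " bytes copied, identical: " + identical);
    }
}
```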
Text Data
Whenever you want to handle text data, you need to use the Reader/Writer classes.
Whenever you need to convert binary data to text (or vice versa), then you need some kind of encoding (common ones are UTF-8, UTF-16, ISO-8859-1 (and related ones) and the good old US-ASCII). "Luckily" the Java platform also has something called the "default platform encoding" which it will use whenever it needs one but the code doesn't specify one.
The platform default encoding is a double-edged sword, however:
- it makes writing code easier, because you don't have to specify an encoding for each operation, but
- it might not match the data you have: if the platform default encoding is ISO-8859-1 and the file you read is actually UTF-8, you will get scrambled output!
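The mismatch described above can be reproduced without any files. This is only a sketch: the class and method names are invented, and the "wrong" encoding is passed explicitly to stand in for a platform default that happens to be ISO-8859-1.

```java
import java.nio.charset.StandardCharsets;

public class EncodingMismatch {

    // Encodes the text as UTF-8 bytes, then (wrongly) decodes them as ISO-8859-1.
    static String misread(String text) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, StandardCharsets.ISO_8859_1);
    }

    // Decodes with the same encoding that was used for encoding.
    static String roundTrip(String text) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "café", written with an escape so this source file's own encoding does not matter.
        String original = "caf\u00e9";

        // The two UTF-8 bytes of the accented letter are decoded as two separate characters.
        System.out.println(misread(original).length());            // prints 5
        System.out.println(roundTrip(original).equals(original));  // prints true
    }
}
```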
For reading, we should also mention the BufferedReader, which can be wrapped around any other Reader and adds the ability to handle whole lines at once.
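A small sketch of that wrapping, using a StringReader as the underlying Reader so the example needs no file. LineReading and readLines are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class LineReading {

    // Wraps any Reader in a BufferedReader and collects its lines.
    // readLine() strips the line terminator and returns null at end of input.
    static List<String> readLines(Reader source) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Mixed line terminators are handled by readLine().
        Reader source = new StringReader("first\nsecond\r\nthird");
        System.out.println(readLines(source)); // prints [first, second, third]
    }
}
```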
Scanner is a special class that's meant to parse text input into tokens. It's most useful for structured text, but it is often used on System.in to provide a very simple way to read data from stdin (i.e. from what the user types on the keyboard).
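To illustrate token parsing, here is a sketch that scans a String instead of System.in so it runs without user input. The class name and the sumIntegers helper are assumptions made for this example.

```java
import java.util.Scanner;

public class TokenParsing {

    // Sums all whitespace-separated integer tokens and skips everything else.
    static int sumIntegers(String input) {
        int sum = 0;
        try (Scanner scanner = new Scanner(input)) {
            while (scanner.hasNext()) {
                if (scanner.hasNextInt()) {
                    sum += scanner.nextInt();
                } else {
                    scanner.next(); // discard a non-integer token
                }
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumIntegers("3 apples 4 pears 10")); // prints 17
    }
}
```

Replacing the String argument with System.in gives the common "read from the keyboard" usage.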
Bridging the gap
Now, confusingly enough there are classes that make the bridge between those worlds, which generally have both parts in their names:
- an InputStreamReader consumes an InputStream and is itself a Reader.
- an OutputStreamWriter is a Writer and writes to an OutputStream.
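A sketch of both bridge classes, using in-memory byte streams so nothing touches the disk. The encoding is passed explicitly; BridgeClasses, encode and decode are names invented for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class BridgeClasses {

    // Text -> bytes: the OutputStreamWriter is a Writer that writes to an OutputStream.
    static byte[] encode(String text) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(bytes, StandardCharsets.UTF_8)) {
            writer.write(text);
        } // closing the writer flushes any buffered characters
        return bytes.toByteArray();
    }

    // Bytes -> text: the InputStreamReader consumes an InputStream and is a Reader.
    static String decode(byte[] data) throws IOException {
        StringBuilder text = new StringBuilder();
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(data), StandardCharsets.UTF_8)) {
            char[] buffer = new char[1024];
            int read;
            while ((read = reader.read(buffer)) != -1) {
                text.append(buffer, 0, read);
            }
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = encode("gr\u00fc\u00df"); // two of the four characters need two bytes each in UTF-8
        System.out.println(data.length);                          // prints 6
        System.out.println(decode(data).equals("gr\u00fc\u00df")); // prints true
    }
}
```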
And then there are "shortcut classes" that basically combine two other classes that are often combined.
- a FileReader is basically a combination of a FileInputStream with an InputStreamReader
- a FileWriter is basically a combination of a FileOutputStream with an OutputStreamWriter
Note that FileReader and FileWriter have a major drawback compared to their more complicated "hand-built" alternative: by default they use the platform default encoding, which might not be what you want! Before Java 11 there was no way to change that; since Java 11 their constructors can also take a Charset.
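The "hand-built" alternative might look like the sketch below: a FileInputStream or FileOutputStream wrapped in the matching bridge class with an explicit encoding, plus a buffered wrapper. The class name, the two helper methods and the choice of UTF-8 are assumptions for illustration.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class ExplicitEncodingFileIo {

    // Same job as a FileWriter, but the encoding is stated explicitly.
    static void writeText(File file, String text) throws IOException {
        try (Writer writer = new BufferedWriter(new OutputStreamWriter(
                new FileOutputStream(file), StandardCharsets.UTF_8))) {
            writer.write(text);
        }
    }

    // Same job as a FileReader, but the encoding is stated explicitly.
    static String readFirstLine(File file) throws IOException {
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                new FileInputStream(file), StandardCharsets.UTF_8))) {
            return reader.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("explicit-encoding", ".txt");
        file.deleteOnExit();
        writeText(file, "na\u00efve\nsecond line");
        System.out.println(readFirstLine(file).equals("na\u00efve")); // prints true
    }
}
```

Because both sides name the same encoding, the result does not depend on the machine's default.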
What about serialization?
ObjectOutputStream and ObjectInputStream are special streams used for serialization.
As the names of the classes imply, serialization involves only binary data (even when serializing String objects), so you'll want to use *Stream classes exclusively. As long as you avoid any Reader/Writer classes, you should be fine.
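A minimal round-trip sketch with ObjectOutputStream and ObjectInputStream, using in-memory byte streams in place of files. SerializationRoundTrip, serialize and deserialize are hypothetical names; only *Stream classes appear.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

public class SerializationRoundTrip {

    // Serializes any Serializable value into a byte array.
    static byte[] serialize(Serializable value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        return bytes.toByteArray();
    }

    // Reads one object back from the serialized bytes.
    static Object deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws IOException, ClassNotFoundException {
        ArrayList<String> names = new ArrayList<>(List.of("alpha", "beta"));
        byte[] data = serialize(names);

        // Even though the list holds Strings, the serialized form is binary data.
        Object restored = deserialize(data);
        System.out.println(restored.equals(names)); // prints true
    }
}
```

To serialize to a file instead, the ByteArrayOutputStream and ByteArrayInputStream could be swapped for a FileOutputStream and a FileInputStream.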
Further resources
- the Basic I/O trail.
- Joel's old-ish article on Unicode (good introduction, slightly light on technical detail)
- On the evils of platform default encoding (also this)