Proper Java classes for reading and writing files?

Posted 2019-03-13 14:16

Question:

While reading some sources about file I/O in Java, I learned that there is more than one alternative for input and output operations.

These are:

  • BufferedReader and BufferedWriter
  • FileReader and FileWriter
  • FileInputStream and FileOutputStream
  • InputStreamReader and OutputStreamWriter
  • Scanner class

Which of these is the best choice for handling text files? Which is the best choice for serialization? What does Java NIO say about it?

Answer 1:

Two kinds of data

Generally speaking there are two "worlds":

  • binary data
  • text data

When it's a file (or a socket, or a BLOB in a DB, or ...), then it's always binary data first.

Some of that binary data can be treated as text data (which involves something called an "encoding" or "character encoding").

Binary Data

Whenever you want to handle binary data, you need to use the InputStream/OutputStream classes (generally, everything that has Stream in its name).

That's why there's a FileInputStream and a FileOutputStream: those read from and write to files and they handle binary data.
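A minimal sketch of the binary side, assuming a throwaway temporary file: raw bytes go out through a FileOutputStream and come back unchanged through a FileInputStream, with no encoding involved anywhere. The class and method names (BinaryFileDemo, writeBytes, readBytes) are made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Arrays;

public class BinaryFileDemo {

    // Writes raw bytes to the file; no character encoding is involved.
    static void writeBytes(File file, byte[] data) throws IOException {
        try (OutputStream out = new FileOutputStream(file)) {
            out.write(data);
        }
    }

    // Reads the whole file back as raw bytes, one chunk at a time.
    static byte[] readBytes(File file) throws IOException {
        ByteArrayOutputStream collected = new ByteArrayOutputStream();
        try (InputStream in = new FileInputStream(file)) {
            byte[] chunk = new byte[4096];
            int n;
            // read() returns -1 once the end of the file is reached
            while ((n = in.read(chunk)) != -1) {
                collected.write(chunk, 0, n);
            }
        }
        return collected.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("binary-demo", ".bin");
        tmp.deleteOnExit();
        // Arbitrary byte values; they are never interpreted as characters
        byte[] data = {0x00, (byte) 0xFF, 0x7F, 0x10};
        writeBytes(tmp, data);
        System.out.println(Arrays.equals(data, readBytes(tmp))); // true
    }
}
```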

Text Data

Whenever you want to handle text data, you need to use the Reader/Writer classes.

Whenever you need to convert binary data to text (or vice versa), you need some kind of encoding. Common ones are UTF-8, UTF-16, ISO-8859-1 (and related ones), and the good old US-ASCII. "Luckily", the Java platform also has something called the "default platform encoding", which it uses whenever an encoding is needed but the code doesn't specify one.

The platform default encoding is a double-edged sword, however:

  • it makes writing code easier, because you don't have to specify an encoding for each operation but
  • it might not match the data you have: If the platform-default encoding is ISO-8859-1 and the file you read is actually UTF-8, then you will get a scrambled output!
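To make the "scrambled output" concrete, here is a small sketch that encodes a string as UTF-8 and then decodes the same bytes once with the matching charset and once with ISO-8859-1. It works on an in-memory byte array rather than a file, and roundTrip is a hypothetical helper name.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingMismatchDemo {

    // Encodes text with one charset and decodes the resulting bytes with another.
    static String roundTrip(String text, Charset writeAs, Charset readAs) {
        byte[] bytes = text.getBytes(writeAs);
        return new String(bytes, readAs);
    }

    public static void main(String[] args) {
        // "caf" + e-acute (U+00E9), which UTF-8 encodes as the two bytes 0xC3 0xA9
        String original = "caf\u00e9";

        String matching = roundTrip(original, StandardCharsets.UTF_8, StandardCharsets.UTF_8);
        // ISO-8859-1 maps every byte to one character, so 0xC3 0xA9 become U+00C3 U+00A9
        String mismatched = roundTrip(original, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);

        System.out.println(original.getBytes(StandardCharsets.UTF_8).length); // 5
        System.out.println(matching.equals(original));                      // true
        System.out.println(mismatched.equals("caf\u00c3\u00a9"));          // true
    }
}
```

Note that the mismatched decode throws no exception: ISO-8859-1 accepts every byte value, so the wrong text is produced silently.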

For reading, it's also worth mentioning BufferedReader, which can be wrapped around any other Reader and adds the ability to read whole lines at once (via readLine()).
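A short sketch of BufferedReader wrapped around a StringReader, chosen here only to show that the wrapped Reader can be anything, not necessarily a file. LineReaderDemo and readLines are illustrative names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class LineReaderDemo {

    // Collects every line from any Reader; BufferedReader supplies readLine().
    static List<String> readLines(Reader source) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            // readLine() strips the terminator (\n, \r or \r\n) and returns null at end of input
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        List<String> lines = readLines(new StringReader("first\nsecond\r\nthird"));
        System.out.println(lines); // [first, second, third]
    }
}
```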

Scanner is a special class that's meant to parse text input into tokens. It's most useful for structured text, but it is often used on System.in to provide a very simple way to read data from stdin (i.e. what the user types on the keyboard).
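A minimal sketch of Scanner's token-based parsing, run on a String so it is self-contained; the same hasNextInt()/nextInt()/next() calls work on new Scanner(System.in). The summing task and the names ScannerDemo and sumInts are invented for the example.

```java
import java.util.Scanner;

public class ScannerDemo {

    // Sums the integer tokens in the input and skips tokens that are not integers.
    static int sumInts(String input) {
        int sum = 0;
        try (Scanner scanner = new Scanner(input)) {
            while (scanner.hasNext()) {
                if (scanner.hasNextInt()) {
                    sum += scanner.nextInt();
                } else {
                    scanner.next(); // discard the non-integer token
                }
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumInts("3 apples and 4 pears")); // 7
    }
}
```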

Bridging the gap

Now, confusingly enough, there are classes that bridge those two worlds, and they generally have both parts in their names:

  • an InputStreamReader consumes an InputStream and is itself a Reader.
  • an OutputStreamWriter is a Writer and writes to an OutputStream.
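A sketch of both bridges with an explicit charset, using in-memory byte streams instead of files so it runs anywhere. It also shows that the same text turns into a different number of bytes depending on the encoding. BridgeDemo, encode and decode are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class BridgeDemo {

    // Text -> bytes: the Writer encodes characters into the OutputStream.
    static byte[] encode(String text, Charset charset) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(bytes, charset)) {
            writer.write(text);
        } // closing flushes any buffered output into 'bytes'
        return bytes.toByteArray();
    }

    // Bytes -> text: the Reader decodes bytes taken from the InputStream.
    static String decode(byte[] data, Charset charset) throws IOException {
        StringBuilder text = new StringBuilder();
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(data), charset)) {
            char[] buf = new char[1024];
            int n;
            while ((n = reader.read(buf)) != -1) {
                text.append(buf, 0, n);
            }
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        String text = "\u00fcber"; // "uber" with u-umlaut (U+00FC)
        byte[] utf8 = encode(text, StandardCharsets.UTF_8);
        byte[] utf16 = encode(text, StandardCharsets.UTF_16BE); // BE variant writes no byte-order mark
        System.out.println(utf8.length);  // 5
        System.out.println(utf16.length); // 8
        System.out.println(decode(utf8, StandardCharsets.UTF_8).equals(text)); // true
    }
}
```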

And then there are "shortcut classes" that basically package up two other classes that are commonly used together:

  • a FileReader is basically a combination of a FileInputStream with an InputStreamReader
  • a FileWriter is basically a combination of a FileOutputStream with an OutputStreamWriter

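Note that FileReader and FileWriter have a major drawback compared to their more complicated "hand-built" alternatives: they always use the platform default encoding, which might not be what you want! (Since Java 11, both classes also have constructors that take a Charset, which removes this drawback if you can rely on Java 11 or later.)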
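A minimal sketch of the "hand-built" alternative, assuming the file should be UTF-8: FileOutputStream + OutputStreamWriter replaces FileWriter, FileInputStream + InputStreamReader replaces FileReader, and a BufferedWriter/BufferedReader is layered on top. The class and method names are illustrative, and the demo writes to a temporary file.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;

public class ExplicitCharsetFiles {

    // Like FileWriter, but the charset is stated instead of taken from the platform default.
    static void writeUtf8(File file, String text) throws IOException {
        try (BufferedWriter out = new BufferedWriter(
                new OutputStreamWriter(new FileOutputStream(file), StandardCharsets.UTF_8))) {
            out.write(text);
        }
    }

    // Like FileReader, but always decodes the file's bytes as UTF-8.
    static String readFirstLineUtf8(File file) throws IOException {
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(new FileInputStream(file), StandardCharsets.UTF_8))) {
            return in.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("charset-demo", ".txt");
        tmp.deleteOnExit();
        String text = "na\u00efve caf\u00e9"; // two non-ASCII letters, two bytes each in UTF-8
        writeUtf8(tmp, text + "\n");
        System.out.println(text.equals(readFirstLineUtf8(tmp))); // true
        System.out.println(tmp.length()); // 13 (11 chars, two of them 2-byte in UTF-8)
    }
}
```

The result no longer depends on the machine the code runs on, because both sides name the same charset.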

What about serialization?

ObjectOutputStream and ObjectInputStream are special streams used for serialization.

As the class names imply, serialization involves only binary data (even when serializing String objects), so you'll want to use *Stream classes exclusively. As long as you avoid any Reader/Writer classes, you should be fine.
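A sketch of a serialization round trip that touches only *Stream classes. It serializes into a byte array for simplicity; to use a file instead, one would wrap a FileOutputStream/FileInputStream in the same way. The Point class and the helper names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class SerializationDemo {

    // Hypothetical example type; any class implementing Serializable is handled the same way.
    static class Point implements Serializable {
        private static final long serialVersionUID = 1L;
        final int x;
        final int y;
        final String label;

        Point(int x, int y, String label) {
            this.x = x;
            this.y = y;
            this.label = label;
        }
    }

    // Serializes an object graph into bytes; no Reader/Writer is involved.
    static byte[] toBytes(Serializable obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        return bytes.toByteArray();
    }

    // Rebuilds the object from bytes produced by toBytes.
    static Object fromBytes(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Point p = (Point) fromBytes(toBytes(new Point(3, 4, "corner")));
        System.out.println(p.x + "," + p.y + " " + p.label); // 3,4 corner
    }
}
```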

Further resources

  • the Basic I/O trail.
  • Joel's old-ish article on Unicode (good introduction, slightly light on technical detail)
  • On the evils of platform default encoding (also this)