Java IO: Writing into a text file line by line

Published 2019-04-17 16:02

Question:

I have a requirement where I need to write a text file line by line. The number of lines may be up to 80K. I am opening the file output stream and, inside a for-loop, iterating over a list, forming a line, and writing the line into the file.

This means 80K write operations are made on the file.

Opening and writing to the file this frequently hinders performance. Can anyone suggest the best way to address this requirement in Java IO?

Thanks.

Answer 1:

You haven't posted any code, but as long as your writes are buffered you should hardly notice the performance cost. Use BufferedWriter.write() followed by BufferedWriter.newLine(), and avoid flushing as much as you can. Don't 'form a line'; just write whatever you have to write as soon as you have it. Much, if not all, of the overhead you are observing may actually be string concatenation rather than I/O.
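
For example, here is a minimal sketch of the buffered approach (the file name and line content are placeholders):

import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Files.newBufferedWriter returns a BufferedWriter, so each write()
// goes into an in-memory buffer and reaches the disk in large chunks.
try (BufferedWriter out = Files.newBufferedWriter(
    Paths.get("out.txt"), StandardCharsets.UTF_8)) {
  for (int i = 0; i < 80000; i++) {
    out.write("some value " + i); // write the data as soon as you have it
    out.newLine();                // platform-specific line separator
  }
} catch (IOException e) {
  e.printStackTrace();
}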

The alternatives mentioned in other answers either amount to this same technique implemented in more baroque ways, or involve NIO, which isn't going to be any faster.



Answer 2:

Use a BufferedOutputStream. With it, all writes go into a buffer first rather than directly to disk. Writing to disk happens only when the buffer is full and when the stream is flushed or closed. The default buffer size is 8192 bytes, but you can specify your own buffer size.

Here is an example using the default buffer size:

import java.io.BufferedOutputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.UnsupportedEncodingException;

PrintWriter out = null;
try {
  // FileOutputStream -> BufferedOutputStream (default 8192-byte buffer)
  // -> OutputStreamWriter (UTF-8) -> PrintWriter
  out = new PrintWriter(new OutputStreamWriter(
      new BufferedOutputStream(new FileOutputStream("out.txt")), "UTF-8"));
  for (int i = 0; i < 80000; i++) {
    out.println(String.format("Line %d", i));
  }
} catch (UnsupportedEncodingException e) {
  e.printStackTrace();
} catch (FileNotFoundException e) {
  e.printStackTrace();
} finally {
  if (out != null) {
    out.close(); // close() flushes the remaining buffer contents
  }
}
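
On Java 7 and later, a try-with-resources block handles the closing automatically. Here is an equivalent sketch that also passes a custom buffer size (64 KB here, an arbitrary choice):

try (PrintWriter out = new PrintWriter(new OutputStreamWriter(
    new BufferedOutputStream(new FileOutputStream("out.txt"), 64 * 1024),
    "UTF-8"))) {
  for (int i = 0; i < 80000; i++) {
    out.println(String.format("Line %d", i));
  }
} catch (UnsupportedEncodingException | FileNotFoundException e) {
  e.printStackTrace();
}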


Answer 3:

Below are the heuristics I use to guide my decisions when designing for fast file IO, along with a set of benchmarks that I use to test the different alternatives.

Heuristics:

  1. Preallocate the file; asking the OS to resize the file is expensive.
  2. Stream the data as much as possible, avoiding seeks, since they perform poorly on spinning disks.
  3. Batch the writes (while taking care not to create excessive GC pressure).
  4. When designing for SSDs, avoid updating data in place; that is the slowest operation on an SSD. A complete guide to their quirks can be read here.
  5. Where possible, avoid copying data between buffers (this is where Java NIO can help).
  6. If possible, use memory-mapped files; see the sketch after this list. Memory-mapped files are underused in Java, yet handing the disk writes over to the OS to perform asynchronously is typically an order of magnitude faster than the alternatives, i.e. BufferedWriter and RandomAccessFile.
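
Here is a minimal memory-mapped write sketch (the file name, mapped size, and line content are placeholder assumptions; a real implementation would track the final write position and truncate the file to it):

import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;

try (RandomAccessFile file = new RandomAccessFile("mapped.txt", "rw");
     FileChannel channel = file.getChannel()) {
  // Mapping a region larger than the file extends (preallocates) it,
  // and the OS writes dirty pages back to disk asynchronously.
  MappedByteBuffer buf =
      channel.map(FileChannel.MapMode.READ_WRITE, 0, 16 * 1024 * 1024);
  for (int i = 0; i < 80000; i++) {
    buf.put(("Line " + i + "\n").getBytes(StandardCharsets.UTF_8));
  }
  // buf.force(); // optionally force outstanding pages to disk
} catch (IOException e) {
  e.printStackTrace();
}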

I wrote the following file benchmarks a while ago. Give them a run: https://gist.github.com/kirkch/3402882

When I ran the benchmarks against a standard spinning disk, I got these results:

Stream Write: 438
Mapped Write: 28
Stream Read: 421
Mapped Read: 12
Stream Read/Write: 1866
Mapped Read/Write: 19

All numbers are in ms, so smaller is better. Notice that memory-mapped files consistently outperform every other approach.

The other surprise I have found when writing these types of systems is that, in later versions of Java, using BufferedWriter can be slower than just using FileWriter directly or RandomAccessFile. It turns out that buffering is now done lower down; I believe this happened when Sun rewrote java.io to use channels and byte buffers under the covers. Yet the advice to add one's own buffering remains common practice. As always, measure first on your target environment, and feel free to adjust the benchmark code above to experiment further.

While looking for links to back up some of the facts above, I came across Martin Thompson's post on this topic. It is well worth a read.