Faster way to read file

Posted 2019-01-13 22:51

Question:

I am working on a program that has about 400 input files and about 40 output files. It's simple: it reads each input file and generates a new file that is much bigger (based on an algorithm).

I'm using the read() method of BufferedReader:

String encoding ="ISO-8859-1";
FileInputStream fis = new FileInputStream(nextFile);
BufferedReader reader = new BufferedReader(new InputStreamReader(fis, encoding));
char[] buffer = new char[8192] ;

To read the input files I'm using this:

private String getNextBlock() throws IOException{
    boolean isNewFile = false;

    int n = reader.read(buffer, 0, buffer.length);
    if(n == -1) {
        return null;
    } else {
        return new String(buffer,0,n);
    }
}

With each block I'm doing some checks (like looking for a string inside the block) and then I'm writing it to a file:

BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(
        new FileOutputStream("fileName"), encoding));

writer.write(textToWrite);

The problem is that it takes about 12 minutes. I'm trying to find something much faster. Does anyone have an idea for something better?

Thanks.

Answer 1:

You should be able to find an answer here:

http://nadeausoftware.com/articles/2008/02/java_tip_how_read_files_quickly

For the best Java read performance, there are four things to remember:

  • Minimize I/O operations by reading an array at a time, not a byte at a time. An 8Kbyte array is a good size.

  • Minimize method calls by getting data an array at a time, not a byte at a time. Use array indexing to get at bytes in the array.

  • Minimize thread synchronization locks if you don't need thread safety. Either make fewer method calls to a thread-safe class, or use a non-thread-safe class like FileChannel and MappedByteBuffer.

  • Minimize data copying between the JVM/OS, internal buffers, and application arrays. Use FileChannel with memory mapping, or a direct or wrapped array ByteBuffer.
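
As a rough illustration of the first two tips, here is a minimal sketch that reads a file through a FileChannel into an 8 KB wrapped array and scans it with plain array indexing. The class name ChunkedRead and the helper countByte are made up for this example and do not come from the linked article; counting a byte value is only a stand-in for whatever check you run on each block.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class ChunkedRead {
    // Hypothetical helper: counts how often one byte value occurs in a file.
    static long countByte(Path file, byte target) throws IOException {
        byte[] array = new byte[8192];               // 8 KB chunk, as the tips suggest
        ByteBuffer buffer = ByteBuffer.wrap(array);  // wrapping lets us index 'array' directly
        long count = 0;
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            int n;
            // Each read fills array[0..n) because the buffer is cleared after every pass.
            while ((n = ch.read(buffer)) != -1) {
                for (int i = 0; i < n; i++) {
                    if (array[i] == target) {
                        count++;
                    }
                }
                buffer.clear();
            }
        }
        return count;
    }
}
```

Calling ChunkedRead.countByte(path, (byte) '\n') would count the line feeds in a file. Whether this beats a BufferedReader for your workload is something you would have to measure.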



Answer 2:

As you do not give many details, I can only suggest that you try memory-mapped files:

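import java.io.FileInputStream;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

FileInputStream f = new FileInputStream(fileName);
FileChannel ch = f.getChannel();
// MapMode is a nested class of FileChannel, so it is referenced through the class name
MappedByteBuffer mbb = ch.map(FileChannel.MapMode.READ_ONLY, 0L, ch.size());
while (mbb.hasRemaining()) {
    // Access the data using the mbb
}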

It could be optimized further if you gave more details about what kind of data your files contain.

EDIT

Where it says // Access the data using the mbb, you could decode your text:

String charsetName = "UTF-16"; // choose the appropriate charset
CharBuffer cb = Charset.forName(charsetName).decode(mbb);
String text = cb.toString();
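
Putting the mapping and the decoding together, a minimal sketch could look like the following. It assumes the whole file fits in one mapping (FileChannel.map accepts at most Integer.MAX_VALUE bytes) and that the decoded text fits in memory. The class name MappedText and the method readAll are hypothetical, chosen only for this example.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.Charset;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class MappedText {
    // Hypothetical helper: maps the whole file read-only and decodes it in one step.
    static String readAll(Path file, Charset charset) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer mbb = ch.map(FileChannel.MapMode.READ_ONLY, 0L, ch.size());
            // decode() consumes the buffer and returns a CharBuffer holding the text
            return charset.decode(mbb).toString();
        }
    }
}
```

For the files in the question you would pass StandardCharsets.ISO_8859_1 as the charset, for example MappedText.readAll(path, StandardCharsets.ISO_8859_1).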


Answer 3:

Mapped byte buffers are the fastest way:

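FileInputStream f = new FileInputStream(name);
FileChannel ch = f.getChannel();
// MapMode is a nested class of FileChannel, so it is referenced through the class name
MappedByteBuffer mb = ch.map(FileChannel.MapMode.READ_ONLY, 0L, ch.size());
byte[] barray = new byte[SIZE]; // SIZE is the chunk size; it is not defined in this snippet
long checkSum = 0L;
int nGet;
while (mb.hasRemaining()) {
    // Copy up to SIZE bytes from the mapping into the array
    nGet = Math.min(mb.remaining(), SIZE);
    mb.get(barray, 0, nGet);
    for (int i = 0; i < nGet; i++) {
        checkSum += barray[i];
    }
}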