I'm reading binary files and here is a sample:
public static byte[] ReadFully(Stream input)
{
    byte[] buffer = new byte[16 * 1024];
    using (MemoryStream ms = new MemoryStream())
    {
        int read;
        while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
        {
            // Accumulate each chunk until the stream is exhausted.
            ms.Write(buffer, 0, read);
        }
        return ms.ToArray();
    }
}
Obviously the buffer size (16*1024) plays a big role in performance. I've read that it depends on the I/O technology (SATA, SSD, SCSI, etc.) and also on the allocation-unit size of the partition the file lives on (which we can choose when formatting the partition).
But here is the question:
Is there any formula or best practice to determine the buffer size? Right now, I'm determining it by trial and error.
Edit:
I've tested the application on my server with different buffer sizes, and I get the best performance with 4095*256*16 (16 MB)!!! 4096 is 4 seconds slower.
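For what it's worth, the trial-and-error runs described above can be automated with a small timing harness. This is only a minimal sketch: the file path, file size, and candidate buffer sizes below are placeholders I chose for illustration, not values from the original test.

```csharp
using System;
using System.Diagnostics;
using System.IO;

class BufferBenchmark
{
    static void Main()
    {
        // Hypothetical test file; substitute your own large binary file.
        string path = "test.bin";
        if (!File.Exists(path))
            File.WriteAllBytes(path, new byte[8 * 1024 * 1024]); // 8 MB of zeroes for the demo

        int[] candidates = { 4 * 1024, 64 * 1024, 1024 * 1024, 16 * 1024 * 1024 };

        foreach (int size in candidates)
        {
            byte[] buffer = new byte[size];
            Stopwatch sw = Stopwatch.StartNew();
            using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read))
            {
                int read;
                while ((read = fs.Read(buffer, 0, buffer.Length)) > 0)
                {
                    // Discard the data; we only measure raw read time.
                }
            }
            sw.Stop();
            Console.WriteLine($"{size,10} bytes: {sw.ElapsedMilliseconds} ms");
        }
    }
}
```

Repeating each run a few times and discarding the first (cold-cache) pass gives more stable numbers.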
Here are some older posts which are very helpful, but I still can't figure out the reason:
Faster (unsafe) BinaryReader in .NET
Optimum file buffer read size?
File I/O with streams - best memory buffer size
How do you determine the ideal buffer size when using FileInputStream?
"Sequential File Programming Patterns and Performance with .NET" is a great article in I/O performance improvement.
Page 8 of this PDF shows that the bandwidth becomes constant for buffer sizes larger than eight kilobytes. Bear in mind that the article was written in 2004 and measured a "Maxtor 250 GB 7200 RPM SATA disk", so the results may differ with more recent I/O technology.
If you are looking for the best possible performance, take a look at pinvoke.net or at page 9 of the PDF file, where the un-buffered file performance measurements show better results:
In un-buffered I/O, the disk data moves directly between the application's address space and the device without any intermediate copying.
Summary
- For single disks, use the defaults of the .NET framework – they deliver excellent performance for sequential file access.
- Pre-allocate large sequential files (using the SetLength() method) when the file is created. This typically improves speed by about 13% when compared to a fragmented file.
- At least for now, disk arrays require un-buffered I/O to achieve the highest performance - buffered I/O can be eight times slower than un-buffered I/O. We expect this problem will be addressed in later releases of the .NET framework.
- If you do your own buffering, use large request sizes (64 KB is a good place to start). Using the .NET framework, a single processor can read and write a disk array at over 800 Mbytes/s using un-buffered I/O.
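The pre-allocation advice in the summary can be sketched as follows. This is a hedged example under my own assumptions: the output path, the 64 MB target size, and the write loop are illustrative; only SetLength() and the 64 KB request size come from the article.

```csharp
using System;
using System.IO;

class Preallocate
{
    static void Main()
    {
        const long totalSize = 64L * 1024 * 1024; // hypothetical 64 MB target
        byte[] buffer = new byte[64 * 1024];      // 64 KB requests, per the article

        using (var fs = new FileStream("output.bin", FileMode.Create, FileAccess.Write))
        {
            // Reserve the full length up front so NTFS can lay the file
            // out contiguously instead of growing it piecemeal.
            fs.SetLength(totalSize);

            long written = 0;
            while (written < totalSize)
            {
                int chunk = (int)Math.Min(buffer.Length, totalSize - written);
                fs.Write(buffer, 0, chunk);
                written += chunk;
            }
        }
        Console.WriteLine(new FileInfo("output.bin").Length); // 67108864
    }
}
```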
There is no single best or worst buffer size, but you do have to consider a few aspects.
As you are using C#, you run on Windows. Windows uses NTFS, whose default allocation-unit (cluster) size is 4 KB, so it is advisable to use a multiple of 4096 bytes. Your buffer size of 16*1024 = 4*4096 is therefore a reasonable choice, but whether it is better or worse than, say, 16*4096 cannot be said in general.
Everything depends on the situation and the requirements of your program. Remember that you cannot choose the single best option here, only a better one. I recommend using 4096, but you could also use 4*4096 or even 16*4096. Keep in mind, however, that the buffer is allocated on the heap, so its allocation takes some time, and you don't want to allocate a very large buffer such as 128*4096.
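To make the recommendation concrete: FileStream accepts the internal buffer size as a constructor parameter, so you can pass a multiple of 4096 directly instead of managing your own buffer. A minimal sketch; the file name and its contents are hypothetical, chosen just for the demo:

```csharp
using System;
using System.IO;

class BufferSizeDemo
{
    static void Main()
    {
        // 4 * 4096 = 16 KB, a multiple of the NTFS 4 KB cluster size.
        const int bufferSize = 4 * 4096;

        File.WriteAllText("sample.bin", "hello"); // hypothetical 5-byte input file

        using (var fs = new FileStream("sample.bin", FileMode.Open, FileAccess.Read,
                                       FileShare.Read, bufferSize))
        {
            byte[] buffer = new byte[bufferSize];
            int read = fs.Read(buffer, 0, buffer.Length);
            Console.WriteLine(read); // 5
        }
    }
}
```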