Optimal buffer size for write(2)

Posted 2020-01-31 12:07

Question:

Let's say I want to write 1 GB of data to a file on, say, an ext3 Linux filesystem using the write(2) syscall, and this happens in a very busy environment (many similar I/Os running concurrently). What is the optimal buffer size in the interval, say, [4 kB, 4 MB] to do that when

  1. not using O_DIRECT open flag, or
  2. using O_DIRECT?

Please, no "check it yourself" answers -- I'd like to get some answer from "filesystems" guys.

Answer 1:

As discussed in the comments, I believe the exact size doesn't matter that much, assuming it is:

  • a small multiple of the file system block size (see the comment by Joachim Pileborg suggesting stat(".") etc.)
  • a power of two (because computers and kernels like them)
  • not too big (e.g. small enough to fit in one of your processor's caches, such as the L2 cache)
  • aligned in memory (e.g. to a page size using posix_memalign).

So a power of two between 16 KB and a few megabytes should probably do. Most of the time is spent waiting on the disk, and filesystem and disk benchmarks are quite flat in that range.
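The checklist above can be sketched roughly as follows. This is only an illustration, assuming a Linux/POSIX system: the helper names preferred_block_size and alloc_io_buffer, the 4096-byte fallback, and the multiple of 16 in the usage note are assumptions made for the example, not values taken from this thread.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: preferred I/O block size that stat() reports for
 * `path` (st_blksize). Falls back to 4096, an assumed default, on error. */
static size_t preferred_block_size(const char *path)
{
    struct stat st;
    if (stat(path, &st) != 0 || st.st_blksize <= 0)
        return 4096;
    return (size_t)st.st_blksize;
}

/* Hypothetical helper: allocate a page-aligned buffer whose size is
 * `multiple` times the preferred block size of `path`.
 * Returns NULL on failure; on success stores the size in *size_out. */
static void *alloc_io_buffer(const char *path, size_t multiple, size_t *size_out)
{
    long page = sysconf(_SC_PAGESIZE);
    size_t align = page > 0 ? (size_t)page : 4096;
    void *buf = NULL;
    size_t size = preferred_block_size(path) * multiple;

    /* posix_memalign() requires a power-of-two alignment. */
    assert((align & (align - 1)) == 0);

    if (posix_memalign(&buf, align, size) != 0)
        return NULL;
    *size_out = size;
    return buf;
}
```

For instance, alloc_io_buffer(".", 16, &size) would give a 64 KB buffer on a file system that reports a 4 KB block size. The buffer is released with free().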

4 KB often seems to be both the page size and the disk block size.

Of course, you can tune things, including the file system block size, which you can set when creating the file system with mke2fs.

And I'll bet that the optimum really depends on your hardware (SSDs or hard disks?) and your system (and its load).



Answer 2:

In my experience, the answer depends much more on the underlying devices and hardware than on the filesystem itself: that is, on the buffer caches on the device, the device's ability to write in small blocks, and so on. However, you should never write in sizes smaller than your file system block size (see stat("."); it is likely to be about 4 KB). Similarly, you should not really go beyond the L2/L3 cache size of the CPU, which in many cases can be as low as 512 KB.

Given that SSDs and similar devices like 64 KB as their unit of operation, I would suggest that a buffer size of 64 KB to 128 KB is optimal. This also matches my empirical experience, where that range gave the highest throughput.
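A write loop using a 64 KB buffer might look like the sketch below. It assumes a POSIX system; the name write_file_in_chunks and the filler data are invented for the example, and CHUNK_SIZE is just the lower end of the range suggested above, not a measured optimum.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumed chunk size: 64 KB, the lower end of the suggested range. */
#define CHUNK_SIZE ((size_t)64 * 1024)

/* Hypothetical helper: write `total` bytes of filler to `path`, issuing
 * write(2) calls of at most CHUNK_SIZE bytes each.
 * Returns 0 on success, -1 on error. */
static int write_file_in_chunks(const char *path, size_t total)
{
    static char buf[CHUNK_SIZE];
    size_t remaining = total;
    int fd;

    assert(path != NULL);
    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    memset(buf, 'x', sizeof buf);   /* filler; real code would fill with data */

    while (remaining > 0) {
        size_t want = remaining < sizeof buf ? remaining : sizeof buf;
        ssize_t n = write(fd, buf, want);
        if (n < 0) {
            if (errno == EINTR)
                continue;           /* interrupted before any byte was written */
            close(fd);
            return -1;
        }
        /* write() may write fewer bytes than asked. The filler is uniform,
         * so restarting from the start of buf is fine in this sketch. */
        remaining -= (size_t)n;
    }
    return close(fd);
}
```

Changing CHUNK_SIZE and timing the call on the target hardware is one way to compare candidate sizes under the real load.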



Answer 3:

Including stdio.h defines BUFSIZ, which is supposed to be the optimal size for the system. This is by no means guaranteed, but it is the right value to use if you cannot run extensive benchmarks, and it is a good starting point for such benchmarks.
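To see how BUFSIZ relates to what the file system reports, a small check along these lines may help. It is only a sketch: compare_bufsiz is a made-up name, and the two values are not guaranteed to agree on any given system.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>

/* Hypothetical helper: print BUFSIZ next to the st_blksize that stat()
 * reports for `path`. Returns st_blksize, or -1 if stat() fails. */
static long compare_bufsiz(const char *path)
{
    struct stat st;

    assert(path != NULL);
    if (stat(path, &st) != 0)
        return -1;
    printf("BUFSIZ     = %d\n", (int)BUFSIZ);
    printf("st_blksize = %ld\n", (long)st.st_blksize);
    return (long)st.st_blksize;
}
```

Calling compare_bufsiz(".") prints both values for the current directory, which gives two candidate starting points for a benchmark.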