Performance of FileStream's Write vs WriteByte

Posted 2019-07-21 00:48

I need to write bytes of an IEnumerable<byte> to a file.
I can convert it to an array and use Write(byte[]) method:

using (var stream = File.Create(path))
{
    var array = bytes.ToArray();
    stream.Write(array, 0, array.Length);
}

But since IEnumerable doesn't provide the collection's item count, using ToArray is not recommended unless it's absolutely necessary.

So I can just iterate the IEnumerable and use WriteByte(byte) in each iteration:

using (var stream = File.Create(path))
    foreach (var b in bytes)
        stream.WriteByte(b);

I wonder which one will be faster when writing lots of data.

I guess Write(byte[]) can size its writes according to the array length, so it should be faster when the data is already in an array.

My question is: when I just have an IEnumerable<byte> holding megabytes of data, which approach is better, converting it to an array and calling Write(byte[]) once, or iterating it and calling WriteByte(byte) for each byte?

2 Answers
疯言疯语 · 2019-07-21 01:02

Enumerating over a large stream of bytes adds a lot of overhead to something that is normally cheap: copying bytes from one buffer to the next.

Normally, LINQ-style overhead does not matter much, but when it comes to pushing 100 million bytes per second through a normal hard drive you will notice severe overhead. This is not premature optimization: we can foresee that this will be a performance hotspot, so we should optimize it eagerly.

So when copying bytes around you probably should not rely on abstractions like IEnumerable and IList at all. Pass around arrays or ArraySegment<byte> values, which also carry an Offset and Count. That frees you from slicing arrays all the time.
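For illustration, here is a minimal sketch of passing an ArraySegment<byte> instead of a trimmed copy of the array; the Sum helper and the buffer sizes are made up for the example:

using System;

static int Sum(ArraySegment<byte> segment)
{
    // The segment carries the array plus Offset and Count, so the caller can
    // hand over "the first bytesRead bytes of buffer" without allocating a slice.
    int sum = 0;
    for (int i = segment.Offset; i < segment.Offset + segment.Count; i++)
        sum += segment.Array[i];
    return sum;
}

// Usage: only part of the buffer is valid after a read, but no copy is needed.
byte[] buffer = new byte[8192];
int bytesRead = 1234; // e.g. the return value of a Stream.Read call
int total = Sum(new ArraySegment<byte>(buffer, 0, bytesRead));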

Another deadly sin with high-throughput IO is calling a method per byte, e.g. reading and writing byte by byte. That kills performance because these methods have to be called hundreds of millions of times per second; I have experienced that myself.

Always process entire buffers of at least 4096 bytes at a time. Depending on what media you are doing IO with you can use much larger buffers (64k, 256k or even megabytes).
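Applied to the question, a sketch of that idea could look like the following; the WriteAll extension method and the 64 KB default buffer size are just assumptions for the example:

using System.Collections.Generic;
using System.IO;

static class StreamExtensions
{
    // Drains an IEnumerable<byte> into the stream in fixed-size chunks so that
    // Write is called once per buffer instead of once per byte.
    public static void WriteAll(this Stream stream, IEnumerable<byte> bytes, int bufferSize = 64 * 1024)
    {
        var buffer = new byte[bufferSize];
        int count = 0;

        foreach (var b in bytes)
        {
            buffer[count++] = b;
            if (count == buffer.Length)
            {
                stream.Write(buffer, 0, count);
                count = 0;
            }
        }

        if (count > 0)
            stream.Write(buffer, 0, count); // flush the last partial buffer
    }
}

// Usage:
// using (var stream = File.Create(path))
//     stream.WriteAll(bytes);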

forever°为你锁心 · 2019-07-21 01:18

You should profile which version is faster. The FileStream class has an internal buffer that decouples the Read() and Write() methods a bit from the actual file system accesses.

If you don't specify a buffer size in the FileStream constructor, it uses a 4096-byte buffer by default. That buffer will combine many of your WriteByte() calls into one write to the underlying file. The only question is whether the overhead of the WriteByte() calls will exceed the overhead of the Enumerable.ToArray() call. The latter will definitely use more memory, but that is the kind of trade-off you always have to weigh.
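If the default turns out to be too small for your workload, a sketch of the same WriteByte loop with an explicitly sized internal buffer might look like this (the 64 KB value is an arbitrary choice for the example):

using System.IO;

using (var stream = new FileStream(path, FileMode.Create, FileAccess.Write,
                                   FileShare.None, bufferSize: 64 * 1024))
{
    // Each WriteByte call lands in FileStream's internal buffer; the underlying
    // file is only written to once per 64 KB.
    foreach (var b in bytes)
        stream.WriteByte(b);
}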

FYI: The current .NET 4 implementation of Enumerable.ToArray() grows an array by doubling its size whenever necessary. Each time it grows, all values are copied over. Then, once all items are stored, the contents are copied one more time into an array of the final size. For IEnumerable<T> instances that actually implement ICollection<T>, the code takes advantage of that fact to start with the correct array size and lets the collection do the copying instead.
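As a rough illustration of that growth strategy (a simplified sketch, not the actual BCL source):

using System;
using System.Collections.Generic;

static T[] ToArraySketch<T>(IEnumerable<T> source)
{
    // If the sequence exposes its count, start with an array of exactly the right size.
    if (source is ICollection<T> collection)
    {
        var exact = new T[collection.Count];
        collection.CopyTo(exact, 0);
        return exact;
    }

    var items = new T[4];
    int count = 0;

    foreach (var item in source)
    {
        if (count == items.Length)
            Array.Resize(ref items, items.Length * 2); // doubling copies every element

        items[count++] = item;
    }

    Array.Resize(ref items, count); // one final copy down to the exact size
    return items;
}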
