Why write to Stream in chunks?

2019-06-22 15:09发布

问题:

I am wondering why so many examples read byte arrays into streams in chucks and not all at once... I know this is a soft question, but I am interested.

I understand a bit about hardware and filling buffers can be very size dependent and you wouldn't want to write to the buffer again until it has been flushed to wherever it needs to go etc... but with the .Net platform (and other modern languages) I see examples of both. So when use which and when, or is the second an absolute no no?

Here is the thing (code) I mean:

var buffer = new byte[4096];

while (true)
{
    var read = this.InputStream.Read(buffer, 0, buffer.Length);

    if (read == 0)
        break;

    OutputStream.Write(buffer, 0, read);
}

rather than:

var buffer = new byte[InputStream.Length];

var read = this.InputStream.Read(buffer, 0, buffer.Length);

OutputStream.Write(buffer, 0, read);

I believe both are legal? So why go through all the fuss of the while loop (in whatever for you decide to structure it)?

I am playing devils advocate here as I want to learn as much as I can :)

回答1:

In the first case, all you need is 4kB of memory. In the second case, you need as much memory as the input stream data takes. If the input stream is 4GB, you need 4GB.

Do you think it would be good if a file copy operation required 4GB of RAM? What if you were to prepare a disk image that's 20GB?

There is also this thing with pipes. You don't often use them on Windows, but a similar case is often seen on other operating systems. The second case waits for all data to be read, and only then writes them to the output. However, sometimes it is advisable to write data as soon as possible—the first case will start writing to the output stream as soon as the first 4kB of input is read. Think of serving web pages: it is advisable for a web server to send data as soon as possible, so that client's web browser will start rendering headers and first part of the content, not waiting for the whole body.

However, if you know that the input stream won't be bigger than 4kB, then both cases are equivalent.



回答2:

Sometimes, InputStream.Length is not valid for some source, e.g from the net transport, or the buffer maybe huge, e.g read from a huge file. IMO.



回答3:

It protects you from the situation where your input stream is several gigabytes long.



回答4:

You have no idea how much data Read might return. This could create major performance problems if you're reading a very large file.

If you have control over the input, and are sure the size is reasonable, then you can certainly read the whole array in at once. But be especially careful if the user can supply an arbitrary input.