How can I split (copy) a Stream in .NET?

Posted 2019-01-12 00:19

Question:

Does anyone know where I can find a Stream splitter implementation?

I'm looking to take a Stream, and obtain two separate streams that can be independently read and closed without impacting each other. These streams should each return the same binary data that the original stream would. No need to implement Position or Seek and such... Forward only.

I'd prefer if it didn't just copy the whole stream into memory and serve it up multiple times, which would be simple enough to implement myself.

Is there anything out there that could do this?

Answer 1:

Not out of the box.

You'll need to buffer the data from the original stream in a FIFO manner, discarding only data which has been read by all "reader" streams.

I'd use:

  • A "management" object that holds a queue of byte[] chunks to be buffered and reads additional data from the source stream when required
  • Some "reader" instances which know where and in which buffer they are reading, which request the next chunk from the "management" object, and which notify it when they no longer need a chunk, so that it can be removed from the queue
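
The two roles above can be sketched roughly as follows. This is only a minimal illustration, not a tested library: the names StreamSplitter, Reader, BufferedChunks and the 4096-byte chunk size are all made up for this example, and it assumes every reader is created up front and that blocking the other readers while one of them pulls from the source is acceptable.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical "management" object: owns the source and the FIFO of buffered chunks.
public sealed class StreamSplitter
{
    private const int ChunkSize = 4096; // arbitrary size chosen for this sketch
    private readonly Stream _source;
    private readonly object _gate = new object();
    private readonly List<byte[]> _chunks = new List<byte[]>(); // oldest chunk first
    private readonly List<Reader> _open = new List<Reader>();   // readers not yet closed
    private long _firstIndex; // absolute index of _chunks[0]
    private bool _eof;

    public IReadOnlyList<Stream> Readers { get; }
    public int BufferedChunks { get { lock (_gate) return _chunks.Count; } }

    public StreamSplitter(Stream source, int readerCount)
    {
        _source = source;
        var readers = new Stream[readerCount];
        for (int i = 0; i < readerCount; i++)
        {
            var reader = new Reader(this);
            _open.Add(reader);
            readers[i] = reader;
        }
        Readers = readers;
    }

    // Returns the chunk with the given absolute index, pulling from the source
    // if no reader has reached it yet; null signals end of stream.
    private byte[] GetChunk(long index)
    {
        lock (_gate)
        {
            while (index >= _firstIndex + _chunks.Count)
            {
                if (_eof) return null;
                var chunk = new byte[ChunkSize];
                int n = _source.Read(chunk, 0, chunk.Length);
                if (n == 0) { _eof = true; return null; }
                Array.Resize(ref chunk, n);
                _chunks.Add(chunk);
            }
            return _chunks[(int)(index - _firstIndex)];
        }
    }

    // Called when a reader finishes a chunk or closes: drop chunks nobody needs.
    private void Release(Reader reader, bool closing)
    {
        lock (_gate)
        {
            if (closing) _open.Remove(reader); else reader.ChunkIndex++;
            long min = long.MaxValue;
            foreach (var r in _open) min = Math.Min(min, r.ChunkIndex);
            while (_chunks.Count > 0 && _firstIndex < min)
            {
                _chunks.RemoveAt(0);
                _firstIndex++;
            }
        }
    }

    // Hypothetical forward-only "reader" stream.
    private sealed class Reader : Stream
    {
        private readonly StreamSplitter _owner;
        private int _offset;    // position inside the current chunk
        public long ChunkIndex; // absolute index of the current chunk

        public Reader(StreamSplitter owner) { _owner = owner; }

        public override int Read(byte[] buffer, int offset, int count)
        {
            byte[] chunk = _owner.GetChunk(ChunkIndex);
            if (chunk == null || count == 0) return 0;
            int n = Math.Min(count, chunk.Length - _offset);
            Buffer.BlockCopy(chunk, _offset, buffer, offset, n);
            _offset += n;
            if (_offset == chunk.Length) { _offset = 0; _owner.Release(this, closing: false); }
            return n;
        }

        protected override void Dispose(bool disposing)
        {
            if (disposing) _owner.Release(this, closing: true);
            base.Dispose(disposing);
        }

        public override bool CanRead => true;
        public override bool CanSeek => false;
        public override bool CanWrite => false;
        public override long Length => throw new NotSupportedException();
        public override long Position { get => throw new NotSupportedException(); set => throw new NotSupportedException(); }
        public override void Flush() { }
        public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
        public override void SetLength(long value) => throw new NotSupportedException();
        public override void Write(byte[] buffer, int offset, int count) => throw new NotSupportedException();
    }
}
```

Usage would be along the lines of `var splitter = new StreamSplitter(source, 2);` followed by reading `splitter.Readers[0]` and `splitter.Readers[1]` independently. If one reader runs far ahead of the other, the queue grows until the slower reader catches up or is closed.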


Answer 2:

This could be tricky without risking keeping everything buffered in memory (for example, if one reader is still at the beginning of the stream while the other has already reached the end).

I wonder whether it isn't easier to write the stream to disk, copy it, and have two streams reading from disk, with self-deletion built into the Close() (i.e. write your own Stream wrapper around FileStream).
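
A rough sketch of the disk-based idea, under a few assumptions: the names DiskSplit and Split are invented here, the source is small enough to spool to the temp directory, and instead of a hand-written wrapper around FileStream it relies on the built-in FileOptions.DeleteOnClose flag to remove each copy when its stream is closed.

```csharp
using System;
using System.IO;

public static class DiskSplit
{
    // Spools the source to a temp file, duplicates that file, and returns one
    // read-only stream per copy. Each file is deleted when its stream is closed.
    public static (Stream First, Stream Second) Split(Stream source)
    {
        string pathA = Path.GetTempFileName();
        string pathB = Path.GetTempFileName();

        using (var spool = File.Create(pathA))
        {
            source.CopyTo(spool);
        }
        File.Copy(pathA, pathB, overwrite: true);

        return (OpenSelfDeleting(pathA), OpenSelfDeleting(pathB));
    }

    private static Stream OpenSelfDeleting(string path) =>
        new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.None,
                       4096, FileOptions.DeleteOnClose);
}
```

Note that this reads the whole source before either copy becomes available, so it trades memory for disk space and latency. The sketch also does no cleanup if opening the second file fails.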



Answer 3:

You can't really do this without duplicating at least part of the source stream, mostly because it doesn't sound like you can control the rate at which the copies are consumed (multiple threads?). You could do something clever with one reader reading ahead of the other (and thereby making the copy only at that point), but the complexity of this sounds like it's not worth the trouble.



Answer 4:

There is an implementation called EchoStream that seems to fit: http://www.codeproject.com/Articles/3922/EchoStream-An-Echo-Tee-Stream-for-NET (it's a very old implementation, from 2003, but it should provide some context).

Found via the question "Redirect writes to a file to a stream C#".



Answer 5:

I do not think you will be able to find a generic implementation to do just that. A Stream is rather abstract; you don't know where the bytes are coming from. For instance, you don't know if it will support seeking, and you don't know the relative cost of operations. (The Stream might be an abstraction of reading data from a remote server, or even off a backup tape!)

If you are able to have a MemoryStream and store the contents once, you can create two separate streams using the same buffer; and they will behave as independent Streams but only use the memory once.
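
As a small illustration of the shared-buffer idea (the SharedBufferSplit and Split names are made up for this sketch, and it assumes the whole content fits in memory): two read-only MemoryStream instances are created over the same byte[], each with its own position.

```csharp
using System;
using System.IO;

public static class SharedBufferSplit
{
    // Reads the source once, then hands out two independent read-only
    // views over the same underlying byte array.
    public static (Stream First, Stream Second) Split(Stream source)
    {
        var store = new MemoryStream();
        source.CopyTo(store);

        byte[] buffer = store.GetBuffer(); // the backing array, not a copy
        int length = (int)store.Length;    // only the first 'length' bytes are valid

        return (new MemoryStream(buffer, 0, length, writable: false),
                new MemoryStream(buffer, 0, length, writable: false));
    }
}
```

Closing one of the returned streams does not affect the other, because each MemoryStream only tracks its own position over the shared array.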

Otherwise, I think you are best off creating a wrapper class that stores the bytes read from one stream until they are also read by the second stream. That would give you the desired forward-only behaviour, but in the worst case you risk storing all of the bytes in memory, if the second Stream is not read until the first Stream has finished reading all content.



Answer 6:

I have made a SplitStream available on GitHub and NuGet.

It goes like this.

// Requires: using System.IO; using System.Linq;
//           using System.Security.Cryptography; using System.Threading.Tasks;
using (var inputSplitStream = new ReadableSplitStream(inputSourceStream))

using (var inputFileStream = inputSplitStream.GetForwardReadOnlyStream())
using (var outputFileStream = File.OpenWrite("MyFileOnAnyFilestore.bin"))

using (var inputSha1Stream = inputSplitStream.GetForwardReadOnlyStream())
using (var outputSha1Stream = SHA1.Create())
{
    inputSplitStream.StartReadAhead();

    // Hash and copy in parallel; each branch drains its own forward-only stream.
    Parallel.Invoke(
        () => {
            var bytes = outputSha1Stream.ComputeHash(inputSha1Stream);
            // "x2" pads every byte to two hex digits.
            var checksumSha1 = string.Join("", bytes.Select(x => x.ToString("x2")));
        },
        () => {
            inputFileStream.CopyTo(outputFileStream);
        }
    );
}

I have not tested it on very large streams, but give it a try.

GitHub: https://github.com/microknights/SplitStream



Answer 7:

With the introduction of async / await, so long as all but one of your reading tasks are async, you should be able to process the same data twice using only a single OS thread.

What I think you want is a linked list of the data blocks you have seen so far. Then you can have multiple custom Stream instances that hold a pointer into this list. As blocks fall off the end of the list, they will be garbage collected. Reusing the memory immediately would require some other kind of circular list and reference counting. Doable, but more complicated.

When your custom Stream can answer a ReadAsync call from the cache, copy the data, advance the pointer down the list and return.

When your Stream has caught up to the end of the cache list, you want to issue a single ReadAsync to the underlying stream, without awaiting it, and cache the returned Task with the data block. So if any other Stream reader also catches up and tries to read more before this read completes, you can return the same Task object.

This way, both readers will hook their await continuation to the result of the same ReadAsync call. When the single read returns, both reading tasks will sequentially execute the next step of their process.
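
The shape of that design could look something like the sketch below. It is an illustration only: Block, SharedReader, Split and the 4096-byte block size are invented names and values, and for brevity the reader exposes a plain ReadAsync method instead of being a full Stream subclass (a real implementation would override Stream.ReadAsync with the same logic).

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

// Hypothetical node of the forward-only list. Readers keep a reference to their
// current node only, so nodes behind the slowest reader become garbage.
public sealed class Block
{
    private readonly Stream _source;
    private readonly object _gate = new object();
    private Task<Block> _next; // cached so every reader awaits the same read

    public byte[] Data { get; }
    public bool IsEnd { get; }

    public Block(Stream source, byte[] data, bool isEnd)
    {
        _source = source;
        Data = data;
        IsEnd = isEnd;
    }

    // The first caller issues the single read on the source; later callers
    // receive the same Task and hook their continuation onto it.
    public Task<Block> GetNextAsync()
    {
        lock (_gate)
        {
            if (_next == null) _next = ReadNextAsync();
            return _next;
        }
    }

    private async Task<Block> ReadNextAsync()
    {
        var buffer = new byte[4096];
        int n = await _source.ReadAsync(buffer, 0, buffer.Length).ConfigureAwait(false);
        Array.Resize(ref buffer, n);
        return new Block(_source, buffer, isEnd: n == 0);
    }
}

// Hypothetical reader holding a pointer into the list.
public sealed class SharedReader
{
    private Block _current;
    private int _offset;

    private SharedReader(Block head) { _current = head; }

    public static SharedReader[] Split(Stream source, int count)
    {
        // Empty sentinel node; the first real block hangs off its cached Task.
        var head = new Block(source, new byte[0], isEnd: false);
        var readers = new SharedReader[count];
        for (int i = 0; i < count; i++) readers[i] = new SharedReader(head);
        return readers;
    }

    public async Task<int> ReadAsync(byte[] buffer, int offset, int count)
    {
        // Advance down the list until a block with unread data is found.
        while (_offset == _current.Data.Length)
        {
            if (_current.IsEnd) return 0;
            _current = await _current.GetNextAsync().ConfigureAwait(false);
            _offset = 0;
        }
        int n = Math.Min(count, _current.Data.Length - _offset);
        Buffer.BlockCopy(_current.Data, _offset, buffer, offset, n);
        _offset += n;
        return n;
    }
}
```

Because each node issues at most one read, and a node only exists once the previous read has completed, the source stream never sees overlapping reads in this sketch.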



Tags: c# .net io stream