Reading string as a stream without copying

2019-09-20 08:54发布

问题:

I have some data in a string. I have a function that takes a stream as input. I want to provide my data to my function without having to copy the complete string into a stream. Essentially I'm looking for a stream class that can wrap a string and read from it.

The only suggestions I've seen online suggest the StringReader which is NOT a stream, or creating a memory stream and writing to it, which means copying the data. I could write my own stream object but the tricky part is handling encoding because a stream deals in bytes. Is there a way to do this without writing new stream classes?

I'm implementing pipeline components in BizTalk. BizTalk deals with everything entirely with streams, so you always pass things to BizTalk in a stream. BizTalk will always read from that stream in small chunks, so it doesn't make sense to copy the entire string to a stream (especially if the string is large), if I can read from the stream how BizTalk wants it.

回答1:

You can prevent having to maintain a copy of the whole thing, but you would be forced to use an encoding that results in the same number of bytes for each character. That way you could provide chunks of data via Encoding.GetBytes(str, strIndex, byteCount, byte[], byteIndex) as they're being requested straight into the read buffer.

There will always be one copy action per Stream.Read() operation, because it lets the caller provide the destination buffer.



回答2:

Here is a proper StringReaderStream with following drawbacks:

  • The buffer for Read has to be at least maxBytesPerChar long. It's possible to implement Read for small buffers by keeping internal one char buff = new byte[maxBytesPerChar]. But's not necessary for most usages.
  • No Seek, it's possible to do seek, but would be very tricky generaly. (Some seek cases, like seek to beginning, seek to end, are simple to implement. )
/// <summary>
/// Convert string to byte stream.
/// <para>
/// Slower than <see cref="Encoding.GetBytes()"/>, but saves memory for a large string.
/// </para>
/// </summary>
public class StringReaderStream : Stream
{
    private string input;
    private readonly Encoding encoding;
    private int maxBytesPerChar;
    private int inputLength;
    private int inputPosition;
    private readonly long length;
    private long position;

    public StringReaderStream(string input)
        : this(input, Encoding.UTF8)
    { }

    public StringReaderStream(string input, Encoding encoding)
    {
        this.encoding = encoding ?? throw new ArgumentNullException(nameof(encoding));
        this.input = input;
        inputLength = input == null ? 0 : input.Length;
        if (!string.IsNullOrEmpty(input))
            length = encoding.GetByteCount(input);
            maxBytesPerChar = encoding == Encoding.ASCII ? 1 : encoding.GetMaxByteCount(1);
    }

    public override bool CanRead => true;

    public override bool CanSeek => false;

    public override bool CanWrite => false;

    public override long Length => length;

    public override long Position
    {
        get => position;
        set => throw new NotImplementedException();
    }

    public override void Flush()
    {
    }

    public override int Read(byte[] buffer, int offset, int count)
    {
        if (inputPosition >= inputLength)
            return 0;
        if (count < maxBytesPerChar)
            throw new ArgumentException("count has to be greater or equal to max encoding byte count per char");
        int charCount = Math.Min(inputLength - inputPosition, count / maxBytesPerChar);
        int byteCount = encoding.GetBytes(input, inputPosition, charCount, buffer, offset);
        inputPosition += charCount;
        position += byteCount;
        return byteCount;
    }

    public override long Seek(long offset, SeekOrigin origin)
    {
        throw new NotImplementedException();
    }

    public override void SetLength(long value)
    {
        throw new NotImplementedException();
    }

    public override void Write(byte[] buffer, int offset, int count)
    {
        throw new NotImplementedException();
    }
}


回答3:

Stream can only copy data. In addition, it deals in bytes, not chars so you'll have to copy data via the decoding process. But, If you want to view a string as a stream of ASCII bytes, you could create a class that implements Stream to do it. For example:

public class ReadOnlyStreamStringWrapper : Stream
{
    private readonly string theString;

    public ReadOnlyStreamStringWrapper(string theString)
    {
        this.theString = theString;
    }

    public override void Flush()
    {
        throw new NotSupportedException();
    }

    public override long Seek(long offset, SeekOrigin origin)
    {
        switch (origin)
        {
            case SeekOrigin.Begin:
                if(offset < 0 || offset >= theString.Length)
                    throw new InvalidOperationException();

                Position = offset;
                break;
            case SeekOrigin.Current:
                if ((Position + offset) < 0)
                    throw new InvalidOperationException();
                if ((Position + offset) >= theString.Length)
                    throw new InvalidOperationException();

                Position += offset;
                break;
            case SeekOrigin.End:
                if ((theString.Length + offset) < 0)
                    throw new InvalidOperationException();
                if ((theString.Length + offset) >= theString.Length)
                    throw new InvalidOperationException();
                Position = theString.Length + offset;
                break;
        }

        return Position;
    }

    public override void SetLength(long value)
    {
        throw new NotSupportedException();
    }

    public override int Read(byte[] buffer, int offset, int count)
    {
        return Encoding.ASCII.GetBytes(theString, (int)Position, count, buffer, offset);
    }

    public override void Write(byte[] buffer, int offset, int count)
    {
        throw new NotSupportedException();
    }

    public override bool CanRead
    {
        get { return true; }
    }

    public override bool CanSeek
    {
        get { return true; }
    }

    public override bool CanWrite
    {
        get { return false; }
    }

    public override long Length
    {
        get { return theString.Length; }
    }

    public override long Position { get; set; }
}

But, that's a lot of work to avoid "copying" data...