Asynchronous download of an Azure blob to string w

Posted 2019-03-19 04:25

Question:

I'm trying to implement a fully asynchronous blob download with .NET 4.5 async & await.

Let's assume the entire blob can fit in memory at once, and we want to hold it in a string.

Code:

public async Task<string> DownloadTextAsync(ICloudBlob blob)
{
    using (Stream memoryStream = new MemoryStream())
    {
        IAsyncResult asyncResult = blob.BeginDownloadToStream(memoryStream, null, null);
        await Task.Factory.FromAsync(asyncResult, (r) => { blob.EndDownloadToStream(r); });
        memoryStream.Position = 0;

        using (StreamReader streamReader = new StreamReader(memoryStream))
        {
            // is this good enough?
            return streamReader.ReadToEnd();

            // or do we need this?
            // return await streamReader.ReadToEndAsync();
        }
    }
}

Usage:

CloudStorageAccount storageAccount = CloudStorageAccount.Parse(CloudConfigurationManager.GetSetting("StorageAccountConnectionString"));
CloudBlobClient blobClient = storageAccount.CreateCloudBlobClient();
CloudBlobContainer container = blobClient.GetContainerReference("container1");
CloudBlockBlob blockBlob = container.GetBlockBlobReference("blob1.txt");

string text = await DownloadTextAsync(blockBlob);

Is this code correct, and is it indeed fully asynchronous? Would you implement this differently?

I'd appreciate some extra clarifications:

  1. GetContainerReference and GetBlockBlobReference don't need to be async since they don't contact the server yet, right?

  2. Does streamReader.ReadToEnd need to be async or not?

  3. I'm a little confused about what BeginDownloadToStream does. By the time EndDownloadToStream is called, does my memory stream have all the data inside, or has the stream only been opened, with nothing read yet?

Update: (as of Storage 2.1.0.0 RC)

Async is now supported natively.

CloudStorageAccount storageAccount = CloudStorageAccount.Parse(CloudConfigurationManager.GetSetting("StorageAccountConnectionString"));
CloudBlobClient blobClient = storageAccount.CreateCloudBlobClient();
CloudBlobContainer container = blobClient.GetContainerReference("container1");
CloudBlockBlob blockBlob = container.GetBlockBlobReference("blob1.txt");

string text = await blockBlob.DownloadTextAsync();

Answer 1:

Is this code correct, and is it indeed fully asynchronous?

Yes.

Would you implement this differently?

Yes. In particular, the TaskFactory.FromAsync wrappers are much more efficient if you pass in a Begin/End method pair instead of passing in an existing IAsyncResult. Like this:

await Task.Factory.FromAsync(blob.BeginDownloadToStream,
    blob.EndDownloadToStream, memoryStream, null);

I also prefer to wrap these up into separate extension methods so I can call it like this:

await blob.DownloadToStreamAsync(memoryStream);
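
As a rough sketch of that extension-method pattern, the example below wraps a Begin/End pair with Task.Factory.FromAsync. To stay self-contained it uses the framework's own Stream.BeginRead/EndRead as a stand-in for the blob's BeginDownloadToStream/EndDownloadToStream pair; the names ApmStreamExtensions, ReadWithFromAsync, and FromAsyncDemo are hypothetical and exist only for illustration.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

public static class ApmStreamExtensions
{
    // Hypothetical helper: exposes the Stream.BeginRead/EndRead pair as a Task.
    // A blob wrapper would have the same shape, with the blob's Begin/End
    // download methods in place of BeginRead/EndRead.
    public static Task<int> ReadWithFromAsync(
        this Stream stream, byte[] buffer, int offset, int count)
    {
        // Pass the Begin/End method pair and its arguments, not an
        // already-started IAsyncResult.
        return Task<int>.Factory.FromAsync(
            stream.BeginRead, stream.EndRead, buffer, offset, count, null);
    }
}

public static class FromAsyncDemo
{
    // Reads from a five-byte in-memory stream and returns the byte count read.
    public static async Task<int> ReadFiveBytesAsync()
    {
        using (var stream = new MemoryStream(new byte[] { 1, 2, 3, 4, 5 }))
        {
            var buffer = new byte[8];
            return await stream.ReadWithFromAsync(buffer, 0, buffer.Length);
        }
    }
}
```

The call site then reads like any other awaitable method, which is the main appeal of hiding FromAsync behind an extension.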

Note that the next version of the client libraries (2.1, currently in RC) will have async-ready methods, i.e., DownloadToStreamAsync.

GetContainerReference and GetBlockBlobReference don't need to be async since they don't contact the server yet, right?

Correct.

Does streamReader.ReadToEnd need to be async or not?

It does not (and should not). Stream is a bit of an unusual case with async programming. Usually, if there's an async method then you should use it in your async code, but that guideline doesn't hold for Stream types. The reason is that the base Stream class doesn't know whether its implementation is synchronous or asynchronous, so it assumes that it's synchronous and by default will fake its asynchronous operations by just doing the synchronous work on a background thread. Truly asynchronous streams (e.g., NetworkStream) override this and provide true asynchronous operations. Synchronous streams (e.g., MemoryStream) keep this default behavior.

So you don't want to call ReadToEndAsync on a StreamReader that wraps a MemoryStream.
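
A minimal sketch of the synchronous read, assuming the download has already filled the MemoryStream and the content is UTF-8 text. The class and method names (InMemoryText, ReadAll) are hypothetical.

```csharp
using System;
using System.IO;
using System.Text;

public static class InMemoryText
{
    // Hypothetical helper: decodes a fully populated MemoryStream to a string.
    public static string ReadAll(MemoryStream stream)
    {
        // Rewind, because the writer left the position at the end of the data.
        stream.Position = 0;

        using (var reader = new StreamReader(stream, Encoding.UTF8))
        {
            // The data is already in memory, so a plain synchronous
            // ReadToEnd is used here rather than ReadToEndAsync.
            return reader.ReadToEnd();
        }
    }
}
```

Note that disposing the StreamReader also disposes the underlying MemoryStream, which is harmless here because the stream is not used afterwards.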

I'm a little confused about what BeginDownloadToStream does. By the time EndDownloadToStream is called, does my memory stream have all the data inside?

Yes. The operation is DownloadToStream; that is, it downloads a blob into a stream. Since you are downloading a blob into a MemoryStream, the blob is entirely in memory by the time this operation completes.



Answer 2:

  1. Correct, they don't need to be async if they're not going to be long operations, which they shouldn't be.

  2. Probably not, although I'm not familiar with this particular implementation. Since you wait for the download to finish before this point, I would hope there is no network work, and thus nothing expensive, left to do. You should just be pulling data from a buffer, which should be fast. This is easy enough to test, however: you can use something like Fiddler to see whether there is network communication during that call, time the method to see whether it takes long enough to suggest network I/O, or look through the docs for this specific stream implementation. Or you could just use the async method to be safe, which is what I would suggest, rather than risk being mistaken. I would be rather surprised to find that this needed to be async, though.

  3. See #2.
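
The timing check suggested in point 2 could be sketched roughly as follows, assuming a Stopwatch around the ReadToEnd call is a good enough measurement. The names ReadTiming and TimeReadToEnd are hypothetical, and the helper works on any Stream, so the same code could be pointed at the stream in question.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Text;

public static class ReadTiming
{
    // Hypothetical helper: returns the decoded text together with the time
    // ReadToEnd took, so a slow (possibly network-bound) read stands out.
    public static Tuple<string, TimeSpan> TimeReadToEnd(Stream stream)
    {
        using (var reader = new StreamReader(stream, Encoding.UTF8))
        {
            var stopwatch = Stopwatch.StartNew();
            string text = reader.ReadToEnd();
            stopwatch.Stop();
            return Tuple.Create(text, stopwatch.Elapsed);
        }
    }
}
```

A single measurement is noisy, so repeating the call a few times with a realistically sized payload would give a more trustworthy picture.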



Answer 3:

See http://channel9.msdn.com/Events/TechEd/NorthAmerica/2013/WAD-B406#fbid=lCN9J5QiTDF for some helpful best practices, including why you should avoid using MemoryStream as the original code does :)

One note is that you have two main options for downloading blobs: the Cloud[Block|Page]Blob.Download[Range]To* methods and the stream provided by OpenRead(). With the Download APIs, the entire blob (or range, if requested) is requested with a single GET call and the results are streamed or written to the appropriate location. In the case of a transient fault, the subrange of bytes not yet received is requested according to the retry policy.

The OpenRead methods are meant for clients who wish to process data over a longer period of time without keeping a connection open. They work by prebuffering a specified length of data on the client side; when the stream runs out of prebuffered data, the next subrange is requested.

Lastly, as of 2.1 RTM, a DownloadTextAsync method is provided that does all of this for you :) (with optional overloads to specify the encoding; the default is UTF-8).