What is the fastest way to bulk upload files to Azure Blob Storage? I've tried two methods, sync and async uploads. Async is clearly the faster of the two, but I'm wondering whether there is a better method. Is there built-in support for batch uploads? I can't find anything in the documentation, but I might have missed it.
This is the test I ran:
    // Assumes the classic WindowsAzure.Storage NuGet package on .NET Framework.
    using System;
    using System.Configuration;
    using System.Diagnostics;
    using System.Threading.Tasks;
    using Microsoft.WindowsAzure.Storage;
    using Microsoft.WindowsAzure.Storage.Blob;

    class Program
    {
        static void Main(string[] args)
        {
            int totalFiles = 10; // 10, 50, 100
            byte[] randomData = new byte[2097152]; // 2 MB

            // Fill the buffer with a constant value; the content does not matter for the timing.
            for (int i = 0; i < randomData.Length; i++)
            {
                randomData[i] = 255;
            }

            CloudStorageAccount cloudStorageAccount = CloudStorageAccount.Parse(ConfigurationManager.AppSettings["StorageConnectionString"]);
            var blobClient = cloudStorageAccount.CreateCloudBlobClient();
            var container = blobClient.GetContainerReference("something");
            container.CreateIfNotExists();

            TimeSpan tsSync = Test1(totalFiles, randomData, container);
            TimeSpan tsAsync = Test2(totalFiles, randomData, container);

            Console.WriteLine($"Sync: {tsSync}");
            Console.WriteLine($"Async: {tsAsync}");
            Console.ReadLine();
        }

        // Async test: start every upload at once, then wait for all of them to finish.
        public static TimeSpan Test2(int total, byte[] data, CloudBlobContainer container)
        {
            Stopwatch sw = new Stopwatch();
            sw.Start();

            Task[] tasks = new Task[total];
            for (int i = 0; i < total; i++)
            {
                CloudBlockBlob blob = container.GetBlockBlobReference(Guid.NewGuid().ToString());
                tasks[i] = blob.UploadFromByteArrayAsync(data, 0, data.Length);
            }
            Task.WaitAll(tasks);

            sw.Stop();
            return sw.Elapsed;
        }

        // Sync test: upload the blobs one after another.
        public static TimeSpan Test1(int total, byte[] data, CloudBlobContainer container)
        {
            Stopwatch sw = new Stopwatch();
            sw.Start();

            for (int i = 0; i < total; i++)
            {
                CloudBlockBlob blob = container.GetBlockBlobReference(Guid.NewGuid().ToString());
                blob.UploadFromByteArray(data, 0, data.Length);
            }

            sw.Stop();
            return sw.Elapsed;
        }
    }
The output from this is:
10 Files
Sync: 00:00:08.7251781
Async: 00:00:04.7553491
DMLib: 00:00:05.1961654
Sync: 00:00:08.1169861
Async: 00:00:05.2384105
DMLib: 00:00:05.4955403
Sync: 00:00:07.6122464
Async: 00:00:05.0495365
DMLib: 00:00:06.4714047
50 Files
Sync: 00:00:39.1595797
Async: 00:00:22.5757347
DMLib: 00:00:25.2897623
Sync: 00:00:40.4932800
Async: 00:00:22.3296490
DMLib: 00:00:26.0631829
Sync: 00:00:39.2879245
Async: 00:00:24.0746697
DMLib: 00:00:26.9243116
I hope this is a valid question for SO.
Thanks
EDIT:
I have updated the results with "DMLib" tests in response to the answers given so far. The DMLib figures above are from a test with no config changes, and they show no performance gain.
I ran some more tests with ServicePointManager.DefaultConnectionLimit = Environment.ProcessorCount * 8; as recommended by the documentation. This increased the upload speed by quite a bit, but it also increased the upload speed of my async method. So far DMLib has not given me any worthwhile performance increase. I've added the second set of test results, with this config change, below.
I also set ServicePointManager.Expect100Continue = false; however, this made no difference to speed.
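For reference, here is a minimal sketch of how the two settings above can be applied. It assumes a .NET Framework console app; the class name ConnectionSetup is hypothetical and only there to make the snippet self-contained. As far as I understand, the values have to be set before the first request goes out, because ServicePoint objects that already exist keep the limit they were created with.

```csharp
using System;
using System.Net;

static class ConnectionSetup // hypothetical name, for illustration only
{
    static void Main()
    {
        // Raise the per-endpoint connection cap so that parallel uploads are not
        // queued behind a small number of connections. Set this before creating
        // any storage client or sending any request.
        ServicePointManager.DefaultConnectionLimit = Environment.ProcessorCount * 8;

        // Skip the "Expect: 100-continue" handshake that otherwise precedes each request body.
        ServicePointManager.Expect100Continue = false;

        // Print the effective values so the configuration can be confirmed at startup.
        Console.WriteLine($"DefaultConnectionLimit: {ServicePointManager.DefaultConnectionLimit}");
        Console.WriteLine($"Expect100Continue: {ServicePointManager.Expect100Continue}");
    }
}
```

In the test program these two assignments would sit at the top of Main, ahead of CloudStorageAccount.Parse.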
Test results with ServicePointManager.DefaultConnectionLimit = Environment.ProcessorCount * 8;
10 Files
Sync: 00:00:07.6199307
Async: 00:00:02.9615565
DMLib: 00:00:02.6629716
Sync: 00:00:08.7721797
Async: 00:00:02.8246599
DMLib: 00:00:02.7281091
Sync: 00:00:07.8437682
Async: 00:00:03.0171246
DMLib: 00:00:03.0190045
50 Files
Sync: 00:00:40.2395863
Async: 00:00:10.3157544
DMLib: 00:00:10.5107740
Sync: 00:00:40.2473358
Async: 00:00:10.8190161
DMLib: 00:00:10.2585441
Sync: 00:00:41.2646137
Async: 00:00:13.7188085
DMLib: 00:00:10.8686173
Am I using the library incorrectly? It does not seem to perform any better than my own async method.