I have a 2GB file in blob storage and am building a console application that will download this file into a desktop. Requirement is to split into 100MB chunks and append a number into the filename. I do not need to re-combine those files again. What I need is only the chunks of files.
I currently have this code from Azure download blob part
But I cannot figure out how to stop downloading when the file size is already 100MB and create a new one.
Any help will be appreciated.
Update: Here is my code
CloudStorageAccount account = CloudStorageAccount.Parse(connectionString);
var blobClient = account.CreateCloudBlobClient();
var container = blobClient.GetContainerReference(containerName);
var file = uri;
var blob = container.GetBlockBlobReference(file);
//First fetch the size of the blob. We use this to create an empty file with size = blob's size
blob.FetchAttributes();
var blobSize = blob.Properties.Length;
long blockSize = (1 * 1024 * 1024);//1 MB chunk;
blockSize = Math.Min(blobSize, blockSize);
//Create an empty file of blob size
using (FileStream fs = new FileStream(file, FileMode.Create))//Create empty file.
{
fs.SetLength(blobSize);//Set its size
}
var blobRequestOptions = new BlobRequestOptions
{
RetryPolicy = new ExponentialRetry(TimeSpan.FromSeconds(5), 3),
MaximumExecutionTime = TimeSpan.FromMinutes(60),
ServerTimeout = TimeSpan.FromMinutes(60)
};
long startPosition = 0;
long currentPointer = 0;
long bytesRemaining = blobSize;
do
{
var bytesToFetch = Math.Min(blockSize, bytesRemaining);
using (MemoryStream ms = new MemoryStream())
{
//Download range (by default 1 MB)
blob.DownloadRangeToStream(ms, currentPointer, bytesToFetch, null, blobRequestOptions);
ms.Position = 0;
var contents = ms.ToArray();
using (var fs = new FileStream(file, FileMode.Open))//Open that file
{
fs.Position = currentPointer;//Move the cursor to the end of file.
fs.Write(contents, 0, contents.Length);//Write the contents to the end of file.
}
startPosition += blockSize;
currentPointer += contents.Length;//Update pointer
bytesRemaining -= contents.Length;//Update bytes to fetch
Console.WriteLine(fileName + dateTimeStamp + ".csv " + (startPosition / 1024 / 1024) + "/" + (blob.Properties.Length / 1024 / 1024) + " MB downloaded...");
}
}
while (bytesRemaining > 0);
Per my understanding, you could break your blob file into your expected pieces (100MB), then leverage CloudBlockBlob.DownloadRangeToStream to download each of your chunks of files. Here is my code snippet, you could refer to it:
ParallelDownloadBlob
Program Main
RESULT
UPDATE:
Based on your requirement, I recommend that you could download your blob into a single file, then leverage LumenWorks.Framework.IO to read your large file records line by line, then check the byte size you have read and save into a new csv file with the size up to 100MB. Here is a code snippet, you could refer to it:
Additionally, you could refer to A Fast CSV Reader and CsvHelper for more details.
UPDATE2
Code sample for breaking large CSV file into small CSV file with the fixed bytes, I used CsvHelper 2.16.3 for the following code snippet, you could refer to it:
RESULT