I know that the Azure Storage entities (blobs, tables and queues) have built-in resiliency, meaning that they are replicated to 3 different servers within the same datacenter. On top of that, they may also be replicated to a different datacenter altogether that is physically located in a different geographical region. The chance of losing your data in this case is, for all practical purposes, close to zero.
However, what happens if a sloppy developer (or one under the influence of alcohol :)) accidentally deletes the storage account through the Azure Portal or the Azure Storage Explorer tool? Worse yet, what if a hacker gets hold of your account and clears the storage? Is there a way to retrieve the gigabytes of deleted blobs, or is that it? Somehow I think there has to be an elegant solution that the Azure infrastructure provides here, but I cannot find any documentation.
The only solution I can think of is to write my own process (a worker role) that periodically backs up my entire storage to a different subscription/account, essentially doubling the cost of both storage and transactions.
Any thoughts?
Regards,
Archil
Depending on where you want to backup your data, there are two options available:
Backing up data locally - If you wish to back up your data to your own infrastructure, you could:
a. Write your own application using either the Storage Client Library or by consuming the REST API, or
b. Use 3rd party tools like Cerebrata Azure Management Cmdlets (Disclosure: I work for Cerebrata).
Backing up data in the cloud - Recently, the Windows Azure Storage team announced the Asynchronous Copy Blob functionality, which essentially allows you to copy data from one storage account to another without downloading the data locally. The catch is that your target storage account must have been created after 7th June 2012. You can read more about this functionality on the Windows Azure Blog: http://blogs.msdn.com/b/windowsazurestorage/archive/2012/06/12/introducing-asynchronous-cross-account-copy-blob.aspx.
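To make the mechanics concrete, here is a minimal sketch of the request that the asynchronous Copy Blob operation uses under the hood. The copy runs server-side, so no data flows through your machine. The account, container and blob names below are hypothetical placeholders, and a real request would also need an Authorization header (Shared Key) or a SAS token on the source URL; this only shows how the destination URL and the `x-ms-copy-source` header fit together.

```python
def build_copy_blob_request(source_url, dest_account, dest_container, dest_blob):
    """Build the URL and headers for an asynchronous cross-account Copy Blob."""
    dest_url = (
        f"https://{dest_account}.blob.core.windows.net/"
        f"{dest_container}/{dest_blob}"
    )
    headers = {
        # The service reads the source from this header and copies it
        # asynchronously; you then poll the copy status on the destination.
        "x-ms-copy-source": source_url,
        # 2012-02-12 is the service version that introduced cross-account copy.
        "x-ms-version": "2012-02-12",
    }
    return dest_url, headers

# Hypothetical example: back up a blob from a source account to a target account.
url, headers = build_copy_blob_request(
    "https://sourceacct.blob.core.windows.net/backups/data.bin",
    "destacct", "backups", "data.bin",
)
```

You would issue this as an HTTP PUT against the destination URL; the response returns immediately while the copy proceeds in the background.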
Hope this helps.
The accepted answer is fine, but it took me a few hours to decipher everything.
I've put together a solution which I now use in production. I expose a Backup() method through Web API, which is then called by an Azure WebJob every day (at midnight).
Note that I've taken the original source code and modified it:
- it wasn't up to date, so I changed a few method names
- added a retry safeguard around the copy operation (it gives up after 4 tries for the same blob)
- added a little bit of logging - you should swap it out with your own
- it does the backup between two storage accounts (replicating containers & blobs)
- added purging - it gets rid of old containers that are not needed (keeps 16 days' worth of data); you can always disable this, as space is cheap
The source can be found at: https://github.com/ChrisEelmaa/StackOverflow/blob/master/AzureStorageAccountBackup.cs
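The retry safeguard mentioned above (give up after 4 attempts per blob) boils down to a small wrapper. Here is a minimal Python sketch of that idea, with a fake copy operation standing in for the actual blob copy in the linked C# source:

```python
def copy_with_retry(copy_fn, max_attempts=4):
    """Call copy_fn(); retry on failure, giving up after max_attempts tries."""
    last_error = None
    for _ in range(max_attempts):
        try:
            return copy_fn()
        except Exception as e:  # in real code, catch the storage SDK's exception
            last_error = e
    # All attempts failed for this blob: surface the last error to the caller.
    raise RuntimeError(f"copy failed after {max_attempts} attempts") from last_error

# Demo with a hypothetical flaky copy that fails twice, then succeeds.
attempts = {"n": 0}

def flaky_copy():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise IOError("transient failure")
    return "copied"

result = copy_with_retry(flaky_copy)
```

In the real backup job you would log each failed attempt and continue with the next blob once the limit is hit, rather than aborting the whole run.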
and this is how I use it in the controller (note: your controller should only be callable by the Azure WebJob - you can check credentials in the headers):
[Route("backup")]
[HttpPost]
public async Task<IHttpActionResult> Backup()
{
    try
    {
        await _blobService.Backup();
        return Ok();
    }
    catch (Exception e)
    {
        _loggerService.Error("Failed to backup blobs " + e);
        return InternalServerError(new Exception("Failed to back up blobs!"));
    }
}
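The purging step (dropping containers older than 16 days) can be sketched as a pure function. This assumes a hypothetical naming convention of `backup-YYYY-MM-DD` for the daily backup containers; the actual convention in the linked source may differ:

```python
from datetime import date, timedelta

def containers_to_purge(container_names, today, keep_days=16):
    """Return the backup containers older than the retention window.

    Assumes containers are named 'backup-YYYY-MM-DD' (hypothetical convention);
    anything that doesn't match is left alone.
    """
    cutoff = today - timedelta(days=keep_days)
    stale = []
    for name in container_names:
        if not name.startswith("backup-"):
            continue  # not one of ours, never delete it
        try:
            d = date.fromisoformat(name[len("backup-"):])
        except ValueError:
            continue  # name doesn't parse as a date, skip it
        if d < cutoff:
            stale.append(name)
    return stale

# Demo: with a 16-day window ending 2024-01-20, only the oldest container is stale.
stale = containers_to_purge(
    ["backup-2024-01-01", "backup-2024-01-19", "logs"],
    today=date(2024, 1, 20),
)
```

Keeping the decision logic separate from the deletion call makes it easy to test, and to disable purging entirely by simply not acting on the returned list.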
Without resorting to 3rd-party solutions, you can now achieve this using built-in Azure features. The steps below might help secure your blobs.
Soft delete for Azure Storage Blobs
The first step is to enable soft delete, which is now generally available:
https://azure.microsoft.com/en-us/blog/soft-delete-for-azure-storage-blobs-ga
Read-access geo-redundant storage
The second step is to enable geo-replication with RA-GRS, so if the primary data center goes down you can always read from a secondary replica in another region. You can find more information here:
https://docs.microsoft.com/en-us/azure/storage/common/storage-redundancy-grs
You can make a snapshot of a blob container and then download the snapshot for a point-in-time backup.
https://docs.microsoft.com/en-us/azure/storage/storage-blob-snapshots
A snapshot is a read-only version of a blob that's taken at a point in time. Snapshots are useful for backing up blobs. After you create a snapshot, you can read, copy, or delete it, but you cannot modify it.
A snapshot of a blob is identical to its base blob, except that the blob URI has a DateTime value appended to indicate the time at which the snapshot was taken. For example, if a page blob URI is http://storagesample.core.blob.windows.net/mydrives/myvhd, the snapshot URI is similar to http://storagesample.core.blob.windows.net/mydrives/myvhd?snapshot=2011-03-09T01:42:34.9360000Z.
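The URI scheme quoted above is simple enough to work with directly. Here is a small sketch of helpers for building a snapshot URI and pulling the timestamp back out, using the example URI from the documentation:

```python
from urllib.parse import urlsplit, parse_qs

def snapshot_uri(base_uri, snapshot_time):
    """Append the snapshot timestamp to the base blob URI."""
    return f"{base_uri}?snapshot={snapshot_time}"

def parse_snapshot_time(uri):
    """Extract the snapshot timestamp from a snapshot URI (None if absent)."""
    query = parse_qs(urlsplit(uri).query)
    return query.get("snapshot", [None])[0]

# The example from the documentation above:
uri = snapshot_uri(
    "http://storagesample.core.blob.windows.net/mydrives/myvhd",
    "2011-03-09T01:42:34.9360000Z",
)
```

In practice you would let the storage SDK create the snapshot and hand you this URI, but parsing it yourself is handy when cataloguing snapshots for point-in-time restore.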