Getting blob count in an Azure Storage container

Published 2019-06-15 01:19

Question:

What is the most efficient way to get a count of the blobs in an Azure Storage container?

Right now I can't think of any way other than the code below:

CloudBlobContainer container = GetContainer("mycontainer");
var count = container.ListBlobs().Count();

Answer 1:

The API doesn't contain a container count method or property, so you'd need to do something like what you posted. However, you'll need to deal with NextMarker if more than 5,000 items are returned (or if you specify a maximum number of results and the list exceeds it). In that case you make additional calls based on NextMarker and add up the counts.

EDIT: Per smarx: the SDK should take care of NextMarker for you. You'll need to deal with NextMarker if you're working at the API level, calling List Blobs through REST.
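For anyone working at the REST level, the marker loop described above might look roughly like the sketch below. It is only an illustration: `fetch_page` is a hypothetical callable standing in for one List Blobs request, and `make_fake_service` is an in-memory stand-in I made up so the loop can be run without a storage account.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical page shape: (blob names in this page, marker for the next page or None).
Page = Tuple[List[str], Optional[str]]


def count_with_marker(fetch_page: Callable[[Optional[str]], Page]) -> int:
    """Count items by following the continuation marker until it is empty."""
    total = 0
    marker: Optional[str] = None
    while True:
        names, marker = fetch_page(marker)
        total += len(names)
        if not marker:  # an empty or missing marker means this was the last page
            return total


def make_fake_service(names: List[str], page_size: int) -> Callable[[Optional[str]], Page]:
    """In-memory stand-in for a List Blobs call (assumption: the marker is just an index)."""
    def fetch_page(marker: Optional[str]) -> Page:
        start = int(marker) if marker else 0
        end = start + page_size
        next_marker = str(end) if end < len(names) else None
        return names[start:end], next_marker
    return fetch_page
```

With a real service, `fetch_page` would issue the request with the given marker and return the page's items plus the NextMarker value from the response.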

Alternatively, if you control the blob insertions and deletions (through a WCF service, for example), you can use the blob container's metadata area to store a cached container count that you update with each insert or delete. You'll just need to deal with write concurrency on the container.



Answer 2:

If you just want to know how many blobs are in a container without writing code, you can use the Microsoft Azure Storage Explorer application.

  1. Open the desired BlobContainer
  2. Click the Folder Statistics icon
  3. Observe the count of blobs in the Activities window


Answer 3:

I tried counting blobs using ListBlobs() and for a container with about 400,000 items, it took me well over 5 minutes.

If you have complete control over the container (that is, you control when writes occur), you could cache the size information in the container metadata and update it every time an item gets removed or inserted. Here is a piece of code that would return the container blob count:

static int CountBlobs(string storageAccount, string containerId)
{
    CloudStorageAccount cloudStorageAccount = CloudStorageAccount.Parse(storageAccount);
    CloudBlobClient blobClient = cloudStorageAccount.CreateCloudBlobClient();
    CloudBlobContainer cloudBlobContainer = blobClient.GetContainerReference(containerId);

    cloudBlobContainer.FetchAttributes();

    string count = cloudBlobContainer.Metadata["ItemCount"];
    string countUpdateTime = cloudBlobContainer.Metadata["CountUpdateTime"];

    bool recountNeeded = false;

    if (String.IsNullOrEmpty(count) || String.IsNullOrEmpty(countUpdateTime))
    {
        recountNeeded = true;
    }
    else
    {
        DateTime dateTime = new DateTime(long.Parse(countUpdateTime));

        // Are we close to the last modified time?
        if (Math.Abs(dateTime.Subtract(cloudBlobContainer.Properties.LastModifiedUtc).TotalSeconds) > 5) {
            recountNeeded = true;
        }
    }

    int blobCount;
    if (recountNeeded)
    {
        blobCount = 0;
        BlobRequestOptions options = new BlobRequestOptions();
        options.BlobListingDetails = BlobListingDetails.Metadata;

        foreach (IListBlobItem item in cloudBlobContainer.ListBlobs(options))
        {
            blobCount++;
        }

        cloudBlobContainer.Metadata.Set("ItemCount", blobCount.ToString());
        // Store UTC ticks so the value is comparable with Properties.LastModifiedUtc.
        cloudBlobContainer.Metadata.Set("CountUpdateTime", DateTime.UtcNow.Ticks.ToString());
        cloudBlobContainer.SetMetadata();
    }
    else
    {
        blobCount = int.Parse(count);
    }

    return blobCount;
}

This, of course, assumes that you update ItemCount/CountUpdateTime every time the container is modified. CountUpdateTime is a heuristic safeguard (if the container did get modified without someone updating CountUpdateTime, this will force a re-count) but it's not reliable.
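To make the heuristic easier to see on its own, here is a rough Python sketch of the same decision logic with the storage calls taken out. Everything in it is an assumption for illustration: `metadata` is a plain dict standing in for the container metadata, `last_modified` is a timezone-aware UTC datetime standing in for the container's last-modified time, and `recount` is a hypothetical callable that does the slow full listing. It also stores the timestamp as ISO 8601 text rather than ticks.

```python
from datetime import datetime, timedelta, timezone
from typing import Callable, Dict


def cached_count(
    metadata: Dict[str, str],
    last_modified: datetime,
    recount: Callable[[], int],
    tolerance_seconds: float = 5.0,
) -> int:
    """Return the cached count if its timestamp is close to last_modified, else recount."""
    count = metadata.get("ItemCount")
    stamp = metadata.get("CountUpdateTime")

    fresh = False
    if count and stamp:
        updated = datetime.fromisoformat(stamp)
        # The cache is trusted only if it was written at about the time of the last change.
        fresh = abs((updated - last_modified).total_seconds()) <= tolerance_seconds

    if fresh:
        return int(count)

    # Missing or stale cache: do the slow count and store the result with a UTC timestamp.
    total = recount()
    metadata["ItemCount"] = str(total)
    metadata["CountUpdateTime"] = datetime.now(timezone.utc).isoformat()
    return total
```

As in the C# version, this only detects staleness when the last-modified time moves away from the stored timestamp, so it remains a heuristic and not a guarantee.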



Answer 4:

Example using PHP API and getNextMarker.

Counts the total number of blobs in an Azure container. It takes a long time: about 30 seconds for 100,000 blobs.

(assumes we have a valid $connectionString and a $container_name)

$blobRestProxy = ServicesBuilder::getInstance()->createBlobService($connectionString);
$opts = new ListBlobsOptions();
$nblobs = 0;
$cont = true; // must be initialized, otherwise the loop below never runs

while($cont) {

  $blob_list = $blobRestProxy->listBlobs($container_name, $opts);      

  $nblobs += count($blob_list->getBlobs());

  $nextMarker = $blob_list->getNextMarker();

  if (!$nextMarker || strlen($nextMarker) == 0) $cont = false;
  else $opts->setMarker($nextMarker);
}
echo $nblobs;


Answer 5:

If you are not using virtual directories, the following will work as previously answered.

CloudBlobContainer container = GetContainer("mycontainer");
var count = container.ListBlobs().Count();

However, the above snippet may not return the count you expect if you are using virtual directories.

For instance, suppose your blobs are stored as /container/directory/filename.txt, where the blob name is directory/filename.txt. Then container.ListBlobs().Count() counts only top-level entries, so each virtual directory such as "directory/" counts once no matter how many blobs it holds. To count the blobs inside virtual directories, set useFlatBlobListing = true in the ListBlobs() call.

CloudBlobContainer container = GetContainer("mycontainer");
var count = container.ListBlobs(null, true).Count();

Note: the ListBlobs() call with useFlatBlobListing = true is a much more expensive/slow call...
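The difference between the two listings can be shown without touching storage at all. The sketch below is a pure-Python illustration over a list of blob names; `count_top_level` and `count_flat` are names I made up, and the delimiter handling is my approximation of how a hierarchical listing groups names, not the SDK's code.

```python
from typing import Iterable, Set


def count_top_level(names: Iterable[str], delimiter: str = "/") -> int:
    """Count what a hierarchical listing yields: root blobs plus one entry per virtual directory."""
    entries: Set[str] = set()
    for name in names:
        head, sep, _ = name.partition(delimiter)
        # Nested blobs collapse to "directory/"; root-level blobs keep their own name.
        entries.add(head + sep)
    return len(entries)


def count_flat(names: Iterable[str]) -> int:
    """Count what a flat listing yields: every blob, regardless of virtual directories."""
    return sum(1 for _ in names)


names = ["directory/a.txt", "directory/b.txt", "other/c.txt", "root.txt"]
print(count_top_level(names))  # 3: "directory/", "other/", "root.txt"
print(count_flat(names))       # 4: every blob counted individually
```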



Answer 6:

With the Python API for Azure Storage, it looks like this:

from azure.storage import *
blob_service = BlobService(account_name='myaccount', account_key='mykey')
blobs = blob_service.list_blobs('mycontainer')
len(blobs)  # returns the number of blobs in the container
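The `BlobService` class above comes from an older release of the Python package. As far as I know, the newer `azure-storage-blob` package (v12) exposes `ContainerClient.list_blobs()` instead, which returns an iterator that follows continuation tokens for you. A minimal sketch under that assumption is below; `count_blobs` is a hypothetical helper name, and the commented usage lines assume you have a connection string in `conn_str`.

```python
def count_blobs(container_client) -> int:
    """Count blobs by draining the iterator returned by list_blobs()."""
    # Iterating item by item avoids building a list of every blob in memory.
    return sum(1 for _ in container_client.list_blobs())


# Usage with the real SDK (assumption: `pip install azure-storage-blob`, v12 or later):
#
#   from azure.storage.blob import ContainerClient
#   client = ContainerClient.from_connection_string(conn_str, container_name="mycontainer")
#   print(count_blobs(client))
```

This still lists every blob, so it will be as slow as the other listing approaches on large containers.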