Azure Put Blob API returns with a non-matching siz

Posted 2019-08-02 09:20

I am trying to upload a PDF file as a blob from my laptop to a container in an Azure storage account. It works, but with one glitch.

I am calculating the file size using:

f_info = os.stat(file_path)          
file_size = (f_info.st_size)          # returns - 19337

Then I insert this value into the canonicalized header below:

ch = "PUT\n\n\n"+str(file_size)+"\n\napplication/pdf\n\n\n\n\n\n\nx-ms-blob-type:BlockBlob" + "\nx-ms-date:" + date + "\nx-ms-version:" + version + "\n"

and send the PUT request to the Put Blob API. However, it returns an error saying, "Authentication failed because the server used the string below to calculate the signature":

\'PUT\n\n\n19497\n\napplication/pdf\n\n\n\n\n\n\nx-ms-blob-type:BlockBlob\nx-ms-date:[date]\nx-ms-version:[API version]

Looking at this string, it is obvious that authentication failed because the file size Azure calculated is different from mine. I don't understand how it arrives at this value.

FYI: If I replace 19337 with 19497 in the canonicalized string and rerun, it works! Any suggestions on where I am making a mistake?

Below is the code:

import base64
import datetime
import hashlib
import hmac
import os
import requests

# Name matches the storageAccountName used in URI and the Authorization header below
storageAccountName = '<storage account name>'
storage_ContainerName = "<container_name>"
storageKey = '<key>'

fd = "C:\\<path>\\<to>\\<file_to_upload>.pdf"

URI = 'https://' + storageAccountName + '.blob.core.windows.net/<storage_ContainerName>/<blob_file_name.pdf>'
version = '2017-07-29'
date = datetime.datetime.utcnow().strftime("%a, %d %b %Y %H:%M:%S GMT")

if os.path.isfile(fd):
    file_info = os.stat(fd)
    file_size = (file_info.st_size)

ch = "PUT\n\n\n"+str(file_size)+"\n\napplication/pdf\n\n\n\n\n\n\nx-ms-blob-type:BlockBlob" + "\nx-ms-date:" + date + "\nx-ms-version:" + version + "\n"
cr = "/<storage_AccountName>/<storage_Containername>/<blob_file_name.pdf>"
canonicalizedString = ch + cr

storage_account_key = base64.b64decode(storageKey)
byte_canonicalizedString=canonicalizedString.encode('utf-8')
signature = base64.b64encode(hmac.new(key=storage_account_key, msg=byte_canonicalizedString,  digestmod=hashlib.sha256).digest())

header = {
          'x-ms-blob-type': "BlockBlob",   
          'x-ms-date': date,
          'x-ms-version': version,
          'Authorization': 'SharedKey ' + storageAccountName + ':' + signature.decode('utf-8'),
          #'Content-Length': str(19497),          # works
          'Content-Length': str(file_size),       # doesn't work
          'Content-Type': "application/pdf"} 


files = {'file': open(fd, 'rb')}
result = requests.put(url = URI, headers = header, files = files) 
print (result.content)

1 Answer
Summer. ? 凉城
#2 · 2019-08-02 10:17

As mentioned in the comments, the Content-Length mismatch arises because the files argument does not upload the raw file. requests wraps the file in a multipart/form-data body, and the added boundary lines and part headers make the request body larger than the file itself (19497 bytes instead of 19337 in your case).
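To see where the extra bytes come from, here is a rough sketch that builds a single-part multipart/form-data body by hand using only the standard library. The helper name multipart_body, the boundary, the file name and the payload are all made up for illustration; this is not the exact byte sequence requests produces, only the same kind of framing.

```python
import uuid


def multipart_body(field: str, filename: str, payload: bytes) -> bytes:
    """Hand-built multipart/form-data body with one file part (illustrative only)."""
    boundary = uuid.uuid4().hex  # hypothetical boundary, 32 hex characters
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    # The file bytes are sandwiched between the part header and the closing boundary
    return head + payload + tail


# Dummy bytes standing in for the real PDF
payload = b"%PDF-1.4 dummy bytes standing in for the real file"
body = multipart_body("file", "example.pdf", payload)
overhead = len(body) - len(payload)
print(len(payload), len(body), overhead)
```

The framing adds a fixed number of bytes regardless of the file size, which is consistent with the constant gap between the size you signed and the size the server saw.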

Please change the following lines of code:

files = {'file': open(fd, 'rb')}
result = requests.put(url = URI, headers = header, files = files)

to something like:

with open(fd, 'rb') as data:
    result = requests.put(url = URI, headers = header, data = data)

And now you're only uploading the file contents.
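One way to keep the signed length and the transmitted body from drifting apart is to read the file into memory once and derive Content-Length from those bytes. The sketch below follows the string-to-sign layout from the question; the function name put_blob_headers, the account name, the key and the resource path are placeholders, not real values.

```python
import base64
import hashlib
import hmac


def put_blob_headers(account, key_b64, resource, body, date, version,
                     content_type="application/pdf"):
    """Build Shared Key headers whose Content-Length comes from the body itself."""
    length = str(len(body))
    # Same field layout as the canonicalized string in the question
    string_to_sign = (
        "PUT\n\n\n" + length + "\n\n" + content_type + "\n\n\n\n\n\n\n"
        "x-ms-blob-type:BlockBlob\n"
        "x-ms-date:" + date + "\n"
        "x-ms-version:" + version + "\n"
        + resource
    )
    digest = hmac.new(base64.b64decode(key_b64),
                      string_to_sign.encode("utf-8"),
                      hashlib.sha256).digest()
    signature = base64.b64encode(digest).decode("utf-8")
    return {
        "x-ms-blob-type": "BlockBlob",
        "x-ms-date": date,
        "x-ms-version": version,
        "Content-Type": content_type,
        "Content-Length": length,
        "Authorization": "SharedKey " + account + ":" + signature,
    }


# Placeholder values for illustration only; none of these are real credentials
body = b"x" * 19337
headers = put_blob_headers(
    account="myaccount",
    key_b64=base64.b64encode(b"not-a-real-key").decode("ascii"),
    resource="/myaccount/mycontainer/example.pdf",
    body=body,
    date="Fri, 02 Aug 2019 09:20:00 GMT",
    version="2017-07-29",
)
print(headers["Content-Length"])
```

With requests, these headers would then be sent together with the same bytes, for example requests.put(URI, headers=headers, data=body), so the length that was signed is the length that goes over the wire.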
