How to calculate sha256 for large files in PHP

Posted 2019-09-14 02:42

Question:

I would like help calculating the SHA-256 of large files in PHP. I use Amazon Glacier to store old files and upload the archives through its API. Initially I only tested with small images, all under 1 MB. When I tried to upload a file larger than 1 MB, the API responded that the checksum I supplied differs from the one it calculated.

Here is my code to upload the file:

// get the SHA-256 using the file path
$image = '/path/to/image'; // image path (placeholder)
$sha256 = hash_file("sha256", $image);

$archive = $glacier->uploadArchive([
    'accountId' => '',
    'body' => "",
    'checksum' => $sha256,
    'contentSHA256' => $sha256,
    'sourceFile' => $image,
    'vaultName' => 'my-vault'
]);

And the error:

AWS HTTP error: Client error: `POST https://glacier.us-west-2.amazonaws.com/vaults/70/archives` resulted in a `400 Bad Request` response:{"code":"InvalidParameterValueException","message":"Checksum mismatch: expected 9f1d4da29b6ec24abde48cb65cc32652ff589467 (truncated...)

I tried the function below to compute the final hash, but the value it prints does not look like a valid hash:

private function getFinalHash($file)
{
    $fp = fopen($file, "r");
    $ctx = hash_init('sha256');
    while (!feof($fp)) {
        $buffer = fgets($fp, 1024);
        hash_update($ctx, $buffer);
    }
    $hash = hash_final($ctx, true); print_r($hash);exit;
    fclose($fp);

}

The resulting hash looks like this: ŸM¢›nÂJ½äŒ¶\Ã&RÿX”gíÖ'„IoA\C÷×

The Amazon Glacier API documentation describes how to compute the checksum:

For each 1 MB chunk of payload data, compute the SHA-256 hash. The last chunk of data can be less than 1 MB. For example, if you are uploading a 3.2 MB archive, you compute the SHA-256 hash values for each of the first three 1 MB chunks of data, and then compute the SHA-256 hash of the remaining 0.2 MB data. These hash values form the leaf nodes of the tree.

I think the problem has something to do with how the checksum must be computed, but I don't know how to do that for large files in PHP. Any help would be appreciated.

Answer 1:

Glacier has its own way of computing the checksum: the SHA-256 tree hash. Here is working PHP code. This function returns the SHA-256 tree hash built from 1 MB parts, as Glacier requires. It works for me with both small and large files.

private function getFinalHash($path, $MB = 1048576)
{
    $fp = fopen($path, "rb");
    if ($fp === false) {
        throw new \RuntimeException("Cannot open $path");
    }
    // Leaf nodes: raw SHA-256 of each 1 MB chunk (the last chunk may be shorter).
    $hashes = [];
    while (!feof($fp)) {
        $buffer = fread($fp, $MB);
        if ($buffer === false || $buffer === "") {
            break;
        }
        $hashes[] = hash("sha256", $buffer, true);
    }
    fclose($fp);
    if (count($hashes) === 0) {
        // Empty file: fall back to the hash of an empty string (assumption).
        $hashes[] = hash("sha256", "", true);
    }
    // Combine adjacent pairs level by level until only the root remains.
    while (count($hashes) > 1) {
        $next = [];
        foreach (array_chunk($hashes, 2) as $pair) {
            // An unpaired last hash is carried up to the next level unchanged.
            $next[] = count($pair) === 2 ? hash("sha256", $pair[0] . $pair[1], true) : $pair[0];
        }
        $hashes = $next;
    }
    return bin2hex($hashes[0]);
}
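
As a quick sanity check of the idea, here is a minimal standalone sketch of the same pairing scheme applied to an in-memory string instead of a file. The helper name treeHashOfString is hypothetical and not part of the AWS SDK, and the empty-payload handling is an assumption. It suggests why uploads under 1 MB worked with plain hash_file(): a payload of at most one chunk has a single leaf, so the tree hash equals the ordinary SHA-256, while anything larger does not.

```php
<?php
// Hypothetical helper (not an AWS SDK function): tree hash of an in-memory string.
function treeHashOfString(string $data, int $chunkSize = 1048576): string
{
    // Leaf nodes: raw SHA-256 of each chunk. An empty payload is treated as
    // one empty chunk (assumption made for this sketch).
    $chunks = $data === '' ? [''] : str_split($data, $chunkSize);
    $hashes = array_map(function ($chunk) {
        return hash('sha256', $chunk, true);
    }, $chunks);

    // Hash adjacent pairs level by level; an unpaired last hash moves up unchanged.
    while (count($hashes) > 1) {
        $next = [];
        foreach (array_chunk($hashes, 2) as $pair) {
            $next[] = count($pair) === 2
                ? hash('sha256', $pair[0] . $pair[1], true)
                : $pair[0];
        }
        $hashes = $next;
    }

    // Hex-encode the raw root hash so it is printable.
    return bin2hex($hashes[0]);
}

$small = str_repeat('a', 1048576);     // exactly 1 MB: one leaf
$large = str_repeat('a', 1048576 + 1); // one byte more: two leaves

var_dump(treeHashOfString($small) === hash('sha256', $small)); // bool(true)
var_dump(treeHashOfString($large) === hash('sha256', $large)); // bool(false)
```

Note that the leaf and intermediate hashes are kept in raw binary form (third argument true) and only the final root is converted with bin2hex(), which is also why printing a raw hash directly shows unreadable characters.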


Answer 2:

The trick is that the SHA-256 hash is computed by the AWS SDK for PHP, which you are already using, so you do not need to calculate it yourself. Here is an example:

$client = new GlacierClient(array(
    'key'    => '[aws access key]',
    'secret' => '[aws secret key]',
    'region' => '[aws region]', // (e.g., us-west-2)
));
$result = $client->uploadArchive(array(
    'vaultName' => $vaultName,
    'body'      => fopen($filename, 'r'),
));
$archiveId = $result->get('archiveId');