Creating hash for folder

Posted 2019-01-21 23:41

Question:

I need to create a hash for a folder that contains some files. I have already done this for each file individually, but I am looking for a way to create a single hash for all the files in the folder. Any ideas how to do that?

(Of course I could hash each file and concatenate the results into one big hash, but that is not an approach I like.)

Thanks in advance.

Answer 1:

This hashes all file (relative) paths and contents, and correctly handles file ordering.

It's also quick: around 30 ms for a 4 MB directory.

using System;
using System.Text;
using System.Security.Cryptography;
using System.IO;
using System.Linq;

...

public static string CreateMd5ForFolder(string path)
{
    // assuming you want to include nested folders
    var files = Directory.GetFiles(path, "*.*", SearchOption.AllDirectories)
                         .OrderBy(p => p).ToList();

    MD5 md5 = MD5.Create();

    for(int i = 0; i < files.Count; i++)
    {
        string file = files[i];

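        // hash the path relative to the root, lower-cased so that path casing is ignored
        // (assumes path was passed without a trailing directory separator)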
        string relativePath = file.Substring(path.Length + 1);
        byte[] pathBytes = Encoding.UTF8.GetBytes(relativePath.ToLower());
        md5.TransformBlock(pathBytes, 0, pathBytes.Length, pathBytes, 0);

        // hash contents
        byte[] contentBytes = File.ReadAllBytes(file);
        if (i == files.Count - 1)
            md5.TransformFinalBlock(contentBytes, 0, contentBytes.Length);
        else
            md5.TransformBlock(contentBytes, 0, contentBytes.Length, contentBytes, 0);
    }

    return BitConverter.ToString(md5.Hash).Replace("-", "").ToLower();
}


Answer 2:

Dunc's answer (the first one above) works well; however, it does not handle an empty directory. The code below returns the MD5 'd41d8cd98f00b204e9800998ecf8427e' (the MD5 of zero-length input) for an empty directory.

public static string CreateDirectoryMd5(string srcPath)
{
    var filePaths = Directory.GetFiles(srcPath, "*", SearchOption.AllDirectories).OrderBy(p => p).ToArray();

    using (var md5 = MD5.Create())
    {
        foreach (var filePath in filePaths)
        {
            // hash path (as returned by GetFiles, so it includes srcPath itself)
            byte[] pathBytes = Encoding.UTF8.GetBytes(filePath);
            md5.TransformBlock(pathBytes, 0, pathBytes.Length, pathBytes, 0);

            // hash contents
            byte[] contentBytes = File.ReadAllBytes(filePath);

            md5.TransformBlock(contentBytes, 0, contentBytes.Length, contentBytes, 0);
        }

        // Finalize with an empty block; this also handles the empty filePaths case
        md5.TransformFinalBlock(new byte[0], 0, 0);

        return BitConverter.ToString(md5.Hash).Replace("-", "").ToLower();
    }
}


Answer 3:

Create a tarball of the files, then hash the tarball.

> tar cf hashes *.abc
> md5sum hashes

Or hash the individual files and pipe the output into the hash command.

> md5sum *.abc | md5sum

Edit: neither approach above sorts the files explicitly, so the result depends on how the shell expands the asterisk and may differ between environments (for example, under different locale settings).



Answer 4:

Concatenate the file names and file contents into one big string and hash that, or do the hashing in chunks for performance.

Of course, you need to take a few things into account:

  • You need to sort the files by name, so you don't get two different hashes if the file order changes.
  • With this method only the file names and contents are taken into account. If the file names don't matter, you may sort by content first and then hash; if more attributes (ctime/mtime/hidden/archived...) matter, include them in the to-be-hashed string.
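
A minimal sketch of the points above, in the same style as the earlier answers: files are sorted by relative name, each name and (optionally) its last write time go into the hash, and contents are fed in chunks instead of being read whole. The class name, method name, the includeLastWriteTime parameter and the 64 KB buffer size are all made up for illustration, and the ordinal sort is just one reasonable choice.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

public static class FolderHashSketch
{
    // Hypothetical helper, not taken from any answer in this thread.
    public static string HashFolder(string root, bool includeLastWriteTime = false)
    {
        // Sort by relative path with an ordinal comparer so the order
        // does not depend on the current culture.
        var files = Directory.GetFiles(root, "*", SearchOption.AllDirectories)
            .Select(f => new
            {
                Full = f,
                Relative = f.Substring(root.Length)
                            .TrimStart(Path.DirectorySeparatorChar, Path.AltDirectorySeparatorChar)
            })
            .OrderBy(f => f.Relative, StringComparer.Ordinal)
            .ToList();

        using (var md5 = MD5.Create())
        {
            var buffer = new byte[64 * 1024]; // assumed chunk size

            foreach (var f in files)
            {
                // file name (relative to root)
                byte[] nameBytes = Encoding.UTF8.GetBytes(f.Relative);
                md5.TransformBlock(nameBytes, 0, nameBytes.Length, nameBytes, 0);

                // optional extra attribute: last write time in UTC ticks
                if (includeLastWriteTime)
                {
                    byte[] ticks = BitConverter.GetBytes(File.GetLastWriteTimeUtc(f.Full).Ticks);
                    md5.TransformBlock(ticks, 0, ticks.Length, ticks, 0);
                }

                // file contents, one chunk at a time
                using (var stream = File.OpenRead(f.Full))
                {
                    int read;
                    while ((read = stream.Read(buffer, 0, buffer.Length)) > 0)
                        md5.TransformBlock(buffer, 0, read, buffer, 0);
                }
            }

            // Finalize; with no files this yields the MD5 of empty input.
            md5.TransformFinalBlock(new byte[0], 0, 0);
            return BitConverter.ToString(md5.Hash).Replace("-", "").ToLowerInvariant();
        }
    }
}
```

Calling FolderHashSketch.HashFolder(@"C:\some\folder") should give the same value for the same names and contents wherever the folder lives; passing includeLastWriteTime: true makes the hash change when a file is touched, which may or may not be what you want.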


Answer 5:

If you already have hashes for all the files, just sort the hashes alphabetically, concatenate them and hash them again to create an uber hash.
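
A rough sketch of that idea, assuming the per-file hashes are available as hex strings produced by one algorithm (so they all have the same length and plain concatenation is unambiguous). The class and method names are hypothetical, and MD5 is used only to match the rest of the thread.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

public static class CombinedHashSketch
{
    // Hypothetical helper: combines per-file hex digests into one digest.
    public static string Combine(IEnumerable<string> fileHashes)
    {
        // Normalize case, then sort ordinally so the result does not depend
        // on the input order or on the current culture.
        var sorted = fileHashes
            .Select(h => h.ToLowerInvariant())
            .OrderBy(h => h, StringComparer.Ordinal);

        string joined = string.Concat(sorted);

        using (var md5 = MD5.Create())
        {
            // Hex digits are plain ASCII, so ASCII encoding is sufficient here.
            byte[] digest = md5.ComputeHash(Encoding.ASCII.GetBytes(joined));
            return BitConverter.ToString(digest).Replace("-", "").ToLowerInvariant();
        }
    }
}
```

Note that this variant only looks at the file hashes, so renaming a file without changing its contents leaves the combined hash unchanged.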



Answer 6:

Here's a solution that uses streaming to avoid memory and latency issues.

By default the file paths are included in the hash, so it reflects not only the data in the files but also the file system entries themselves; two sets of files with identical contents under different names will not produce the same hash. This post is tagged security, so this ought to be important.

Finally, this solution puts you in control of the hashing algorithm, of which files get hashed, and of the order in which they are hashed.

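using System.Collections.Generic;
using System.IO;
using System.Security.Cryptography;
using System.Text;
using System.Threading.Tasks;

public static class HashAlgorithmExtensions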
{
    public static async Task<byte[]> ComputeHashAsync(this HashAlgorithm alg, IEnumerable<FileInfo> files, bool includePaths = true)
    {
        using (var cs = new CryptoStream(Stream.Null, alg, CryptoStreamMode.Write))
        {
            foreach (var file in files)
            {
                if (includePaths)
                {
                    var pathBytes = Encoding.UTF8.GetBytes(file.FullName);
                    cs.Write(pathBytes, 0, pathBytes.Length);
                }

                using (var fs = file.OpenRead())
                    await fs.CopyToAsync(cs);
            }

            cs.FlushFinalBlock();
        }

        return alg.Hash;
    }
}

An example that hashes all the files in a folder:

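// Note: the order returned by EnumerateFiles may vary; sort the files first
// (e.g. by FullName) if the hash must not depend on enumeration order.
async Task<byte[]> HashFolder(DirectoryInfo folder, string searchPattern = "*", SearchOption searchOption = SearchOption.TopDirectoryOnly)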
{
    using(var alg = MD5.Create())
        return await alg.ComputeHashAsync(folder.EnumerateFiles(searchPattern, searchOption));
}


Tags: c# security