How to generate md5-hashes for large files with VB

2019-02-24 18:49发布

问题:

I have the following functions to generate md5-hashes for files. The functions work great for small files, but crashes and generate Run-time error 7 - Out of memory when I try to hash files over ~250 MB (I don't actually know at which exact size it breaks, but files below 200 MB work fine).

I don't understand why it breaks at a certain size, so if anyone could shed some light on that I would appreciate it a lot.

Also, is there anything I can do to make the functions handle larger files? I intend to use the functions in a larger tool where I will need to generate hashes for files of unknown sizes. Most will be small enough for the current functions to work, but I will have to be able to handle large files as well.

I got my current functions from the most upvoted answer this post How to get the MD5 hex hash for a file using VBA?

Public Function FileToMD5Hex(ByVal strFileName As String) As String
Dim varEnc           As Variant
Dim varBytes         As Variant
Dim strOut           As String
Dim intPos           As Integer

Set varEnc = CreateObject("System.Security.Cryptography.MD5CryptoServiceProvider")

'Convert the string to a byte array and hash it
varBytes = GetFileBytes(strFileName)
varBytes = varEnc.ComputeHash_2((varBytes))

'Convert the byte array to a hex string
For intPos = 1 To LenB(varBytes)
   strOut = strOut & LCase(Right("0" & Hex(AscB(MidB(varBytes, intPos, 1))), 2))
Next

FileToMD5Hex = strOut

Set varEnc = Nothing

End Function

Private Function GetFileBytes(ByVal strPath As String) As Byte()
Dim lngFileNum          As Long
Dim bytRtnVal()         As Byte

lngFileNum = FreeFile

'If file exists, get number of bytes
If LenB(Dir(strPath)) Then
   Open strPath For Binary Access Read As lngFileNum
   ReDim bytRtnVal(LOF(lngFileNum)) As Byte
   Get lngFileNum, , bytRtnVal
   Close lngFileNum
Else
   MsgBox "Filen finns inte" & vbCrLf & "Avbryter", vbCritical, "Filen hittades inte"
   Exit Function
End If

GetFileBytes = bytRtnVal
Erase bytRtnVal

End Function

Thank you

回答1:

It looks like you reached the memory limit. A better way would be to compute the MD5 of the file by block:

Public Function ComputeMD5(filepath As String) As String
  Dim buffer() As Byte, svc As Object, hFile%, blockSize&, i&
  blockSize = 2 ^ 16

  ' open the file '

  If Len(Dir(filepath)) Then Else Err.Raise 5, , "file not found" & vbCr & filepath

  hFile = FreeFile
  Open filepath For Binary Access Read As hFile

  ' allocate buffer '

  If LOF(hFile) < blockSize Then blockSize = ((LOF(hFile) + 1024) \ 1024) * 1024
  ReDim buffer(0 To blockSize - 1)

  ' compute hash '

  Set svc = CreateObject("System.Security.Cryptography.MD5CryptoServiceProvider")

  For i = 1 To LOF(hFile) \ blockSize
    Get hFile, , buffer
    svc.TransformBlock buffer, 0, blockSize, buffer, 0
  Next

  Get hFile, , buffer
  svc.TransformFinalBlock buffer, 0, LOF(hFile) Mod blockSize
  buffer = svc.Hash

  ' cleanup '

  svc.Clear
  Close hFile

  ' convert to an hexa string '

  ComputeMD5 = String$(32, "0")

  For i = 0 To 15
     Mid$(ComputeMD5, i + i + 2 + (buffer(i) > 15)) = Hex(buffer(i))
  Next

End Function


回答2:

This is an extension to FlorentB's answer, which worked brilliantly for me until my files surpassed the 2GB LOF() size limit.

I tried to adapt for getting file length by alternate means as follows:

Public Function ComputeMD5(filepath As String) As String
    If Len(Dir(filepath)) Then Else Err.Raise 5, , "File not found." & vbCr & filepath

    Dim blockSize As Long: blockSize = 2 ^ 20
    Dim blockSize_f As Double
    Dim buffer() As Byte
    Dim fileLength As Variant
    Dim hFile As Integer
    Dim n_Reads As Long
    Dim i As Long
    Dim svc As Object: Set svc = CreateObject("System.Security.Cryptography.MD5CryptoServiceProvider")

    fileLength = DecGetFileSize(filepath)
    If fileLength < blockSize Then blockSize = ((fileLength + 1024) \ 1024) * 1024
    ReDim buffer(0 To blockSize - 1)
    n_Reads = fileLength / blockSize
    blockSize_f = fileLength - (CDbl(blockSize) * n_Reads)

    hFile = FreeFile
    Open filepath For Binary Access Read As hFile
    For i = 1 To n_Reads
        Get hFile, i, buffer
        svc.TransformBlock buffer, 0, blockSize, buffer, 0
    Next i

    Get hFile, i, buffer
    svc.TransformFinalBlock buffer, 0, blockSize_f
    buffer = svc.Hash
    svc.Clear
    Close hFile

    ComputeMD5 = String$(32, "0")
    For i = 0 To 15
        Mid$(ComputeMD5, i + i + 2 + (buffer(i) > 15)) = Hex(buffer(i))
    Next

End Function

Public Function DecGetFileSize(fname As String) As Variant
    Dim fso As New FileSystemObject
    Dim f: Set f = fso.GetFile(fname)
    DecGetFileSize = CDec(f.Size)
    Set f = Nothing
    Set fso = Nothing
End Function

This all runs fine, returning a string, however that string does not equal the MD5 calculated using other tools on the same file.

I can't work out where the discrepancy is originating.

I've checked and double checked filelength, n_reads, blockSize and blockSize_f and I'm sure those values are all correct.

I had some trouble with the Get function, where if I didn't explicitly tell it the block number, it dies at block 2048.

Any ideas / pointers would be much appreciated.