What is a superfast way to read large files line-by-line?

Posted 2019-01-13 22:11

I believe I have come up with a very efficient way to read very, very large files line-by-line. Please tell me if you know of a better/faster way or see room for improvement. I am trying to get better at coding, so any sort of advice you have would be nice. Hopefully this is something that other people might find useful, too.

In my tests it appears to be roughly 8 times faster than using Line Input.

' This function reads a whole file into a string.
' I found this in the book Programming Excel with VBA and .NET.
Public Function QuickRead(FName As String) As String
    Dim I As Integer
    Dim res As String
    Dim l As Long

    I = FreeFile
    l = FileLen(FName)
    res = Space(l)                      ' buffer sized to the file length
    Open FName For Binary Access Read As #I
    Get #I, , res                       ' fills the whole buffer in one call
    Close #I
    QuickRead = res
End Function

' This sub works like the Line Input statement.
Public Sub QRLineInput( _
    ByRef strFileData As String, _
    ByRef lngFilePosition As Long, _
    ByRef strOutputString As String, _
    ByRef blnEOF As Boolean _
    )
    ' When no line break is left, InStr returns 0, Mid$ gets a negative
    ' length and raises an error, which is used here as the EOF signal.
    On Error GoTo LastLine
    strOutputString = Mid$(strFileData, lngFilePosition, _
        InStr(lngFilePosition, strFileData, vbNewLine) - lngFilePosition)
    ' Move past the line break (CR+LF, i.e. 2 characters, on Windows)
    lngFilePosition = InStr(lngFilePosition, strFileData, vbNewLine) + 2
    Exit Sub
LastLine:
    blnEOF = True
End Sub

Sub Test()
    Dim strFilePathName As String: strFilePathName = "C:\Fld\File.txt"
    Dim strFile As String
    Dim lngPos As Long
    Dim blnEOF As Boolean
    Dim strFileLine As String

    ' Append a line break so the last line is terminated like the others
    strFile = QuickRead(strFilePathName) & vbNewLine
    lngPos = 1

    Do Until blnEOF
        Call QRLineInput(strFile, lngPos, strFileLine, blnEOF)
    Loop
End Sub

Thanks for the advice!
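One possible refinement, sketched under the same assumption as QRLineInput (lines end in CR+LF): look up each line break with a single InStr call instead of two, and report end-of-data through a Boolean return value instead of an error handler. QRNextLine and TestNextLine are hypothetical names; QuickRead is the function from the question, and the path is the same placeholder.

```vba
' Hypothetical variant of QRLineInput: one InStr call per line, and the
' return value (True = a line was read) replaces the error-handler EOF.
' Assumes CR+LF line endings, like the original.
Public Function QRNextLine( _
    ByRef strFileData As String, _
    ByRef lngFilePosition As Long, _
    ByRef strOutputString As String) As Boolean

    Dim lngBreak As Long

    ' Nothing left to read: return False (the default) to signal EOF
    If lngFilePosition > Len(strFileData) Then Exit Function

    lngBreak = InStr(lngFilePosition, strFileData, vbCrLf)
    If lngBreak = 0 Then
        ' Last line has no trailing line break: return the remainder
        strOutputString = Mid$(strFileData, lngFilePosition)
        lngFilePosition = Len(strFileData) + 1
    Else
        strOutputString = Mid$(strFileData, lngFilePosition, lngBreak - lngFilePosition)
        lngFilePosition = lngBreak + 2      ' skip the CR+LF pair
    End If
    QRNextLine = True
End Function

Sub TestNextLine()
    Dim strFile As String
    Dim lngPos As Long
    Dim strLine As String

    strFile = QuickRead("C:\Fld\File.txt")  ' no extra vbNewLine needed
    lngPos = 1
    Do While QRNextLine(strFile, lngPos, strLine)
        ' process strLine here
    Loop
End Sub
```

Because the final unterminated line is handled explicitly, a file that already ends in CR+LF does not produce an extra empty line at the end.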

Tags: vba file-io
9 answers
劫难
Post #2 · 2019-01-13 23:04

Line Input works fine for small files. However, once file sizes reach around 90 KB, Line Input jumps around and reads data from the source file in the wrong order. I tested it with different file sizes:

49 KB = OK
60 KB = OK
78 KB = OK
85 KB = OK
93 KB = error
101 KB = error
127 KB = error
156 KB = error

Lesson learned: use Scripting.FileSystemObject.
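A minimal sketch of reading a file line by line with Scripting.FileSystemObject, late bound through CreateObject so no reference to the Microsoft Scripting Runtime is needed. CountLinesFSO is a hypothetical name, and counting lines merely stands in for real per-line processing.

```vba
' Read a text file line by line with FileSystemObject (late bound).
' CountLinesFSO is a hypothetical example name.
Public Function CountLinesFSO(ByVal FName As String) As Long
    Const ForReading As Long = 1        ' IOMode value for OpenTextFile
    Dim fso As Object
    Dim ts As Object
    Dim strLine As String
    Dim n As Long

    Set fso = CreateObject("Scripting.FileSystemObject")
    Set ts = fso.OpenTextFile(FName, ForReading)
    Do Until ts.AtEndOfStream
        strLine = ts.ReadLine           ' returns the line without its terminator
        n = n + 1                       ' process strLine here
    Loop
    ts.Close
    CountLinesFSO = n
End Function
```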

太酷不给撩
Post #3 · 2019-01-13 23:08

You can modify the code above to read the full file in one go and then display each line, as shown below:

Option Explicit

Public Function QuickRead(FName As String) As Variant
    Dim i As Integer
    Dim res As String
    Dim l As Long

    i = FreeFile
    l = FileLen(FName)
    res = Space(l)
    Open FName For Binary Access Read As #i
    Get #i, , res
    Close #i
    ' Split the file contents into a zero-based array of lines on vbCrLf
    QuickRead = Split(res, vbCrLf)
End Function

Sub Test()
    ' Replace "C:\writename.txt" with any file name you like
    Dim strFilePathName As String: strFilePathName = "C:\writename.txt"
    Dim v As Variant
    Dim i As Long
    v = QuickRead(strFilePathName)
    ' Show each line in turn
    For i = 0 To UBound(v)
        MsgBox v(i)
    Next i
End Sub
欢心
Post #4 · 2019-01-13 23:10

I just wanted to share some of my results...

My text files apparently came from a Linux system, so each line ends with only vbLf/Chr(10) and not vbCr/Chr(13).

Note 1:

  • This meant that Line Input read in the entire file instead of just one line at a time.
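Note 1 also affects code that splits on vbCrLf, such as the Split-based QuickRead in the previous answer: an LF-only file comes back as a single element. A minimal sketch of one workaround, assuming the text uses CRLF or LF endings (not bare CR): normalize CRLF to LF, then split on vbLf. SplitLines is a hypothetical helper name; it could be fed the string returned by the question's QuickRead.

```vba
' Split text into lines whether it uses CRLF (Windows) or LF (Linux) endings.
' SplitLines is a hypothetical helper name. Returns a zero-based array.
Public Function SplitLines(ByVal strText As String) As Variant
    ' Normalize Windows line endings to a bare LF first...
    strText = Replace(strText, vbCrLf, vbLf)
    ' ...so a single Split on LF handles both formats
    SplitLines = Split(strText, vbLf)
End Function
```

As with any Split call, a trailing line break produces a final empty element.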

Testing small (152 KB) and large (2,778 KB) files, both on and off the network, I found the following:

Open FileName For Input: Line Input was the slowest (see Note 1 above)

Open FileName For Binary Access Read: Input was the fastest for reading the whole file

FSO.OpenTextFile: ReadLine was fast, but a bit slower than Binary Input
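The "Binary Access Read + Input" combination can be sketched as follows. ReadWholeFile is a hypothetical name; the sketch uses only the built-in Open, LOF, Input$ and Close, and the LOF check simply skips the read for an empty file.

```vba
' Read a whole file into one string with a single Input$ call.
' ReadWholeFile is a hypothetical example name.
Public Function ReadWholeFile(ByVal FName As String) As String
    Dim f As Integer
    f = FreeFile
    Open FName For Binary Access Read As #f
    ' LOF = length of the open file in bytes; read that many characters
    If LOF(f) > 0 Then ReadWholeFile = Input$(LOF(f), #f)
    Close #f
End Function
```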

Note 2:

  • If I only needed to read the file header (first 1-2 lines) to verify that I had the right file/format, then FSO.OpenTextFile was the fastest, followed very closely by Binary Input.

  • The drawback with the Binary Input is that you have to know how many characters you want to read.

  • On normal files, Line Input would also be a good option, but I couldn't test it because of Note 1.
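The Binary drawback in Note 2 (deciding up front how many characters to read) can be worked around for header checks by reading a fixed-size chunk that should be longer than the header line, then cutting it at the first line break. A sketch only: ReadHeaderLine and the 1,024-byte MAX_HEADER limit are assumptions, and a header longer than that limit comes back truncated.

```vba
' Peek at the first line of a file by reading at most MAX_HEADER bytes.
' ReadHeaderLine and MAX_HEADER are hypothetical.
Public Function ReadHeaderLine(ByVal FName As String) As String
    Const MAX_HEADER As Long = 1024
    Dim f As Integer
    Dim n As Long
    Dim buf As String
    Dim p As Long

    f = FreeFile
    Open FName For Binary Access Read As #f
    n = LOF(f)
    If n > MAX_HEADER Then n = MAX_HEADER
    buf = Space$(n)                     ' Get fills as many bytes as buf is long
    If n > 0 Then Get #f, 1, buf        ' read from byte position 1
    Close #f

    p = InStr(buf, vbLf)                ' LF ends the line for both LF and CRLF files
    If p > 0 Then buf = Left$(buf, p - 1)
    ' Drop the CR left over from a CRLF ending
    If Right$(buf, 1) = vbCr Then buf = Left$(buf, Len(buf) - 1)
    ReadHeaderLine = buf
End Function
```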

 

Note 3:

  • Obviously, the files on the network showed the largest difference in read speed. They also showed the greatest benefit from reading the file a second time (although there are certainly memory buffers that come into play here).