Extract text from a text file with Chinese charact

2019-07-11 18:17发布

I have a batch of like 100,000 text files which I would like to extract as strings using vba. In the past I have been doing so this way without problem:

Sub Main()
Dim PathAndName As String
Dim TextFile As Integer
Dim TextString() As String
Redim TextString(100000)
For i = 1 To 100,000
    PathAndName = "C:\File_" & i & ".ext"
    TextFile = 1
    Open PathAndName For Input As TextFile
    TextString(i) = Input(LOF(TextFile), TextFile)
Next i
End Sub

This time, the script returns the error "Input Past End of File" Error 62. The only different I can spot is that this time the text files contain a few Chinese Characters, which are not of my interest actually. That's why I believe this is the source of the problem. The Chinese Characters appear at the first line of the files.

Any help is appreciated. Thanks!

1条回答
该账号已被封号
2楼-- · 2019-07-11 18:57

I suspect your text file is in a multibyte encoding now. There one character is encoded in two or three bytes. So LOF(TextFile) will not return the correct character count but the byte count. But Input(LOF(TextFile), TextFile) needs the character count since it must create a String.

You could use:

Sub Main()
Dim PathAndName As String
Dim TextFile As Integer
Dim TextString() As String
Redim TextString(100000)
For i = 1 To 100000
    PathAndName = "C:\File_" & i & ".ext"
    TextFile = 1
    Open PathAndName For Input As TextFile

    Dim sLine As String
    Dim sTextString As String
    sLine = ""
    sTextString = ""

    Do While Not EOF(TextFile)
     Input #TextFile, sLine
     sTextString = sTextString & sLine
    Loop

    TextString(i) = sTextString

    Close #TextFile

Next i
End Sub

But the better approach would be using ADODB.Stream instead of the dinosaur VB file access methods. But this is a totally different approach. So you should read about ADODB.Stream yourself first.

查看更多
登录 后发表回答