How to query UTF-8 encoded CSV-files with VBA in E

2020-04-11 07:31发布

问题:

I would like to query an UTF-8 encoded CSV file using VBA in Excel 2010 with the following database connection:

provider=Microsoft.Jet.OLEDB.4.0;;data source='xyz';Extended Properties="text;HDR=Yes;FMT=Delimited(,);CharacterSet=65001"

All CSV files start with the BOM \xEF\xBB\xBF and the header line. Somehow the BOM isn't recognized correctly and the first column header gets read as "?header_name", i.e. a question mark gets prepended. I have tried different CharacterSets and I have also tried to use Microsoft.ACE.OLEDB.12.0, but everything was without success so far.

Is this a known bug or is there any way to get the right first column header name without changing the encoding of the source files?

回答1:

The following procedure extracts the entire CSVfile into a new Sheet, clearing the BOM from the Header. It has the Path, Filename and BOM string as variables to provide flexibility.

Use this procedure to call the Query procedure

Sub Qry_Csv_Utf8()
Const kFile As String = "UTF8 .csv"
Const kPath As String = "D:\StackOverFlow\Temp\"
Const kBOM As String = "\xEF\xBB\xBF"
    Call Ado_Qry_Csv(kPath, kFile, kBOM)
End Sub

This is the Query procedure

Sub Ado_Qry_Csv(sPath As String, sFile As String, sBOM As String)
Dim Wsh As Worksheet
Dim AdoConnect As ADODB.Connection
Dim AdoRcrdSet As ADODB.Recordset
Dim i As Integer

    Rem Add New Sheet - Select option required
    'With ThisWorkbook           'Use this if procedure is resident in workbook receiving csv data
    'With Workbooks(WbkName)     'Use this if procedure is not in workbook receiving csv data
    With ActiveWorkbook         'I used this for testing purposes
        Set Wsh = .Sheets.Add(After:=.Sheets(.Sheets.Count))
        'Wsh.Name = NewSheetName        'rename new Sheet
    End With

    Set AdoConnect = New ADODB.Connection
    AdoConnect.Open "Provider=Microsoft.Jet.OLEDB.4.0;" & _
        "Data Source=" & sPath & ";" & _
        "Extended Properties='text;HDR=Yes;FMT=Delimited(,);CharacterSet=65001'"

    Set AdoRcrdSet = New ADODB.Recordset
    AdoRcrdSet.Open Source:="SELECT * FROM [" & sFile & "]", _
        ActiveConnection:=AdoConnect, _
        CursorType:=adOpenDynamic, _
        LockType:=adLockReadOnly, _
        Options:=adCmdText

    Rem Enter Csv Records in Worksheet
    For i = 0 To -1 + AdoRcrdSet.Fields.Count
        Wsh.Cells(1, 1 + i).Value = _
            WorksheetFunction.Substitute(AdoRcrdSet.Fields(i).Name, sBOM, "")
    Next
    Wsh.Cells(2, 1).CopyFromRecordset AdoRcrdSet

End Sub


回答2:

The only solution for this problem I found is to use Schema.ini file.

my test csv file

Col_A;Col_B;Col_C
Some text example;123456789;3,14

Schema.ini for my test csv file

[UTF-8_Csv_With_BOM.csv] 
Format=Delimited(;)
Col1=Col_A Text
Col2=Col_B Long
Col3=Col_C Double

This Schema.ini file contains the name of the source csv file and describes my columns. Each column is specified by its name and type but you can specify more informations. This file must be located in the same folder as your csv file. More info here.

Finally the VBA code which reads the csv file. Note that HDR=No. This is because the columns headers are defined in the Schema.ini.

' Add reference to Microsoft ActiveX Data Objects 6.1 Library
Sub ReadCsv()

    Const filePath As String = "c:\Temp\StackOverflow\"
    Const fileName As String = "UTF-8_Csv_With_BOM.csv"
    Dim conn As ADODB.Connection
    Dim rs As New ADODB.Recordset

    Set conn = New ADODB.Connection
    conn.Open "Provider=Microsoft.Jet.OLEDB.4.0;Data Source='" & filePath & _
        "';Extended Properties='text;HDR=No;FMT=Delimited()';"

    With rs
        .ActiveConnection = conn
        .Open "SELECT * FROM [" & fileName & "]"
        If Not .BOF And Not .EOF Then
            While (Not .EOF)
                Debug.Print rs.Fields("Col_A") & " " & _
                            rs.Fields("Col_B") & " " & _
                            rs.Fields("Col_C")
                .MoveNext
            Wend
        End If
        .Close
    End With

    conn.Close
    Set conn = Nothing

End Sub

Output

Some text example 123456789 3,14