I would like to query an UTF-8 encoded CSV file using VBA in Excel 2010 with the following database connection:
provider=Microsoft.Jet.OLEDB.4.0;;data source='xyz';Extended Properties="text;HDR=Yes;FMT=Delimited(,);CharacterSet=65001"
All CSV files start with the BOM \xEF\xBB\xBF and the header line. Somehow the BOM isn't recognized correctly and the first column header gets read as "?header_name", i.e. a question mark gets prepended. I have tried different CharacterSets and I have also tried to use Microsoft.ACE.OLEDB.12.0, but everything was without success so far.
Is this a known bug or is there any way to get the right first column header name without changing the encoding of the source files?
The following procedure extracts the entire CSV
file into a new Sheet
, clearing the BOM
from the Header. It has the Path, Filename and BOM string as variables to provide flexibility.
Use this procedure to call the Query procedure
Sub Qry_Csv_Utf8()
Const kFile As String = "UTF8 .csv"
Const kPath As String = "D:\StackOverFlow\Temp\"
Const kBOM As String = "\xEF\xBB\xBF"
Call Ado_Qry_Csv(kPath, kFile, kBOM)
End Sub
This is the Query procedure
Sub Ado_Qry_Csv(sPath As String, sFile As String, sBOM As String)
Dim Wsh As Worksheet
Dim AdoConnect As ADODB.Connection
Dim AdoRcrdSet As ADODB.Recordset
Dim i As Integer
Rem Add New Sheet - Select option required
'With ThisWorkbook 'Use this if procedure is resident in workbook receiving csv data
'With Workbooks(WbkName) 'Use this if procedure is not in workbook receiving csv data
With ActiveWorkbook 'I used this for testing purposes
Set Wsh = .Sheets.Add(After:=.Sheets(.Sheets.Count))
'Wsh.Name = NewSheetName 'rename new Sheet
End With
Set AdoConnect = New ADODB.Connection
AdoConnect.Open "Provider=Microsoft.Jet.OLEDB.4.0;" & _
"Data Source=" & sPath & ";" & _
"Extended Properties='text;HDR=Yes;FMT=Delimited(,);CharacterSet=65001'"
Set AdoRcrdSet = New ADODB.Recordset
AdoRcrdSet.Open Source:="SELECT * FROM [" & sFile & "]", _
ActiveConnection:=AdoConnect, _
CursorType:=adOpenDynamic, _
LockType:=adLockReadOnly, _
Options:=adCmdText
Rem Enter Csv Records in Worksheet
For i = 0 To -1 + AdoRcrdSet.Fields.Count
Wsh.Cells(1, 1 + i).Value = _
WorksheetFunction.Substitute(AdoRcrdSet.Fields(i).Name, sBOM, "")
Next
Wsh.Cells(2, 1).CopyFromRecordset AdoRcrdSet
End Sub
The only solution for this problem I found is to use Schema.ini
file.
my test csv file
Col_A;Col_B;Col_C
Some text example;123456789;3,14
Schema.ini for my test csv file
[UTF-8_Csv_With_BOM.csv]
Format=Delimited(;)
Col1=Col_A Text
Col2=Col_B Long
Col3=Col_C Double
This Schema.ini
file contains the name of the source csv file and describes my columns. Each column is specified by its name and type but you can specify more informations. This file must be located in the same folder as your csv file. More info here.
Finally the VBA code which reads the csv file. Note that HDR=No
. This is because the columns headers are defined in the Schema.ini
.
' Add reference to Microsoft ActiveX Data Objects 6.1 Library
Sub ReadCsv()
Const filePath As String = "c:\Temp\StackOverflow\"
Const fileName As String = "UTF-8_Csv_With_BOM.csv"
Dim conn As ADODB.Connection
Dim rs As New ADODB.Recordset
Set conn = New ADODB.Connection
conn.Open "Provider=Microsoft.Jet.OLEDB.4.0;Data Source='" & filePath & _
"';Extended Properties='text;HDR=No;FMT=Delimited()';"
With rs
.ActiveConnection = conn
.Open "SELECT * FROM [" & fileName & "]"
If Not .BOF And Not .EOF Then
While (Not .EOF)
Debug.Print rs.Fields("Col_A") & " " & _
rs.Fields("Col_B") & " " & _
rs.Fields("Col_C")
.MoveNext
Wend
End If
.Close
End With
conn.Close
Set conn = Nothing
End Sub
Output
Some text example 123456789 3,14