Greetings helpful delightful people,
I have a problem with reading CSV files and converting to datatables in VB.Net.
If the CSV file contains a column full of '-' then on import into the datatable they appear as '0' and the entire column is formatted in a numeric format.
The code I have written is:
Public Function LoadCsvFile(filePath As String) As DataTable
Try
DTReadCSV = New DataTable
Dim connection As System.Data.OleDb.OleDbConnection = New System.Data.OleDb.OleDbConnection("Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" & Microsoft.VisualBasic.Left(filePath, InStrRev(filePath, "\")) & ";Extended Properties=""text;HDR=Yes;FMT=Delimited""")
Dim adapter As New OleDb.OleDbDataAdapter("SELECT * FROM [" + Microsoft.VisualBasic.Mid(filePath, InStrRev(filePath, "\") + 1) + "]", connection)
'Dim table As New DataTable()
adapter.Fill(DTReadCSV)
'now thats its nicely in a datatable
IntAmountRows = DTReadCSV.Rows.Count
IntAmountColumns = DTReadCSV.Columns.Count
'System.Diagnostics.Debug.Print(DTReadCSV.Rows.Item(1)(1).ToString)
Return DTReadCSV
Exit Function
Catch ex As Exception
MsgBox(ex.ToString())
MsgBox(Err.Number & " " & Chr(13) & Err.Description)
End Try
End Function
Please can someone smarter figure out how to combat this issue besides modifying the CSV file by taking out the '-' as blanks, which at the moment seems the only long winded way of importing these CSV files.
Many thanks
Using a Schema.INI you can describe to OleDB what the CSV looks like in detail including resulting column names and data types. For instance, given this data:
"Country","Capital City","Population", "Fake"
"France","Paris","2.25","-----"
"Canada","Toronto","2.5","-----"
"Italy","Rome","2.8","-----"
Create a Schema.ini file in the same folder; it can have multiple sections to define various CSVs in that folder. If there is a Schema.INI in the same folder as the CSV and it has an entry for your CSV, OleDB will automatically use it (nothing special to do).
The Schema.INI entry for the above data may look like this:
[Capitals.CSV]
ColNameHeader=True
CharacterSet=1252
Format=CSVDelimited
TextDelimiter="
Col1="Country" Text Width 254
Col2="Capital City" Text Width 254
Col3="Population" Single
Col4="Fake" Text Width 254
OleDb will use these definitions when reading that file, resulting in a string of dashes for the "Fake" column in the output:
One additional benefit (among many) of using a Schema.INI is that you can name the columns there rather than using or aliasing F1, F2, F3 etc. The SQL for that output was just "SELECT * FROM Capitals.CSV"
Tip (for others to come): to specify UTF8 as the character set, use CharacterSet=65001
in the schema.
For more information see
- Schema.ini Text File Driver
- Code Page Identifiers
You could use Sebastien Lorion's fast CSV reader. I've been using it for almost 10 years (the source code is available and there is a Nuget package available for it if you just want to load that).
I'll include the links here to it and a code snippit that should get you what you want:
- Nuget - https://www.nuget.org/packages/LumenWorksCsvReader/
- Code Project Article on it: http://www.codeproject.com/Articles/9258/A-Fast-CSV-Reader
In this example I have the filename as an input like you do, you could also have a string, the stream, whatever you need:
''' <summary>
''' Uses Lumenworks Fast CSV reader to load a DataTable
''' </summary>
''' <param name="fileName"></param>
''' <returns></returns>
''' <remarks></remarks>
Public Function CsvToDataTable(fileName As String) As DataTable
Dim data As String = My.Computer.FileSystem.ReadAllText(fileName)
Dim dt As New DataTable
Using sr As New StringReader(data)
' The true indicates it has header values which can be used to access fields by their name, switch to
' false if the CSV doesn't have them
Using csv As New LumenWorks.Framework.IO.Csv.CsvReader(sr, True)
dt.Load(csv)
End Using
sr.Close()
End Using
Return dt
End Function