I've written a function which reads csv files and parametrizes them accordingly, therefore i have a function gettypessql which queries sql table at first to get data types and therefore to adjust the columns which are later inserted in sql. So my problem is when I set HDR=Yes in Jet OLE DB I get only column names like F1, F2, F3. To circumvent this issue I've set HDR=No and written some for loops but now I get only empty strings, what is actually the problem? here is my code:
Private Function GetCSVFile(ByVal file As String, ByVal min As Integer, ByVal max As Integer) As DataTable
Dim ConStr As String = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" & TextBox1.Text & ";Extended Properties=""TEXT;HDR=NO;IMEX=1;FMT=Delimited;CharacterSet=65001"""
Dim conn As New OleDb.OleDbConnection(ConStr)
Dim dt As New DataTable
Dim da As OleDb.OleDbDataAdapter = Nothing
getData = Nothing
Try
Dim CMD As String = "Select * from " & _table & ".csv"
da = New OleDb.OleDbDataAdapter(CMD, conn)
da.Fill(min, max, dt)
getData = New DataTable(_table)
Dim firstRow As DataRow = dt.Rows(0)
For i As Integer = 0 To dt.Columns.Count - 1
Dim columnName As String = firstRow(i).ToString()
Dim newColumn As New DataColumn(columnName, mListOfTypes(i))
getData.Columns.Add(newColumn)
Next
For i As Integer = 1 To dt.Rows.Count - 1
Dim row As DataRow = dt.Rows(i)
Dim newRow As DataRow = getData.NewRow()
For j As Integer = 0 To getData.Columns.Count - 1
If row(j).GetType Is GetType(String) Then
Dim colValue As String = row(j).ToString()
colValue = ChangeEncoding(colValue)
colValue = ParseString(colValue)
colValue = ReplaceChars(colValue)
newRow(j) = colValue
Else
newRow(j) = row(j)
End If
Next
getData.Rows.Add(newRow)
Application.DoEvents()
Next
Catch ex As OleDbException
MessageBox.Show(ex.Message)
Catch ex As Exception
MessageBox.Show(ex.Message)
Finally
dt.Dispose()
da.Dispose()
End Try
Return getData
End Function
and get types sql, this one doesn't convert properly, especially doubles
Private Sub GetTypesSQL()
If (mListOfTypes Is Nothing) Then
mListOfTypes = New List(Of Type)()
End If
mListOfTypes.Clear()
Dim dtTabelShema As DataTable = db.GetDataTable("SELECT TOP 0 * FROM " & _table)
Using dtTabelShema
For Each col As DataColumn In dtTabelShema.Columns
mListOfTypes.Add(col.DataType)
Next
End Using
End Sub
I think you have made it more complicated than it needs to be. For instance, you get the dbSchema by creating an empty DataTable
and harvesting the Datatypes from it. Why not just use that first table rather than creating a new table from the Types? The table also need not be reconstructed over and over for each batch of rows imported.
Generally since OleDb
will try to infer types from the data, it seems unnecessary and may even get in the way in some cases. Also, you are redoing everything that OleDB does and copying data to a different DT. Given that, I'd skip the overhead OleDB imposes and work with the raw data.
This creates the destination table using the CSV column name and the Type from the Database. If the CSV is not in the same column order as those delivered in a SELECT *
query, it will fail.
The following uses a class to map csv columns to db table columns so the code is not depending on the CSVs being in the same order (since they may be generated externally). My sample data CSV is not in the same order:
Public Class CSVMapItem
Public Property CSVIndex As Int32
Public Property ColName As String = ""
'optional
Public Property DataType As Type
Public Sub New(ndx As Int32, csvName As String,
dtCols As DataColumnCollection)
CSVIndex = ndx
For Each dc As DataColumn In dtCols
If String.Compare(dc.ColumnName, csvName, True) = 0 Then
ColName = dc.ColumnName
DataType = dc.DataType
Exit For
End If
Next
If String.IsNullOrEmpty(ColName) Then
Throw New ArgumentException("Cannot find column: " & csvName)
End If
End Sub
End Class
The code to parse the csv uses CSVHelper
but in this case the TextFieldParser
could be used since the code just reads the CSV rows into a string array.
Dim SQL = String.Format("SELECT * FROM {0} WHERE ID<0", DBTblName)
Dim rowCount As Int32 = 0
Dim totalRows As Int32 = 0
Dim sw As New Stopwatch
sw.Start()
Using dbcon As New MySqlConnection(MySQLConnStr)
Using cmd As New MySqlCommand(SQL, dbcon)
dtSample = New DataTable
dbcon.Open()
' load empty DT, create the insert command
daSample = New MySqlDataAdapter(cmd)
Dim cb = New MySqlCommandBuilder(daSample)
daSample.InsertCommand = cb.GetInsertCommand
dtSample.Load(cmd.ExecuteReader())
' dtSample is not only empty, but has the columns
' we need
Dim csvMap As New List(Of CSVMapItem)
Using sr As New StreamReader(csvfile, False),
parser = New CsvParser(sr)
' col names from CSV
Dim csvNames = parser.Read()
' create a map of CSV index to DT Columnname SEE NOTE
For n As Int32 = 0 To csvNames.Length - 1
csvMap.Add(New CSVMapItem(n, csvNames(n), dtSample.Columns))
Next
' line data read as string
Dim data As String()
data = parser.Read()
Dim dr As DataRow
Do Until data Is Nothing OrElse data.Length = 0
dr = dtSample.NewRow()
For Each item In csvMap
' optional/as needed type conversion
If item.DataType = GetType(Boolean) Then
' "1" wont convert to bool, but (int)1 will
dr(item.ColName) = Convert.ToInt32(data(item.CSVIndex).Trim)
Else
dr(item.ColName) = data(item.CSVIndex).Trim
End If
Next
dtSample.Rows.Add(dr)
rowCount += 1
data = parser.Read()
If rowCount = 50000 OrElse (data Is Nothing OrElse data.Length = 0) Then
totalRows += daSample.Update(dtSample)
' empty the table if there will be more than 100k rows
dtSample.Rows.Clear()
rowCount = 0
End If
Loop
End Using
End Using
End Using
sw.Stop()
Console.WriteLine("Parsed and imported {0} rows in {1}", totalRows,
sw.Elapsed.TotalMinutes)
The processing loop updates the DB every 50K rows in case there are many many rows. It also does it in one pass rather than reading N rows thru OleDB at a time. CsvParser
will read one row at a time, so there should never be more than 50,001 rows worth of data on hand at a time.
There may be special cases to handle for type conversions as shown with If item.DataType = GetType(Boolean) Then
. A Boolean column read in as "1" cant be directly passed to a Boolean column, so it is converted to integer which can. There could be other conversions such as for funky dates.
Time to process 250,001 rows: 3.7 mins. An app which needs to apply those string transforms to every single string column will take much longer. I'm pretty sure that using the CsvReader
in CSVHelper
you could have those applied as part of parsing to a Type.
There is a potential disaster waiting to happen since this is meant to be an all-purpose importer/scrubber.
For i As Integer = 0 To dt.Columns.Count - 1
Dim columnName As String = firstRow(i).ToString()
Dim newColumn As New DataColumn(columnName, mListOfTypes(i))
getData.Columns.Add(newColumn)
Next
Both the question and the self-answer build the new table using the column names from the CSV and the DataTypes from a SELECT *
query on the destination table. So, it assumes the CSV Columns are in the same order that SELECT *
will return them, and that all CSVs will always use the same names as the tables.
The answer above is marginally better in that it finds and matches based on name.
A more robust solution is to write a little utility app where a user maps a DB column name to a CSV index. Save the results to a List(Of CSVMapItem)
and serialize it. There could be a whole collection of these saved to disk. Then, rather than creating a map based on dead reckoning, just deserialize the desired for user as the csvMap
in the above code.