I've written a function which reads csv files and parametrizes them accordingly, therefore i have a function gettypessql which queries sql table at first to get data types and therefore to adjust the columns which are later inserted in sql. So my problem is when I set HDR=Yes in Jet OLE DB I get only column names like F1, F2, F3. To circumvent this issue I've set HDR=No and written some for loops but now I get only empty strings, what is actually the problem? here is my code:
Private Function GetCSVFile(ByVal file As String, ByVal min As Integer, ByVal max As Integer) As DataTable
Dim ConStr As String = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" & TextBox1.Text & ";Extended Properties=""TEXT;HDR=NO;IMEX=1;FMT=Delimited;CharacterSet=65001"""
Dim conn As New OleDb.OleDbConnection(ConStr)
Dim dt As New DataTable
Dim da As OleDb.OleDbDataAdapter = Nothing
getData = Nothing
Try
Dim CMD As String = "Select * from " & _table & ".csv"
da = New OleDb.OleDbDataAdapter(CMD, conn)
da.Fill(min, max, dt)
getData = New DataTable(_table)
Dim firstRow As DataRow = dt.Rows(0)
For i As Integer = 0 To dt.Columns.Count - 1
Dim columnName As String = firstRow(i).ToString()
Dim newColumn As New DataColumn(columnName, mListOfTypes(i))
getData.Columns.Add(newColumn)
Next
For i As Integer = 1 To dt.Rows.Count - 1
Dim row As DataRow = dt.Rows(i)
Dim newRow As DataRow = getData.NewRow()
For j As Integer = 0 To getData.Columns.Count - 1
If row(j).GetType Is GetType(String) Then
Dim colValue As String = row(j).ToString()
colValue = ChangeEncoding(colValue)
colValue = ParseString(colValue)
colValue = ReplaceChars(colValue)
newRow(j) = colValue
Else
newRow(j) = row(j)
End If
Next
getData.Rows.Add(newRow)
Application.DoEvents()
Next
Catch ex As OleDbException
MessageBox.Show(ex.Message)
Catch ex As Exception
MessageBox.Show(ex.Message)
Finally
dt.Dispose()
da.Dispose()
End Try
Return getData
End Function
and get types sql, this one doesn't convert properly, especially doubles
Private Sub GetTypesSQL()
If (mListOfTypes Is Nothing) Then
mListOfTypes = New List(Of Type)()
End If
mListOfTypes.Clear()
Dim dtTabelShema As DataTable = db.GetDataTable("SELECT TOP 0 * FROM " & _table)
Using dtTabelShema
For Each col As DataColumn In dtTabelShema.Columns
mListOfTypes.Add(col.DataType)
Next
End Using
End Sub
I think you have made it more complicated than it needs to be. For instance, you get the dbSchema by creating an empty
DataTable
and harvesting the Datatypes from it. Why not just use that first table rather than creating a new table from the Types? The table also need not be reconstructed over and over for each batch of rows imported.Generally since
OleDb
will try to infer types from the data, it seems unnecessary and may even get in the way in some cases. Also, you are redoing everything that OleDB does and copying data to a different DT. Given that, I'd skip the overhead OleDB imposes and work with the raw data.This creates the destination table using the CSV column name and the Type from the Database. If the CSV is not in the same column order as those delivered in a
SELECT *
query, it will fail.The following uses a class to map csv columns to db table columns so the code is not depending on the CSVs being in the same order (since they may be generated externally). My sample data CSV is not in the same order:
The code to parse the csv uses
CSVHelper
but in this case theTextFieldParser
could be used since the code just reads the CSV rows into a string array.The processing loop updates the DB every 50K rows in case there are many many rows. It also does it in one pass rather than reading N rows thru OleDB at a time.
CsvParser
will read one row at a time, so there should never be more than 50,001 rows worth of data on hand at a time.There may be special cases to handle for type conversions as shown with
If item.DataType = GetType(Boolean) Then
. A Boolean column read in as "1" cant be directly passed to a Boolean column, so it is converted to integer which can. There could be other conversions such as for funky dates.Time to process 250,001 rows: 3.7 mins. An app which needs to apply those string transforms to every single string column will take much longer. I'm pretty sure that using the
CsvReader
inCSVHelper
you could have those applied as part of parsing to a Type.There is a potential disaster waiting to happen since this is meant to be an all-purpose importer/scrubber.
Both the question and the self-answer build the new table using the column names from the CSV and the DataTypes from a
SELECT *
query on the destination table. So, it assumes the CSV Columns are in the same order thatSELECT *
will return them, and that all CSVs will always use the same names as the tables.The answer above is marginally better in that it finds and matches based on name.
A more robust solution is to write a little utility app where a user maps a DB column name to a CSV index. Save the results to a
List(Of CSVMapItem)
and serialize it. There could be a whole collection of these saved to disk. Then, rather than creating a map based on dead reckoning, just deserialize the desired for user as thecsvMap
in the above code.