I'm providing an import facility in my application to allow users to populate it with basic starting data. I have no control over how they create the data they import or whether they actually follow the import spec I provide (as amply demonstrated by the experience in beta testing to date). I've also learned that they're importing datasets way larger than I expected (the largest so far is 750,000 records), so the import needs to perform well.
There'll be five or six different sets of data to import, ranging from 4 columns to 24 columns, and I'm trying to come up with a method I can use for all of them, starting with the simplest dataset.
The simplest import file will be a csv file with 4 columns and no header row, and should end up in a temporary Access table with fields as follows.
GenID: text (up to 255 chars)
Surname: text (up to 255 chars)
GivenNames: text (up to 255 chars)
OtherInfo: multi-line memo (as long as they want)
The data should be imported into a temporary table, where validation will be carried out before being appended to existing data in its final home.
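For reference, the temporary table itself can be created with a short DDL statement; a minimal sketch, assuming the temporary table is named tblImportTemp (a placeholder name):
Dim strSQL As String
strSQL = "CREATE TABLE tblImportTemp ("
strSQL = strSQL & "GenID TEXT(255), "
strSQL = strSQL & "Surname TEXT(255), "
strSQL = strSQL & "GivenNames TEXT(255), "
strSQL = strSQL & "OtherInfo MEMO)" 'MEMO is the multi-line memo type in Access DDL
CurrentDb.Execute strSQL, dbFailOnError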
I first tried using DoCmd.TransferText with a saved import spec -- which works, but is painfully slow for anything other than the smallest import files. (I can't swear that it doesn't exhibit some of the same flaws as the second method I discuss below, as I haven't tested it exhaustively.)
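For what it's worth, the call was along these lines, where the saved spec name and table name are placeholders:
DoCmd.TransferText acImportDelim, "MySavedImportSpec", "tblImportTemp", strFilePath, False 'False = file has no header row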
The next method I've tried is based on the approach at https://stackoverflow.com/a/11147920/1943174, although I've struggled to find any documentation on the exact behaviour I should expect when performing an import using an Xml Import Spec, or on the XML syntax I should use.
I create and execute an import spec in VBA. strFilePath identifies the csv file to be imported and strTableName identifies the temporary table to be used for the import.
Dim strXML As String
strXML = ""
strXML = strXML & "<?xml version=""1.0"" encoding=""utf-8"" ?>" & vbCrLf
strXML = strXML & "<ImportExportSpecification Path=" & Chr(34) & strFilePath & Chr(34) & " xmlns=""urn:www.microsoft.com/office/access/imexspec"">" & vbCrLf
strXML = strXML & " <ImportText TextFormat=""Delimited"" FirstRowHasNames=""false"" FieldDelimiter="","" CodePage=""437"" Destination=" & Chr(34) & strTableName & Chr(34) & " >" & vbCrLf
strXML = strXML & " <DateFormat DateOrder=""DMY"" DateDelimiter=""/"" TimeDelimiter="":"" FourYearDates=""true"" DatesLeadingZeros=""false"" />" & vbCrLf
strXML = strXML & " <NumberFormat DecimalSymbol=""."" />" & vbCrLf
strXML = strXML & " <Columns PrimaryKey=""{none}"">" & vbCrLf
strXML = strXML & " <Column Name=""Col1"" FieldName=""GenID"" Indexed=""NO"" SkipColumn=""false"" DataType=""Text"" />" & vbCrLf
strXML = strXML & " <Column Name=""Col2"" FieldName=""Surname"" Indexed=""NO"" SkipColumn=""false"" DataType=""Text"" />" & vbCrLf
strXML = strXML & " <Column Name=""Col3"" FieldName=""GivenNames"" Indexed=""NO"" SkipColumn=""false"" DataType=""Text"" />" & vbCrLf
strXML = strXML & " <Column Name=""Col4"" FieldName=""OtherInfo"" Indexed=""NO"" SkipColumn=""false"" DataType=""Memo"" />" & vbCrLf
strXML = strXML & " </Columns>" & vbCrLf
strXML = strXML & " </ImportText>" & vbCrLf
strXML = strXML & "</ImportExportSpecification>"
CurrentProject.ImportExportSpecifications.Add "MyImportSpec", strXML 'Add it to the project
DoCmd.RunSavedImportExport "MyImportSpec" 'Run it
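A side note: ImportExportSpecifications.Add raises an error if a spec with that name already exists, so for repeat runs I bracket the two lines above with a cleanup; a sketch, using the same spec name:
On Error Resume Next 'ignore the error if there's no leftover spec to delete
CurrentProject.ImportExportSpecifications("MyImportSpec").Delete
On Error GoTo 0
'... Add and RunSavedImportExport as above ...
CurrentProject.ImportExportSpecifications("MyImportSpec").Delete 'tidy up once the import is done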
The import is very quick, and the test input data I'm using is correctly formed, but the result is an Access table with the right structure and the data imported into the wrong fields:
GenID -- in the right place
GivenNames -- in the OtherInfo (memo) field
OtherInfo -- in the Surname (text) field
Surname -- in the GivenNames (text) field.
If I change the order of the columns in the import file so that they're in alphabetical order (GenID, GivenNames, OtherInfo, Surname) then the import works correctly, but that order won't make sense to the users or be simple to produce, so they're likely to end up populating their import files incorrectly. (It will also fail because of the flaw I'm about to describe.)
If I change the names of the fields created in the temporary table to XA, XB, XC, XD (planning to rename the fields after the import, once the data is validated and appended to the table which will be its final home), the right data goes into the 'right' field, but only as long as there are exactly 4 columns in the import file. If a fifth is present, I get the following fields:
Field5 -- contains GenID data
XA -- contains Surname
XB -- contains GivenNames
XC -- contains OtherInfo
XD -- contains the data that was in the 5th column in the import file
The same problems occur if I append the imported data to an existing (empty) table with the correct structure rather than allow the import to create a new table.
Any suggestions on how I can make this work, or on other approaches that can rapidly import text and multi-line memo fields from large csv files where users can't be guaranteed not to have included extraneous columns at the end? I have no ability to edit the import csv: all data changes need to happen within Access, and the csv file must come through the process unchanged.
This update brought to you by the Dept. of the Blindingly Obvious:
If I name the imported fields "Field1", "Field2", etc., any extra fields should slot in, in order, after those, so extra fields aren't a problem. I'd still prefer to import into the right field names.
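In other words, the Column lines in the spec above become something like this (only the FieldName values change):
strXML = strXML & " <Column Name=""Col1"" FieldName=""Field1"" Indexed=""NO"" SkipColumn=""false"" DataType=""Text"" />" & vbCrLf
strXML = strXML & " <Column Name=""Col2"" FieldName=""Field2"" Indexed=""NO"" SkipColumn=""false"" DataType=""Text"" />" & vbCrLf
strXML = strXML & " <Column Name=""Col3"" FieldName=""Field3"" Indexed=""NO"" SkipColumn=""false"" DataType=""Text"" />" & vbCrLf
strXML = strXML & " <Column Name=""Col4"" FieldName=""Field4"" Indexed=""NO"" SkipColumn=""false"" DataType=""Memo"" />" & vbCrLf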
Consider using a query to import your CSV data. This query selects the first 4 columns from my CSV file:
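Assuming the CSV file is C:\share\Access\temp.csv (substitute the real folder and file name):
SELECT csv.F1, csv.F2, csv.F3, csv.F4
FROM [Text;FMT=Delimited;HDR=No;DATABASE=C:\share\Access].[temp.csv] AS csv;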
That CSV file does not include field names, so Access assigns them as F1 thru F4. And actually, the CSV includes more than 4 columns, but I only wanted the first 4 (I think that corresponds to your situation).
If you build and test a similar query in the query designer, beware that it may rewrite the FROM clause into a different form, and the rewritten version won't work. So you will have to check and correct the FROM clause each time Access makes its unhelpful change.
Once you get a SELECT query which returns what you need, you can transform it into an INSERT query. However, I'm unsure what will happen with the memo field in your data source. If it doesn't prevent the query from working correctly, at least this approach should allow you to extract the correct fields and store them in the correct destination fields.
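For example, assuming a destination table named tblImportTemp (a placeholder for your actual temporary table), the INSERT version would be along these lines:
INSERT INTO tblImportTemp (GenID, Surname, GivenNames, OtherInfo)
SELECT csv.F1, csv.F2, csv.F3, csv.F4
FROM [Text;FMT=Delimited;HDR=No;DATABASE=C:\share\Access].[temp.csv] AS csv;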