I've written a nice import for my million row CSV that works quite nicely (using OPENROWSET BULK (I didn't use BULK INSERT because I need to cross join with some other columns). The formatfile uses a comma as the Terminator.
Here is an example of the CSV I was using to develop:
Reference, Name, Street
1,Dave Smith, 1 Test Street
2,Sally SMith,1 Test Street
Once I'd got it working, someone reminded me that the data itself could have a comma in it, whoops!!!!
Reference, Name, Street
"1","Dave Smith", "1 Test Street"
"2","Sally Smith","1,Test Street" <-comma in street
How do people deal with a CSV with a comma in the data using FormatFiles? (Or do I just say that the file must be TAB delimited)?
Delimiters within data fields is a common problem with delimited files. Some common tactics to address this include:
Manually editing the data file might be do-able for any of the above options. But it can be tedious, especially for large files. (Just opening a file of several GB's in Notepad.exe is an exercise in patience.) Realistically, you'd want the author to re-create it for you. Option #1 should always "work". But again, there's the data integrity issue you may not be able to live with. Option #2 will probably work for many cases, but it's not bulletproof. Option #3 isn't bulletproof either (it's always possible to have a delimiter within a data field), but it's about as close as you can get. Plus, it preserves data integrity.
Here is one possibility for your XML format file:
Notice the FIELD TERMINATOR: I used single quotes to identify
","
as the terminator and"\r\n
as the row terminator (the COLUMN 3 terminator). I made an educated guess thatName
andStreet
are up to 128 characters--edit that as needed.Problems:
Reference
with a leading"
double quote character. And because of that...Reference
cannot be returned as an INT (or SMALLINT, BIGINT, etc.). It gets returned as a VARCHAR (xsi:type="SQLVARYCHAR"
)For the particular data sample provided, I'd remove the double-quotes from
Reference
data fields, adjust the XML format file so thatFIELD ID="1"
hasTERMINATOR=',"'
, and further adjust the XML format file so thatCOLUMN SOURCE="1"
hasxsi:type="SQLINT"
.For some additional info, this blog post may help: Getting Started With OPENROWSET and the BULK Rowset Provider - Part 2
If your field terminator can occur in the data, then it may be best to use TAB or PIPE delimiter (or whatever works for your data).
http://msdn.microsoft.com/en-us/library/ms191485.aspx