Bulk insert from CSV file - skip duplicates

Posted 2019-05-13 22:23

UPDATE: I ended up using this method created by Johnny Bubriski and modified it a bit to skip duplicates. It works like a charm and is apparently quite fast. Link: http://johnnycode.com/2013/08/19/using-c-sharp-sqlbulkcopy-to-import-csv-data-sql-server/

I have been searching for an answer to this but cannot seem to find it. I am doing a T-SQL bulk insert to load data into a table in a local database from a csv file. My statement looks like this:

BULK INSERT Orders
FROM 'csvfile.csv'
WITH (
    FIELDTERMINATOR = ';',
    ROWTERMINATOR = '0x0a',
    FORMATFILE = 'formatfile.fmt',
    ERRORFILE = 'C:\ProgramData\Tools_TextileMagazine\AdditionalFiles\BulkInsertErrors.txt'
)
GO

SELECT * 
FROM Orders
GO

I get an exception when I try to insert duplicate rows (for example, loading the same csv file twice), which causes the entire insert to stop and roll back. That is understandable, since I am violating the primary key constraint. Right now I just show a message box to let users know that duplicates are present in the csv file, but this is of course not a proper solution, actually not a solution at all. My question is: is there any way to ignore these duplicate rows, skip them, and only insert the rows that are not duplicates? Perhaps in a try/catch somehow?

If that is not possible, what would be the "correct" (for lack of a better word) way to import data from the csv file? The exception is causing me a bit of trouble. I did read somewhere that you can set up a temporary table, load the data into it, and select the distinct rows between the two tables before inserting. But is there really no easier way to do it with BULK INSERT?

1 Answer
对你真心纯属浪费
#2 · 2019-05-13 23:18

You could set the MAXERRORS property to quite a high value, which will allow the valid records to be inserted and the duplicates to be ignored. Unfortunately, this also means that any other errors in the dataset won't cause the load to fail.
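A minimal sketch of that option, reusing the statement from the question (the MAXERRORS value is purely illustrative, not a recommendation):

BULK INSERT Orders
FROM 'csvfile.csv'
WITH (
    FIELDTERMINATOR = ';',
    ROWTERMINATOR = '0x0a',
    FORMATFILE = 'formatfile.fmt',
    ERRORFILE = 'C:\ProgramData\Tools_TextileMagazine\AdditionalFiles\BulkInsertErrors.txt',
    MAXERRORS = 100000  -- illustrative upper bound on failed rows before the load aborts
)
GO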

Alternatively, you could set the BATCHSIZE property, which will load the data in multiple transactions, so if there are duplicates only the batch containing them will be rolled back.
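In the same spirit, a sketch of the WITH clause with BATCHSIZE added (again, the value is just an example):

WITH (
    FIELDTERMINATOR = ';',
    ROWTERMINATOR = '0x0a',
    FORMATFILE = 'formatfile.fmt',
    BATCHSIZE = 1000  -- each 1000-row batch commits (or rolls back) on its own
)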

A safer, but less efficient, way would be to load the CSV file into a separate, empty table and then merge it into your Orders table, as you mentioned. Personally, this is the way I'd do it.
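A rough sketch of that staging approach, assuming a staging table Orders_Staging with the same columns as Orders and a single-column primary key called OrderId (both names are made up for illustration):

-- Load the raw CSV into an empty staging table first
BULK INSERT Orders_Staging
FROM 'csvfile.csv'
WITH (
    FIELDTERMINATOR = ';',
    ROWTERMINATOR = '0x0a',
    FORMATFILE = 'formatfile.fmt'
)

-- Copy only the rows whose key is not already in Orders, then clear the staging table
INSERT INTO Orders
SELECT s.*
FROM Orders_Staging AS s
WHERE NOT EXISTS (SELECT 1 FROM Orders AS o WHERE o.OrderId = s.OrderId)

TRUNCATE TABLE Orders_Staging
GO

If the CSV itself can contain the same key twice, you would also need to de-duplicate the staging rows (for example with SELECT DISTINCT or a GROUP BY on the key column) before the insert.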

None of these solutions is ideal, but I can't think of a way of ignoring duplicates in the bulk insert syntax itself.
