How to modify data in .csv during BULK INSERT?

Posted 2019-07-14 11:25

Question:

I'm converting a web application I built on MySQL to Microsoft SQL Server and need some guidance. I have several different sources of CSV data, and I was using LOAD DATA LOCAL INFILE to modify the contents (e.g. convert to uppercase, trim whitespace, concatenate several fields into one), add some data (the account number and the current date/time), ignore some fields (assign them to a dummy variable that is never used), and load the result into the correct columns of my database. Can I achieve the same result in SQL Server?

Here's an example import code snippet from the MySQL version:

LOAD DATA LOCAL INFILE 'testDataFile.csv'
INTO TABLE tbl_raw_data
FIELDS TERMINATED BY ','
OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 LINES
(
@order_date,
@order_number,
@site_name,
@patient_name_last,
@patient_name_first,
@dummy,
@dummy, 
@dummy,
@medication_ndc_prefix,
@dummy,
@dummy,
@patient_ID_number,
@prescriber_ID, 
@order_retail,  
@insurance_ID,
@dummy,
@order_reimbursement,
@dummy,
@dummy,
@dummy, 
@dummy,
@dummy,
@order_acquisition_cost,    
@dummy,
@dummy,
@dummy
)
SET
order_number = UPPER(TRIM(@order_number)),
site_name = UPPER(TRIM(@site_name)),
patient_name_last = UPPER(TRIM(@patient_name_last)),
patient_name_first = UPPER(TRIM(@patient_name_first)),
patient_ID_number = UPPER(TRIM(@patient_ID_number)),
prescriber_ID = UPPER(TRIM(@prescriber_ID)),
insurance_ID = UPPER(TRIM(@insurance_ID)),
order_date = str_to_date(@order_date, '%m/%d/%Y'),
order_retail = REPLACE(@order_retail,'$',''),
order_reimbursement = REPLACE(@order_reimbursement,'$',''),
order_acquisition_cost = REPLACE(@order_acquisition_cost,'$',''),
medication_ndc_prefix = LEFT(REPLACE(@medication_ndc_prefix, '-', ''),9),
patient_ID = CONCAT(TRIM(patient_name_last),',',trim(patient_name_first),'-',patient_ID_number),
order_added_on = CURRENT_TIMESTAMP,
account_ID = 1

The web interface is built with PHP if that matters.

Answer 1:

Use OPENROWSET with the BULK option:

INSERT INTO dbo.YourTable
SELECT a.* FROM OPENROWSET( BULK 'D:\our.csv', FORMATFILE = 'D:\our.fmt') AS a;

A sample our.fmt (the format file that describes the fields in the CSV):

9.0
4
1  SQLCHAR  0  50   ";"     1  Field1  SQL_Latin1_General_Cp437_BIN
2  SQLCHAR  0  50   ";"     2  Field2  SQL_Latin1_General_Cp437_BIN
3  SQLCHAR  0  50   ";"     3  Field3  SQL_Latin1_General_Cp437_BIN
4  SQLCHAR  0  500  "\r\n"  4  Field4  SQL_Latin1_General_Cp437_BIN

You can find a description of *.fmt files here.
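
Because OPENROWSET(BULK ...) behaves like an ordinary table in a SELECT, the transformations from the MySQL SET clause can probably be moved into the SELECT list. The following is only a rough sketch, not a tested migration. It assumes that the format file names the fields as in the question (order_date, order_number, and so on) rather than Field1..Field4, that the target table lives in the dbo schema, and that the field terminator in the format file is changed to "," to match the question's data. Fields that were assigned to @dummy are simply left out of the SELECT list.

```sql
-- Sketch only. Assumes D:\our.fmt lists the CSV fields under the names
-- used in the question (order_date, order_number, ...); adjust to your file.
INSERT INTO dbo.tbl_raw_data
    (order_number, site_name, patient_name_last, patient_name_first,
     patient_ID_number, prescriber_ID, insurance_ID, order_date,
     order_retail, order_reimbursement, order_acquisition_cost,
     medication_ndc_prefix, patient_ID, order_added_on, account_ID)
SELECT
    UPPER(LTRIM(RTRIM(a.order_number))),
    UPPER(LTRIM(RTRIM(a.site_name))),
    c.patient_name_last,
    c.patient_name_first,
    c.patient_ID_number,
    UPPER(LTRIM(RTRIM(a.prescriber_ID))),
    UPPER(LTRIM(RTRIM(a.insurance_ID))),
    CONVERT(date, a.order_date, 101),            -- style 101 = mm/dd/yyyy
    REPLACE(a.order_retail, '$', ''),
    REPLACE(a.order_reimbursement, '$', ''),
    REPLACE(a.order_acquisition_cost, '$', ''),
    LEFT(REPLACE(a.medication_ndc_prefix, '-', ''), 9),
    c.patient_name_last + ',' + c.patient_name_first + '-' + c.patient_ID_number,
    CURRENT_TIMESTAMP,                           -- order_added_on
    1                                            -- account_ID, hard-coded as in the question
FROM OPENROWSET(
         BULK 'D:\our.csv',
         FORMATFILE = 'D:\our.fmt',
         FIRSTROW = 2                            -- skip the header line
     ) AS a
-- Clean the name and ID fields once so patient_ID can reuse them.
CROSS APPLY (VALUES (
         UPPER(LTRIM(RTRIM(a.patient_name_last))),
         UPPER(LTRIM(RTRIM(a.patient_name_first))),
         UPPER(LTRIM(RTRIM(a.patient_ID_number)))
     )) AS c (patient_name_last, patient_name_first, patient_ID_number);
```

LTRIM(RTRIM(...)) is used instead of TRIM so the sketch does not depend on a recent SQL Server version. The CROSS APPLY stands in for the way the MySQL version reuses the already-cleaned name columns when building patient_ID. Note that a plain format file splits on the terminator only, so fields wrapped in optional double quotes (OPTIONALLY ENCLOSED BY '"' in the MySQL version) would need extra handling that is not shown here.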