Bulk Insert with optional text qualifier

Posted 2019-07-29 23:54

Question:

I am importing a CSV file into a database using BULK INSERT. It is a comma-delimited file, and in general the fields have no text qualifiers.

However, some fields, such as ADDRESS, may contain a comma as part of the data. Those values are surrounded with double quotes, but only when the value actually contains a comma; otherwise there are no quotes. So ADDRESS is quoted in some rows and unquoted in others. Is there a way to specify an optional text qualifier in the BULK INSERT command?

I tried bulk insert with format file option.

BULK INSERT Test_Imported FROM 'C:\test.csv' 
WITH (FIRSTROW=0,FIELDTERMINATOR = ',',ROWTERMINATOR = '\n',FORMATFILE = 'C:\test.Fmt')

But I cannot find a way to declare the double quotes as an optional text qualifier in the format file.

PS: This function is part of a bigger module written in C#. The BULK INSERT command is called from C#.

The CSV file arrives by email from another automated system, and I have no control over its format. There are around 150 columns, and each file contains about 12,000 rows on average. I forgot to specify the database: it is SQL Server 2005.

Answer 1:

Unfortunately, you'll have to pre-process the file to make it consistent. SQL bulk operations split the string on the field delimiter.

Some options:

  • Process the file in C# to change the commas that are not inside quotes to a pipe (|)
  • Split the file in two: rows with quoted fields and rows without. This only works if it is always the same field that is quoted

You say you have no control over the format, but what you have is unusable...
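
The first option can be sketched in C# roughly as follows. This is only an illustration, not code from the answer: `CsvRedelimiter` and `Redelimit` are made-up names, and the sketch assumes that a quoted field never contains a line break and that the pipe character never occurs in the data.

```csharp
using System;
using System.Text;

public static class CsvRedelimiter
{
    // Hypothetical helper: rewrites one CSV line so that field-separating commas
    // become newDelimiter, while commas inside double-quoted fields are kept.
    // Assumes a field never spans more than one line.
    public static string Redelimit(string line, char newDelimiter)
    {
        var result = new StringBuilder(line.Length);
        bool inQuotes = false;

        for (int i = 0; i < line.Length; i++)
        {
            char c = line[i];
            if (c == '"')
            {
                if (inQuotes && i + 1 < line.Length && line[i + 1] == '"')
                {
                    // A doubled quote inside a quoted field is a literal quote.
                    result.Append('"');
                    i++;
                }
                else
                {
                    // Opening or closing qualifier: toggle the state and drop it.
                    inQuotes = !inQuotes;
                }
            }
            else if (c == ',' && !inQuotes)
            {
                // A comma outside quotes separates fields.
                result.Append(newDelimiter);
            }
            else
            {
                result.Append(c);
            }
        }
        return result.ToString();
    }
}
```

For example, `CsvRedelimiter.Redelimit("1,\"12 Main St, Apt 4\",London", '|')` gives `1|12 Main St, Apt 4|London`, which can then be loaded with FIELDTERMINATOR = '|'.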



Answer 2:

The BULK INSERT statement really sucks because it doesn't handle optional qualifiers.

The TextFieldParser class (Microsoft.VisualBasic.FileIO.TextFieldParser) can help us clean up the file.

I have pasted in a function that uses the TextFieldParser class to clean up a delimited file so you can use it in a Bulk Insert statement.

String newDel = CleanDelimitedFile(@"c:\temp.csv", new String[] {","}, "\t,\t");

Here is a function that will clean up your Delimited file.

    /// <summary>
    /// This function opens a delimited file and strips the optional text qualifiers (quotes)
    /// </summary>
    /// <param name="FileFullPath">Full path of the delimited file</param>
    /// <param name="CurrentDelimiter">What string / character the file uses as the delimiter</param>
    /// <param name="NewDelimiter">What new delimiter string to use</param>
    /// <returns>Returns String representation of the new delimited file</returns>
    private static String CleanDelimitedFile(String FileFullPath, String[] CurrentDelimiter, String NewDelimiter) {

        // Only parse the file if it exists; otherwise an empty string is returned
        if (System.IO.File.Exists( FileFullPath )) {
            Microsoft.VisualBasic.FileIO.TextFieldParser cvsParser = null;
            System.Text.StringBuilder parseResults = new System.Text.StringBuilder();
            try {
                // new parser
                cvsParser = new Microsoft.VisualBasic.FileIO.TextFieldParser(FileFullPath);
                // delimited file has certain fields enclosed in quotes
                cvsParser.HasFieldsEnclosedInQuotes = true;
                // the current delimiter
                cvsParser.Delimiters = CurrentDelimiter;
                // iterate through all the lines of the file
                Boolean FirstLine = true;
                while (!cvsParser.EndOfData ) {
                    if (FirstLine) {
                        FirstLine = false;
                    }
                    else {
                      parseResults.Append("\n");  
                    }
                    Boolean FirstField = true;
                    // iterate through each field
                    foreach (String item in cvsParser.ReadFields()) {
                        if (FirstField) {
                            parseResults.Append(item);
                            FirstField = false;
                        } 
                        else {
                            parseResults.Append(NewDelimiter + item);
                        }
                    }

                }
                return parseResults.ToString();
            }
            finally {
                if (cvsParser != null) {
                    cvsParser.Close();
                    cvsParser.Dispose();
                }
            }
        }
        return String.Empty;
    }
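
The function only returns the cleaned text, so it still has to be saved to a file that the SQL Server machine can read, and the BULK INSERT statement has to use the new delimiter. A rough sketch of building that statement is below. `CleanedFileLoader` and `BuildBulkInsert` are hypothetical names, the table name is assumed to come from trusted code (it is concatenated, not parameterised), and the '0x0a' row terminator is an assumption based on the function joining rows with a bare line feed.

```csharp
using System;

public static class CleanedFileLoader
{
    // Hypothetical helper: builds the BULK INSERT statement for a file produced by
    // CleanDelimitedFile with the "\t,\t" delimiter. The table name is concatenated
    // as-is, so it must come from trusted code, never from user input.
    public static string BuildBulkInsert(string table, string cleanedPath)
    {
        // Double any single quotes so the path stays a valid T-SQL string literal.
        string safePath = cleanedPath.Replace("'", "''");

        // The verbatim string keeps \t as backslash-t, which BULK INSERT itself
        // reads as a tab. Rows were joined with a bare line feed, hence 0x0a.
        return "BULK INSERT " + table + " FROM '" + safePath + "' "
             + @"WITH (FIELDTERMINATOR = '\t,\t', ROWTERMINATOR = '0x0a')";
    }
}
```

The cleaned text would be written out first, for example with `System.IO.File.WriteAllText(cleanedPath, newDel)`, and the returned statement then run through the module's existing database connection.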


Answer 3:

Sadly, SQL 2005 and 2008 import XLS files much more smoothly than CSV files. I've never been anti-Microsoft, but unless all the ANSI standards of database management are changing dramatically and the concept of a text qualifier is being abandoned (which I highly doubt), this is probably a proprietary move by MS. SQL 2000 handled text qualifiers just fine (I'm not sure about the BULK command, as I've always just used the Import Wizards). Imagine my surprise when we migrated to 2005 and I had to rework all of my processes to NOT import flat files but import XLS instead. It only took me 16 hours (yes, TWO work days) to come to that conclusion, and I actually lost sleep that week because I was so frustrated with MS for not allowing the use of text qualifiers (I even went into my boss's office to apologize for spending so much time on what should have been a 10-minute task). Ironically, you can't tell Excel (or virtually any other exporter, for that matter) to export anything withOUT a double-quoted text qualifier. GRRRRRR.

The most frustrating part of all of this is that the SQL 2005 import wizard has a place to define the text qualifier!

...dare I say I'm starting to understand all the anti-M$ rhetoric after this experience!



Answer 4:

This is a really well-explained article that covers:

  1. How to create a format file

  2. A step-by-step explanation of what every column means

  3. The SQL version and how to use it

See this link: Bulk insert with text qualifier in sql server



Answer 5:

I know this is an old question, but I have a TSQL method for dealing with intermittent quote delimiters. It may not be pretty, but it may help someone who finds their way here:

  1. Import the text file with each line in a single column - one field.
  2. Use the update statement below to change the commas that are between quotation marks into some identifiable string, in this case *&*. Note that the statement only rewrites the first quoted segment on each line.
  3. Use another update statement to strip all quotation marks.
  4. Use bcp to export the data into a new CSV file.
  5. Perform your bulk import into the original table with all the fields from the new CSV file: now there are no quotation marks and the in-field commas are *&* instead, so a simple comma-delimited import will work.
  6. Use another update statement to change *&* back into a comma.

UPDATE InitialTable
SET BulkColumn = REPLACE(
        BulkColumn,
        SUBSTRING(BulkColumn,
                  CHARINDEX('"', BulkColumn, 0),
                  CHARINDEX('"', BulkColumn, CHARINDEX('"', BulkColumn, 0) + 1) - CHARINDEX('"', BulkColumn, 0) + 1),
        REPLACE(
            SUBSTRING(BulkColumn,
                      CHARINDEX('"', BulkColumn, 0),
                      CHARINDEX('"', BulkColumn, CHARINDEX('"', BulkColumn, 0) + 1) - CHARINDEX('"', BulkColumn, 0) + 1),
            ',', '*&*'))
WHERE BulkColumn LIKE '%"%'
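
Since the import is driven from C# anyway, the same masking idea (steps 2, 3 and 6) can also be sketched there, where a regular expression covers every quoted segment on a line. This is not part of the TSQL method itself: `QuotedCommaMasker` is a made-up name, and the sketch assumes fields contain no escaped (doubled) quotes and no line breaks.

```csharp
using System;
using System.Text.RegularExpressions;

public static class QuotedCommaMasker
{
    // The same marker string the UPDATE statement uses.
    public const string Placeholder = "*&*";

    // Steps 2 and 3: mask the commas inside every "..." segment of the line,
    // then drop the quotation marks.
    public static string Mask(string line)
    {
        string masked = Regex.Replace(
            line,
            "\"[^\"]*\"",
            m => m.Value.Replace(",", Placeholder));
        return masked.Replace("\"", "");
    }

    // Step 6: turn the marker back into a comma in an imported field value.
    public static string Unmask(string field)
    {
        return field.Replace(Placeholder, ",");
    }
}
```

For example, `QuotedCommaMasker.Mask("7,\"12 Main St, Apt 4\",\"Smith, J\",UK")` gives `7,12 Main St*&* Apt 4,Smith*&* J,UK`, which a plain comma-delimited import splits into four fields.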



Answer 6:

What worked for me was changing

ROWTERMINATOR = '\n'

To:

ROWTERMINATOR = '0x0a'
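
This usually matters when the file uses bare line feeds (Unix-style line endings): as far as I know, BULK INSERT treats '\n' as a carriage return plus line feed pair, so LF-only rows are never split. A small sketch for picking the terminator from a sample of the file follows. `RowTerminatorPicker` is a hypothetical name, and it assumes the first line break in the sample is representative of the whole file.

```csharp
using System;

public static class RowTerminatorPicker
{
    // Returns the text to put inside ROWTERMINATOR = '...' for the given sample.
    public static string Pick(string sample)
    {
        int lf = sample.IndexOf('\n');

        // A line feed with no carriage return in front of it means LF-only rows.
        if (lf >= 0 && (lf == 0 || sample[lf - 1] != '\r'))
        {
            return "0x0a";
        }

        // CRLF rows (or no line break found): keep the usual '\n' setting.
        return @"\n";
    }
}
```

The sample could be the first few kilobytes of the CSV file, read as text before the BULK INSERT statement is built.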