Using Excel to create a CSV file with special char

Posted 2019-03-06 00:43

Question:

Take this XLS file

I then save this XLS file as CSV and then open it up with a text editor. This is what I see:

Col1,Col2,Col3,Col4,Col5,Col6,Col7
1,ABC,"AB""C","D,E",F,03,"3,2"

I see that the double quote character in column C was stored as AB""C, the column value was enclosed with quotations and the double quote character in the data was replaced with 2 double quote characters to indicate that the quote is occurring within the data and not terminating the column value. I also see that the value for column G, 3,2, is enclosed in quotes so that it is clear that the comma occurs within the data rather than indicating a new column. So far, so good.
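The quoting behaviour described above can be reproduced outside Excel. The following is a minimal sketch using Python's standard `csv` module, assuming its `QUOTE_MINIMAL` mode is a fair stand-in for what Excel does on export; the helper name `to_csv_line` is made up for illustration.

```python
import csv
import io

# The data row from the spreadsheet, as plain strings.
row = ["1", "ABC", 'AB"C', "D,E", "F", "03", "3,2"]

def to_csv_line(fields):
    """Serialize one row, quoting only fields that need it."""
    buf = io.StringIO()
    # QUOTE_MINIMAL: qualify a field only if it contains a comma, a quote
    # or a line break; embedded quotes are doubled (doublequote defaults to True).
    writer = csv.writer(buf, quoting=csv.QUOTE_MINIMAL, lineterminator="\r\n")
    writer.writerow(fields)
    return buf.getvalue()

print(to_csv_line(row), end="")
```

Only the fields containing a quote or a comma come out qualified, which matches the file shown above.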

I am a little surprised that not all of the column values are enclosed in quotes, but even this seems reasonable when I assume that Excel only adds text qualifiers when special characters like a comma or a double quote character exist in the data.

Now I try to use SQL Server to import the csv file. Note that I specify a double quote character as the Text Qualifier character.

And a comma character as the Column delimiter character. However, note that SSIS imports column 3 incorrectly, i.e., it does not translate the two consecutive double quote characters into a single occurrence of a double quote character.

What do I have to do to get Excel and SSIS to get along?

Generally people avoid the issue by using column delimiter characters that are LESS LIKELY to occur in the data, but this is not a real solution.

I find that if I modify the file from this

Col1,Col2,Col3,Col4,Col5,Col6,Col7
1,ABC,"AB""C","D,E",F,03,"3,2"

...to this:

Col1,Col2,Col3,Col4,Col5,Col6,Col7
1,ABC,"AB"C","D,E",F,03,"3,2"

i.e., replacing the two consecutive quotes in column C's value with a single quote, the data is loaded properly. However, this is a little confusing to me. First of all, how does SSIS determine that the double quote between the B and the C is not terminating that column value? Is it because the following characters are not a comma column delimiter or a row delimiter (CRLF)? And why does Excel export it this way?

According to Wikipedia, here are a couple of traits of a CSV file:

  1. Fields containing line breaks (CRLF), double quotes, and commas should be enclosed in double-quotes. For example:

    "aaa","b CRLF bb","ccc" CRLF zzz,yyy,xxx

  2. If double-quotes are used to enclose fields, then a double-quote appearing inside a field must be escaped by preceding it with another double quote. For example:

    "aaa","b""bb","ccc"

However, it looks like SSIS doesn't like it that way when importing. What can be done to get Excel to create a CSV file that could contain ANY special characters used as column delimiters, text delimiters or row delimiters in the data? There's no reason that it can't work using the approach specified in Wikipedia, which is what I thought the old MS DTS packages used to do...
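For comparison, here is a small sketch of a parser that does follow the two Wikipedia rules quoted above. It uses Python's standard `csv.reader`; the helper name `parse_csv` and the sample string are illustrative only, and this says nothing about how SSIS or DTS parse internally.

```python
import csv
import io

def parse_csv(text):
    """Parse CSV text where '"' is the qualifier and '""' is an escaped quote."""
    # newline="" leaves CRLF sequences untouched so that line breaks
    # inside qualified fields survive as data.
    source = io.StringIO(text, newline="")
    return list(csv.reader(source, doublequote=True))

# Rule 1: a line break inside a qualified field; rule 2: a doubled quote.
sample = '"aaa","b\r\nbb","ccc"\r\nzzz,yyy,xxx\r\n"aaa","b""bb","ccc"\r\n'

for fields in parse_csv(sample):
    print(fields)
```

The doubled quote comes back as a single `"` inside the value, and the embedded CRLF stays part of the second field instead of starting a new row.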

Update:

If I use Notepad to change the input file to

Col1,Col2,Col3,Col4,Col5,Col6,Col7,Col8
"1","ABC","AB""C","D,E","F","03","3,2","AB""C"

Excel reads it just fine

but SSIS returns

The preview sample contains embedded text qualifiers ("). The flat file parser does not support embedding text qualifiers in data. Parsing columns that contain data with text qualifiers will fail at run time.
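To see in advance which values would run into this limitation, one could scan the file before handing it to SSIS. The sketch below is only an illustration: the function name `find_embedded_qualifiers` is hypothetical, and it assumes the file follows the doubled-quote convention so that Python's `csv.reader` can parse it.

```python
import csv
import io

def find_embedded_qualifiers(text, qualifier='"'):
    """Return (row, column) positions, 1-based, whose value contains the qualifier."""
    hits = []
    reader = csv.reader(io.StringIO(text, newline=""), quotechar=qualifier)
    for r, row in enumerate(reader, start=1):
        for c, value in enumerate(row, start=1):
            # After parsing, a qualifier left in the value is real data.
            if qualifier in value:
                hits.append((r, c))
    return hits

data = (
    "Col1,Col2,Col3,Col4,Col5,Col6,Col7,Col8\r\n"
    '"1","ABC","AB""C","D,E","F","03","3,2","AB""C"\r\n'
)
print(find_embedded_qualifiers(data))
```

For the file in the update this reports columns 3 and 8 of the second line, the two fields holding `AB"C`.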

Answer 1:

Conclusion:

Just like the error message says in your update...

The flat file parser does not support embedding text qualifiers in data. Parsing columns that contain data with text qualifiers will fail at run time.

Confirmed bug in Microsoft Connect. I encourage everyone reading this to click on this aforementioned link and place your vote to have them fix this stinker. This is in the top 10 of the most egregious bugs I have encountered.



Answer 2:

Do you need to use a comma delimiter?

I used a pipe delimiter with no Text qualifier and it worked fine. Here is my output from the text file.

1|ABC|AB"C|D,E|F|03|3,2

You have 3 options in my opinion.

  1. Read the data into a stage table.
  2. Run any update queries you need on the columns
  3. Now select your data from the stage table and output it to a flat file.

OR

  1. Use pipes as your delimiters.

OR

  1. Do all of this in a C# application and build it in code.
  2. You could send the row to a script in SSIS and parse and build the file you want there as well.
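As a rough illustration of building the file in code, here is a sketch in Python rather than C#. It re-writes an Excel-style CSV as pipe-delimited text with no text qualifier. The function name `csv_to_pipe` is made up, and the approach assumes no value contains a pipe or a line break; the code refuses to continue if one does.

```python
import csv
import io

def csv_to_pipe(text, delimiter="|"):
    """Convert qualified, comma-delimited text to unqualified pipe-delimited text."""
    reader = csv.reader(io.StringIO(text, newline=""))
    lines = []
    for row in reader:
        for value in row:
            # Without a text qualifier these characters cannot be represented.
            if delimiter in value or "\r" in value or "\n" in value:
                raise ValueError(f"cannot write {value!r} without a text qualifier")
        lines.append(delimiter.join(row))
    return "".join(line + "\r\n" for line in lines)

source = (
    "Col1,Col2,Col3,Col4,Col5,Col6,Col7\r\n"
    '1,ABC,"AB""C","D,E",F,03,"3,2"\r\n'
)
print(csv_to_pipe(source), end="")
```

The data row comes out as `1|ABC|AB"C|D,E|F|03|3,2`, the same shape as the pipe-delimited output shown earlier in this answer.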

Using text qualifiers and "character" delimited fields is problematic for sure.

Have Fun!