I have an SSIS package importing data from a .csv file. Every entry in this file is wrapped in double-quote (") text qualifiers, but double quotes also appear inside the values. Commas (,) are the column delimiter. I can't share the original data I'm working with, but here is an example of how my data arrives in the Flat File Source:
"ID-1","A "B"", C, D, E","Today"
"ID-2","A, B, C, D, E,F","Yesterday"
"ID-3","A and nothing else","Today"
As you can see, the second column can contain quotes (and commas), which breaks my SSIS import with an error pointing at the offending line.
I'm not really familiar with regular expressions, but I've heard that this might help in this case.
As I see it, I need to replace all the double quotes (") with single quotes (') except...
- ...all quotes at the beginning of one line
- ...all quotes at the end of one line
- ...quotes which are part of ","
Can any of you help me out with this? Would be great!
Thanks in advance!
To replace double quotes with single quotes according to your specifications, use this simple regex. This regex will allow whitespace at the beginning and/or end of lines.
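// Requires: using System.Text.RegularExpressions;
string pattern = @"(?<!^\s*|,)""(?!,""|\s*$)";
// Multiline makes ^ and $ match at every line break, not only at the ends of the string.
string resultString = Regex.Replace(subjectString, pattern, "'", RegexOptions.Multiline);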
This is the explanation of the pattern:
// (?<!^\s*|,)"(?!,"|\s*$)
//
// Options: ^ and $ match at line breaks
//
// Assert that it is impossible to match the regex below with the match ending at this position (negative lookbehind) «(?<!^\s*|,)»
//    Match either the regular expression below (attempting the next alternative only if this one fails) «^\s*»
//       Assert position at the beginning of a line (at beginning of the string or after a line break character) «^»
//       Match a single character that is a “whitespace character” (spaces, tabs, and line breaks) «\s*»
//          Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
//    Or match regular expression number 2 below (the entire group fails if this one fails to match) «,»
//       Match the character “,” literally «,»
// Match the character “"” literally «"»
// Assert that it is impossible to match the regex below starting at this position (negative lookahead) «(?!,"|\s*$)»
//    Match either the regular expression below (attempting the next alternative only if this one fails) «,"»
//       Match the characters “,"” literally «,"»
//    Or match regular expression number 2 below (the entire group fails if this one fails to match) «\s*$»
//       Match a single character that is a “whitespace character” (spaces, tabs, and line breaks) «\s*»
//          Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
//       Assert position at the end of a line (at the end of the string or before a line break character) «$»
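If you want to pre-process the file outside .NET, the same rule can be sketched in Python. This is not the code from the answer above: Python's built-in re module rejects variable-width lookbehind such as (?<!^\s*|,), so the sketch swaps in a two-step approach. It peels off the opening and closing quote of each line (allowing surrounding whitespace) and then replaces only the remaining quotes that are neither preceded by a comma nor followed by ,". The names fix_line and fix_text are made up for illustration.

```python
import re

# Opening quote (with optional leading whitespace), the rest of the line,
# and the closing quote (with optional trailing whitespace).
_LINE = re.compile(r'^(\s*")(.*)("\s*)$')

# A quote that is not preceded by a comma and not followed by ,"
_INNER_QUOTE = re.compile(r'(?<!,)"(?!,")')


def fix_line(line: str) -> str:
    """Replace stray double quotes inside one CSV line with single quotes."""
    m = _LINE.match(line)
    if m is None:
        # The line does not start and end with a quote; leave it untouched.
        return line
    head, body, tail = m.groups()
    return head + _INNER_QUOTE.sub("'", body) + tail


def fix_text(text: str) -> str:
    """Apply fix_line to every line of a multi-line string."""
    return "\n".join(fix_line(line) for line in text.split("\n"))


if __name__ == "__main__":
    print(fix_line('"ID-1","A "B"", C, D, E","Today"'))
```

On the first sample row this prints "ID-1","A 'B'', C, D, E","Today"; the other two rows come back unchanged.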
You can also split the columns with a regex match pattern. Apply it to one line at a time (or add the m flag) so that ^ and $ anchor to each row:
/(?:(?<=^")|(?<=",")).*?(?:(?="\s*$)|(?=","))/g
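As a rough illustration of how the split pattern behaves, here is a small Python sketch that applies it to one row at a time. Each lookbehind in the pattern has a fixed width, so Python's re module accepts it as written; the helper name split_fields is an assumption for this example only.

```python
import re

# Same pattern as above, without the /.../g wrapper: a field starts right
# after the opening quote of the line or after ",", and ends right before
# the closing quote of the line or before the next ",".
FIELD = re.compile(r'(?:(?<=^")|(?<=",")).*?(?:(?="\s*$)|(?=","))')


def split_fields(line: str) -> list[str]:
    """Return the column values of one quoted CSV row."""
    return FIELD.findall(line)


if __name__ == "__main__":
    for value in split_fields('"ID-1","A "B"", C, D, E","Today"'):
        print(value)
```

For the first sample row this yields the three values ID-1, A "B"", C, D, E and Today. Note that a value which itself contains the sequence "," would still be split at that point.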
When loading a CSV with double quotes and commas, there is one limitation: extra double quotes are added and the data stays enclosed in double quotes, as you can check in the preview of the source file.
So add a Derived Column task with the expression below:
(REPLACE(REPLACE(RIGHT(SUBSTRING(TRIM(COL2),1,LEN(COL2) - 1),LEN(COL2) - 2)," ","@"),"\"\"","\""),"@"," ")
The inner RIGHT(SUBSTRING(...)) part strips the double quotes that enclose the data.
Try this and let me know if it helps.
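To make the intent of that expression easier to follow, here is a hedged Python sketch of the same transformation: trim the value, drop its first and last character, and collapse doubled quotes into single ones. It is not SSIS code, the function name unwrap_qualified is hypothetical, and it leaves out the space/@ round trip, which changes the result only when the data already contains an @ (the expression turns those into spaces).

```python
def unwrap_qualified(value: str) -> str:
    """Strip the enclosing quotes and un-double the embedded ones."""
    # Drop surrounding whitespace, then the first and last character
    # (assumed to be the enclosing double quotes).
    inner = value.strip()[1:-1]
    # A doubled quote inside the value stands for one literal quote.
    return inner.replace('""', '"')


if __name__ == "__main__":
    print(unwrap_qualified('"A ""B"", C, D, E"'))
```

For the input "A ""B"", C, D, E" this prints A "B", C, D, E.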
Use " as the text qualifier for the CSV destination. Before inserting values into the CSV destination, add a Derived Column expression:
REPLACE(REPLACE([Column1],",",""),"\"","")
This will retain " in your text field.