I have a string with some special characters. The aim is to retrieve a String[] of the fields on each line (comma-separated). The special character " delimits quoted fields, and inside them you can have \n and ,
For example Main String
Alpha,Beta,Gama,"23-5-2013,TOM",TOTO,"Julie, KameL
Titi",God," timmy, tomy,tony,
tini".
You can see that there are line breaks (\n) inside the "" pairs.
Can anyone help me parse this?
Thanks
More explanation
From the main string I need to separate these fields:
Alpha
Beta
Gama
23-5-2013,TOM
TOTO
Julie,KameL,Titi
God
timmy, tomy,tony,tini
The problem: for Julie,KameL,Titi there is a line break (\n) between KameL and Titi.
Similarly, for timmy, tomy,tony,tini there is a line break (\n) between tony and tini.
Now this text is in a file (reading it line by line is compulsory):
Alpha,Beta Charli,Delta,Delta Echo ,Frank George,Henry
1234-5,"Ida, John
", 25/11/1964, 15/12/1964,"40,000,000.00",0.0975,2,"King, Lincoln
",Mary / New York,123456
12543-01,"Ocean, Peter
Output (I want to remove this "):
Alpha
Beta Charli
Delta
Delta Echo
Frank George
Henry
1234-5
Ida
John
"
25/11/1964
15/12/1964
40,000,000.00
0.0975
2
King
Lincoln
"
Mary / New York
123456
12543-01
Ocean
Peter
See this related answer for a decent Java-compatible regex for parsing CSV.
Among other things, it recognizes quoted values containing doubled (escaped) quotes like ""this"".
In short, you will use this pattern:
(?:,|\n|^)("(?:(?:"")*[^"]*)*"|[^",\n]*|(?:\n|$))
Then collect each Matcher group(1) in a find() loop.

Note: Although I have posted this answer here about a "decent" regex I discovered, just to save people searching for one, it is by no means robust. I still agree with this answer by user "fgv": a CSV parser is preferable.
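As a rough illustration of the find() loop, here is a minimal Java sketch. The class name CsvRegexSketch and the parse helper are hypothetical, and stripping the surrounding quotes (and un-doubling "" inside them) is my own assumption about what you want, since group(1) of this pattern still includes the enclosing quotes. Note also that this pattern treats a newline outside quotes as just another separator, so all records end up in one flat list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CsvRegexSketch {
    // The pattern quoted above, escaped for a Java string literal.
    private static final Pattern FIELD = Pattern.compile(
            "(?:,|\\n|^)(\"(?:(?:\"\")*[^\"]*)*\"|[^\",\\n]*|(?:\\n|$))");

    // Hypothetical helper: returns every field found in the input, in order.
    static List<String> parse(String input) {
        List<String> fields = new ArrayList<>();
        Matcher m = FIELD.matcher(input);
        while (m.find()) {
            String field = m.group(1);
            // Assumption: quoted fields should lose their enclosing quotes,
            // and a doubled quote inside them becomes a single quote.
            if (field.length() >= 2 && field.startsWith("\"") && field.endsWith("\"")) {
                field = field.substring(1, field.length() - 1).replace("\"\"", "\"");
            }
            fields.add(field);
        }
        return fields;
    }

    public static void main(String[] args) {
        String input = "Alpha,Beta,Gama,\"23-5-2013,TOM\",TOTO,\"Julie, KameL\nTiti\",God";
        for (String field : parse(input)) {
            System.out.println("[" + field + "]");
        }
    }
}
```

This only behaves well on input with balanced quotes. On long input with an unterminated quote, the nested quantifiers in the pattern can backtrack heavily, which is one more reason to prefer a real parser.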
Description
Consider the following PowerShell example of a universal regex, tested on a Java parser, which requires no extra processing to reassemble the data parts. The first capture group matches an optional quote and then carries it to the end of the match, so you are assured of capturing the entire value between, but not including, the quotes. It also doesn't capture the commas unless they are embedded in a quote-delimited substring.
(?:^|,\s{0,})(["]?)\s{0,}((?:.|\n|\r)*?)\1(?=[,]\s{0,}|$)
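A minimal Java sketch of applying this pattern follows. The class name LookaheadCsvSketch and the parse helper are hypothetical, and the sketch assumes well-formed input with no empty leading field. The value of each field is read from group 2, without the quotes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaheadCsvSketch {
    // The pattern above, escaped for a Java string literal.
    private static final Pattern FIELD = Pattern.compile(
            "(?:^|,\\s{0,})([\"]?)\\s{0,}((?:.|\\n|\\r)*?)\\1(?=[,]\\s{0,}|$)");

    // Hypothetical helper: collects group 2 (the unquoted value) of every match.
    static List<String> parse(String input) {
        List<String> fields = new ArrayList<>();
        Matcher m = FIELD.matcher(input);
        while (m.find()) {
            fields.add(m.group(2));
        }
        return fields;
    }

    public static void main(String[] args) {
        String input = "Alpha,Beta,Gama,\"23-5-2013,TOM\",TOTO,"
                + "\"Julie, KameL\nTiti\",God,\" timmy, tomy,tony,\ntini\"";
        for (String field : parse(input)) {
            System.out.println("[" + field + "]");
        }
    }
}
```

The alternation inside the lazy quantifier is fine for short strings like the one in the question, but on large inputs it may be slow or recurse deeply in Java's regex engine.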
Example
Yields
Summary
(?:    start non-capture group
^    require start of string
|    or
,\s{0,}    a comma followed by any number of whitespace characters
)    close the non-capture group
(    start capture group 1
["]?    consume a quote if it exists; I like doing it this way in case you want to include characters other than a quote
)    close capture group 1
\s{0,}    consume any spaces if they exist; this means you don't need to trim the value later
(    start capture group 2
(?:.|\n|\r)*?    capture all characters including a new line, non-greedy
)    close capture group 2
\1    if there was a quote it would be stored in group 1, so if there was one then require it here
(?=    start zero-width lookahead assertion
[,]\s{0,}    must have a comma followed by optional whitespace
|    or
$    end of the string
)    close the zero-width lookahead assertion

Try this:
If it matches a string without quotes, the result is returned in group 2. Strings with quotes are returned in group 3. Hence I needed to distinguish between the two in the while block. You might find a prettier way.
Output:
Alpha
Beta
Gama
23-5-2013,TOM
TOTO
Julie, KameLTiti
God
timmy, tomy,tony,tini
.
Parsing CSV is a whole lot harder than one would imagine at first sight, which is why your best option is to use a well-designed and tested library to do the work for you. Two such libraries are opencsv and supercsv, and there are many others. Have a look at both and use the one that best fits your requirements and style.
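To give a feel for where the difficulty comes from, here is a minimal hand-written sketch of the state a parser has to track: whether it is inside quotes decides if a comma or newline is data or a separator. The class name MiniCsvSketch is hypothetical, and this is only an illustration under simple assumptions (comma separator, double-quote quoting, doubled quotes as escapes), not a substitute for one of the libraries above.

```java
import java.util.ArrayList;
import java.util.List;

class MiniCsvSketch {
    // Splits CSV text into records; each record is a list of field values.
    static List<List<String>> parse(String text) {
        List<List<String>> records = new ArrayList<>();
        List<String> record = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;

        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < text.length() && text.charAt(i + 1) == '"') {
                        field.append('"'); // doubled quote is a literal quote
                        i++;
                    } else {
                        inQuotes = false;  // closing quote
                    }
                } else {
                    field.append(c);       // commas and newlines are data here
                }
            } else if (c == '"') {
                inQuotes = true;           // opening quote
            } else if (c == ',') {
                record.add(field.toString());
                field.setLength(0);
            } else if (c == '\n') {
                record.add(field.toString());
                field.setLength(0);
                records.add(record);
                record = new ArrayList<>();
            } else if (c != '\r') {
                field.append(c);
            }
        }
        // Flush the last record when the text does not end with a newline.
        if (field.length() > 0 || !record.isEmpty()) {
            record.add(field.toString());
            records.add(record);
        }
        return records;
    }
}
```

Even this leaves out things a library handles for you, such as configurable separators, whitespace rules around quotes, and reporting malformed input.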