I need to get rid of all line breaks that appear in my strings (coming from a database). I do it using the code below:
value.Replace("\r\n", "").Replace("\n", "").Replace("\r", "")
I can see that at least one character acting like a line ending survived it. Its char code is 8232 (U+2028, LINE SEPARATOR).
It's very lame of me, but I must say this is the first time I have had the pleasure of seeing this char. Obviously I can just replace this char directly, but I was thinking about extending my current approach (based on replacing combinations of "\r" and "\n") into something much more solid, so that it covers not only the '8232' char but also all the others I have not run into yet.
Do you have a bullet-proof approach for such a problem?
EDIT#1:
It seems to me that there are several possible solutions:
- use Regex.Replace
- remove every char for which IsSeparator or IsControl is true
- replace every char for which IsWhiteSpace is true with " "
- create a list of all possible line endings ("\r\n", "\r", "\n", LF, VT, FF, CR, CR+LF, NEL, LS, PS) and just replace them with an empty string. It's a lot of replaces.
I would say that the best results will be after applying 1st and 4th approaches but I cannot decide which will be faster. Which one do you think is the most complete one?
EDIT#2:
I posted an answer below.
Here are some quick solutions with .NET regex:
- To remove any whitespace from a string: s = Regex.Replace(s, @"\s+", ""); (\s matches any Unicode whitespace chars)
- To remove all whitespace but CR and LF: s = Regex.Replace(s, @"[\s-[\r\n]]+", ""); ([\s-[\r\n]] is a character class containing a subtraction construct; it matches any whitespace but CR and LF)
- To remove any vertical whitespace, subtract \p{Zs} (any horizontal whitespace but tab) and \t (tab) from \s: s = Regex.Replace(s, @"[\s-[\p{Zs}\t]]+", "");
Wrapping the last one into an extension method:
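A minimal sketch of such a wrapper, assuming the vertical-whitespace pattern from the last item above. The class name RegexLineEndings and the method name RemoveLineEndings are hypothetical, chosen only for illustration:

```csharp
using System.Text.RegularExpressions;

public static class RegexLineEndings
{
    // \s minus horizontal whitespace (\p{Zs} and tab) leaves the vertical
    // whitespace: LF, VT, FF, CR, NEL (U+0085), LS (U+2028) and PS (U+2029).
    private static readonly Regex VerticalWhitespace =
        new Regex(@"[\s-[\p{Zs}\t]]+", RegexOptions.Compiled);

    // Hypothetical extension method name; not taken from the original post.
    public static string RemoveLineEndings(this string value)
    {
        // Leave null and empty input untouched.
        if (string.IsNullOrEmpty(value)) return value;
        return VerticalWhitespace.Replace(value, string.Empty);
    }
}
```

With this in scope, value.RemoveLineEndings() drops line breaks (including char 8232) while keeping ordinary spaces and tabs.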
See the regex demo.
Below is the extension method solving my problem. LineSeparator and ParagraphEnding can be of course defined somewhere else, as static values etc.
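A sketch of what such an extension method could look like, assuming it simply extends the Replace chain from the question with U+2028 and U+2029. LineSeparator and ParagraphEnding are the names mentioned above; the class name ReplaceLineEndings and the method name RemoveLineEndings are hypothetical:

```csharp
public static class ReplaceLineEndings
{
    // Hypothetical method name; a sketch, not necessarily the original code.
    public static string RemoveLineEndings(this string value)
    {
        if (string.IsNullOrEmpty(value)) return value;

        // U+2028 LINE SEPARATOR (char code 8232) and U+2029 PARAGRAPH SEPARATOR.
        // These could equally be defined elsewhere as static values.
        const string LineSeparator = "\u2028";
        const string ParagraphEnding = "\u2029";

        return value.Replace("\r\n", string.Empty)
                    .Replace("\n", string.Empty)
                    .Replace("\r", string.Empty)
                    .Replace(LineSeparator, string.Empty)
                    .Replace(ParagraphEnding, string.Empty);
    }
}
```

Note that this version leaves VT, FF and NEL (U+0085) in place; they would need their own Replace calls if they can occur in the data.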
According to Wikipedia, there are numerous line terminators you may need to handle (including the one you mention).
Personally, I'd go with
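For illustration only, and not necessarily what this answer proposed: a sketch that removes the terminators from the list in the question (LF, VT, FF, CR, CR+LF, NEL, LS, PS) with one explicit character class. The names LineTerminators and StripLineTerminators are hypothetical:

```csharp
using System.Text.RegularExpressions;

public static class LineTerminators
{
    // LF, VT, FF, CR, NEL, LS, PS. CR+LF needs no separate entry because
    // both of its chars are already in the class.
    private static readonly Regex Terminators =
        new Regex(@"[\u000A\u000B\u000C\u000D\u0085\u2028\u2029]+");

    // Hypothetical helper; returns null for null input.
    public static string StripLineTerminators(string value)
    {
        return value == null ? null : Terminators.Replace(value, string.Empty);
    }
}
```

Listing the code points explicitly makes it obvious which characters are removed, at the cost of having to extend the class by hand if another terminator turns up.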
Props to Yossarian on this one, I think he's right. Replace all whitespace with a single space:
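A minimal sketch of that idea, assuming the \s+ pattern shown earlier in this thread; the class and method names are hypothetical:

```csharp
using System.Text.RegularExpressions;

public static class WhitespaceCollapser
{
    // Replaces every run of whitespace (spaces, tabs, line breaks,
    // U+2028, U+2029, ...) with a single space.
    public static string CollapseWhitespace(string value)
    {
        return Regex.Replace(value, @"\s+", " ");
    }
}
```

Unlike the removal approaches, this keeps words apart when a line break was the only thing separating them, but it also collapses ordinary runs of spaces and tabs.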
This is my first attempt at this, but I think this will do what you want....
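A sketch of a character-by-character filter built on the Char methods. This is an assumption about the general approach, not the original snippet, and the names CharFilter and RemoveLineBreakChars are hypothetical:

```csharp
using System.Globalization;
using System.Text;

public static class CharFilter
{
    public static string RemoveLineBreakChars(string value)
    {
        if (string.IsNullOrEmpty(value)) return value;

        var sb = new StringBuilder(value.Length);
        foreach (char c in value)
        {
            UnicodeCategory category = char.GetUnicodeCategory(c);

            // Control chars cover CR, LF, VT, FF and NEL (tab is kept on purpose);
            // the two categories cover U+2028 and U+2029.
            bool isLineBreak =
                (char.IsControl(c) && c != '\t')
                || category == UnicodeCategory.LineSeparator
                || category == UnicodeCategory.ParagraphSeparator;

            if (!isLineBreak) sb.Append(c);
        }
        return sb.ToString();
    }
}
```

Be aware that this removes every control character except tab, not only line breaks. char.IsSeparator is deliberately not used here, because it is also true for the ordinary space.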
Also, see this link for details on other methods you can use: Char Methods