How to eliminate ALL line breaks in a string?

Posted 2019-03-08 10:13

I need to get rid of all line breaks that appear in my strings (coming from a database). I do it using the code below:

value.Replace("\r\n", "").Replace("\n", "").Replace("\r", "")

I can see that at least one character acting like a line ending survived it. Its char code is 8232.

It's very lame of me, but I must say this is the first time I've had the pleasure of seeing this char. Obviously I can just replace this char directly, but I was thinking about extending my current approach (based on replacing combinations of "\r" and "\n") into something much more solid, so that it covers not only the '8232' char but also any others I haven't found yet.

Do you have a bullet-proof approach for such a problem?

EDIT#1:

It seems to me that there are several possible solutions:

  1. use Regex.Replace
  2. remove every char for which IsSeparator or IsControl is true
  3. replace every char for which IsWhiteSpace is true with " "
  4. create a list of all possible line endings (LF "\n", VT, FF, CR "\r", CR+LF "\r\n", NEL, LS, PS) and just replace each of them with an empty string. It's a lot of replaces.

I would say that the best results will come from the 1st and 4th approaches, but I cannot decide which will be faster. Which one do you think is the most complete?
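Approaches 1 and 4 can be combined: a single regex that lists the line endings from item 4, so one pass replaces them all. This is only a minimal sketch; the class and method names (LineBreakRemover, RemoveLineBreaks) are made up for illustration, and which of the two approaches is faster has not been measured here.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, named for illustration only.
public static class LineBreakRemover
{
    // "\r\n" comes first so a CR+LF pair is consumed as a single match.
    // The character class covers LF, VT, FF, CR, NEL (U+0085),
    // LS (U+2028, char code 8232) and PS (U+2029).
    private static readonly Regex LineBreaks =
        new Regex(@"\r\n|[\n\v\f\r\u0085\u2028\u2029]", RegexOptions.Compiled);

    public static string RemoveLineBreaks(string value, string replacement = "")
    {
        if (value == null) throw new ArgumentNullException(nameof(value));

        // A MatchEvaluator is used so the replacement is inserted literally
        // (no "$1"-style substitution patterns are interpreted).
        return LineBreaks.Replace(value, _ => replacement);
    }
}
```

For example, LineBreakRemover.RemoveLineBreaks("a\r\nb\u2028c") yields "abc", and passing " " as the second argument turns each line break (including a whole "\r\n" pair) into one space.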

EDIT#2

I posted an answer below.

12 answers
我想做一个坏孩纸
#2 · 2019-03-08 10:29

Check out this link: http://msdn.microsoft.com/en-us/library/844skk0h.aspx

You will have to play around and build a regex that works for you. But here's the skeleton...

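using System;
using System.Text;
using System.Text.RegularExpressions;

class Program
{
    static void Main(string[] args)
    {
        // Build a sample containing LF, CR, tabs, and U+2028 (char code 8232).
        StringBuilder txt = new StringBuilder();
        txt.Append("Hello \n\n\r\t\t");
        txt.Append(Convert.ToChar(8232));

        Console.WriteLine("Original: <" + txt.ToString() + ">");
        Console.WriteLine("Cleaned: <" + CleanInput(txt.ToString()) + ">");

        Console.Read();
    }

    static string CleanInput(string strIn)
    {
        // Remove every character that is not a word character, '.', '@', or '-'.
        // Note: this also strips spaces and tabs, not only line breaks.
        return Regex.Replace(strIn, @"[^\w\.@-]", "");
    }
}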
三岁会撩人
#3 · 2019-03-08 10:30

If you have a string, say "theString", call its Replace method with the arguments shown below:

theString = theString.Replace(System.Environment.NewLine, "");

Fickle 薄情
#4 · 2019-03-08 10:31

I'd recommend removing ALL the whitespace (char.IsWhiteSpace) and replacing it with a single space. IsWhiteSpace takes care of all the weird Unicode whitespace characters.
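A minimal sketch of that idea, assuming that collapsing every run of whitespace (ordinary spaces and tabs included, not just line breaks) into one space is acceptable for the data. The names WhitespaceNormalizer and CollapseWhitespace are hypothetical.

```csharp
using System;
using System.Text;

// Hypothetical helper, named for illustration only.
public static class WhitespaceNormalizer
{
    public static string CollapseWhitespace(string value)
    {
        if (value == null) throw new ArgumentNullException(nameof(value));

        var sb = new StringBuilder(value.Length);
        bool inRun = false; // true while inside a run of whitespace

        foreach (char c in value)
        {
            // char.IsWhiteSpace is true for '\r', '\n', '\t', U+0085,
            // U+2028 (8232), U+2029 and the Unicode space separators.
            if (char.IsWhiteSpace(c))
            {
                if (!inRun) sb.Append(' '); // emit one space per run
                inRun = true;
            }
            else
            {
                sb.Append(c);
                inRun = false;
            }
        }

        return sb.ToString();
    }
}
```

With this sketch, "Hello \n\n\r\t\t\u2028world" becomes "Hello world". Leading or trailing whitespace is reduced to a single space rather than trimmed, so a Trim() call may still be wanted afterwards.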

Juvenile、少年°
#5 · 2019-03-08 10:35

Have you tried string.Replace(Environment.NewLine, "")? That usually gets a lot of them for me.

Rolldiameter
#6 · 2019-03-08 10:35

Assuming that 8232 is a Unicode code point, you can do this (strings are immutable, so assign the result back):

value = value.Replace("\u2028", string.Empty);
劳资没心,怎么记你
#7 · 2019-03-08 10:37

8232 (0x2028) and 8233 (0x2029) are the only other ones you might want to eliminate. See the documentation for char.IsSeparator.
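One caveat if you go by char.IsSeparator directly: it is also true for the ordinary space, so filtering on it alone would strip spaces too. A hedged sketch that checks the Unicode category instead and adds '\r' and '\n' explicitly could look like this; SeparatorFilter, IsLineBreak and StripLineBreaks are made-up names, and the sketch deliberately covers only the characters mentioned in this answer.

```csharp
using System;
using System.Globalization;
using System.Linq;

// Hypothetical helper, named for illustration only.
public static class SeparatorFilter
{
    public static bool IsLineBreak(char c)
    {
        if (c == '\n' || c == '\r') return true;

        // U+2028 is the only LineSeparator (Zl) and U+2029 the only
        // ParagraphSeparator (Zp); an ordinary space is SpaceSeparator (Zs).
        UnicodeCategory category = char.GetUnicodeCategory(c);
        return category == UnicodeCategory.LineSeparator
            || category == UnicodeCategory.ParagraphSeparator;
    }

    public static string StripLineBreaks(string value)
    {
        if (value == null) throw new ArgumentNullException(nameof(value));

        // Keep every char that is not classified as a line break.
        return new string(value.Where(c => !IsLineBreak(c)).ToArray());
    }
}
```

Here SeparatorFilter.StripLineBreaks("a b\u2028c\u2029d\r\ne") gives "a bcde": the space survives, while LS, PS, CR and LF are dropped.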
