Regular expression to match any vertical whitespace

Posted 2019-02-25 16:43

Question:

Is there a regex pattern for .NET that will match any character that would result in multiple lines, i.e. any vertical whitespace character, like Perl regex does with \v? In other words, is there a way to match \r (carriage return), \n (line feed), \v (vertical tab), and \f (form feed) as well as the Unicode characters U+0085 (next line), U+2028 (line separator), and U+2029 (paragraph separator) and any other characters I'm not aware of that might result in more than one line?

I'm writing some validation code in .NET that will fail if a user has provided input text that contains more than one line. In most cases, that means I just have to check for \r and \n. However, I know there is a multitude of other vertical whitespace characters.

I know .NET regex differs from Perl regex, most importantly in that \v in .NET matches "vertical tab" whereas in Perl regex it matches "vertical whitespace".
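A minimal sketch of the single-line validation described above. It is written in Python rather than .NET, purely so it can be run standalone. The character class spells out the seven characters listed in the question, and it uses escapes that .NET also accepts. The function name `is_single_line` is hypothetical, and treating exactly these characters as line breaks is an assumption that mirrors Perl's \v.

```python
import re

# Assumed definition of "vertical whitespace", mirroring Perl's \v:
# LF, VT, FF, CR (U+000A..U+000D), NEL (U+0085), LS (U+2028), PS (U+2029).
VERTICAL_WS = re.compile(r"[\n\x0B\f\r\x85\u2028\u2029]")

def is_single_line(text: str) -> bool:
    """Hypothetical validator: True if text contains no vertical whitespace."""
    return VERTICAL_WS.search(text) is None
```

Horizontal whitespace such as a tab or a space still passes, so `is_single_line("a\tb")` is `True`. Input containing a U+2028 line separator, for example `"a\u2028b"`, is rejected.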

Answer 1:

As you say, the Perl character class \v matches [\x0A-\x0D] (line feed, vertical tab, form feed and carriage return, although I would dispute that CR is vertical white space) in addition to the non-ASCII code points [\x{85}\x{2028}\x{2029}] (next line, line separator and paragraph separator).

You can hand-build this character class in .NET like this (note that the .NET \u escape requires exactly four hex digits):

[\u000A-\u000D\u0085\u2028\u2029]
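To check what the hand-built class covers, this Python sketch copies the pattern unchanged, since Python's `re` accepts the same four-digit \u escapes. It is not .NET code. The sketch lists every Basic Multilingual Plane code point the class matches. For comparison, it also lists the code points that Python's own `str.splitlines` treats as line boundaries. The helper names `bmp_matches` and `splitlines_breaks` are hypothetical.

```python
import re

# The hand-built class from the answer, copied verbatim.
PERL_V = re.compile(r"[\u000A-\u000D\u0085\u2028\u2029]")

def bmp_matches(pattern):
    """Return every BMP code point that the single-character pattern matches."""
    return [cp for cp in range(0x10000) if pattern.fullmatch(chr(cp))]

def splitlines_breaks():
    """Return every BMP code point that Python's str.splitlines splits on."""
    return [cp for cp in range(0x10000)
            if len(("a" + chr(cp) + "b").splitlines()) > 1]

print([hex(cp) for cp in bmp_matches(PERL_V)])
print([hex(cp) for cp in splitlines_breaks()])
```

The first list should contain exactly the seven characters of Perl's \v. Python's `splitlines` also breaks on U+001C to U+001E (file, group and record separators). This shows that runtimes disagree about what counts as "more than one line", so the validation rule has to be chosen explicitly.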


Answer 2:

If you want to match characters you can't enumerate, simply use a negated set [^ ] (at least in .NET; my Perl is a little hazy) to match everything up to a specific character. For example, if I wanted to match from the current position across line breaks up to the next line, which starts with the letter D, I would use this:

([^D]+)

So the captured text would include every type of whitespace there is, along with any other character that is not D, up to the letter D.
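A small Python sketch of this approach follows. Python's `re` treats the negated class the same way here, and a negated set matches line breaks without any special flag. `capture_until_d` is a hypothetical helper, and its `pos` argument stands in for "the current position". The example input also shows that `[^D]` captures letters as well as whitespace, because it excludes only `D`.

```python
import re

# The answer's pattern: one or more characters that are not 'D'.
NOT_D = re.compile(r"([^D]+)")

def capture_until_d(text: str, pos: int = 0) -> str:
    """Hypothetical helper: capture from pos up to (not including) the next 'D'."""
    m = NOT_D.match(text, pos)
    return m.group(1) if m else ""

print(repr(capture_until_d("abc\r\n\x0b Data")))
```

If the character at `pos` is itself `D`, nothing is captured and the helper returns an empty string.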