Is “\n” a vertical whitespace, i.e., should “\v” m

Posted 2020-02-06 02:05

Logically, it is (but logic is irrelevant whenever character encodings or locales are in play). According to

perl -e 'print "\n" =~ /\v/ ? "y\n" : "n\n";'

printing "y", it is. According to

Pattern.compile("\\v").matcher("\n").matches();

returning false in Java, it's not. This wouldn't confuse me at all if there weren't this post claiming that

Sun’s updated Pattern class for JDK7 has a marvelous new flag, UNICODE_CHARACTER_CLASS, which makes everything work right again.

But I'm using java version "1.7.0_07", and the flag exists but seems to change nothing at all. Moreover, "\n" is no newcomer to Unicode but a plain old ASCII character, so I really don't see how this difference can arise. Probably I'm doing something stupid, but I can't see it.

2 Answers
倾城 Initia
#2 · 2020-02-06 02:25

perldoc perlrecharclass says that \v matches a "vertical whitespace character". This is further explained:

"\v" matches any character considered vertical whitespace; this includes the platform's carriage return and line feed characters (newline) plus several other characters, all listed in the table below. "\V" matches any character not considered vertical whitespace. They use the platform's native character set, and do not consider any locale that may otherwise be in use.

Specifically, \v matches the following characters in 5.16:

$ unichars -au '\v'           # From Unicode::Tussle
 ---- U+0000A LINE FEED
 ---- U+0000B LINE TABULATION
 ---- U+0000C FORM FEED
 ---- U+0000D CARRIAGE RETURN
 ---- U+00085 NEXT LINE
 ---- U+02028 LINE SEPARATOR
 ---- U+02029 PARAGRAPH SEPARATOR

You could use a character class to get the same effect as Perl's \v.

Of course this applies to Perl; I don't know whether it applies to Java.
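
As a rough sketch of the character-class idea in Java: the class below simply spells out the seven code points from the Perl 5.16 list above, so it does not depend on how a given JDK interprets \v. The class name VerticalWhitespace and the helper isVerticalWhitespace are made up for illustration and are not part of any library.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not a JDK class: an explicit character class that
// lists the same seven code points Perl 5.16 reports for \v.
class VerticalWhitespace {
    // LF, VT (0x0B), FF, CR, NEL (0x85), LINE SEPARATOR, PARAGRAPH SEPARATOR
    static final Pattern VERTICAL_WS =
            Pattern.compile("[\\n\\x0B\\f\\r\\x85\\x{2028}\\x{2029}]");

    // True if the single character c is one of the listed code points.
    static boolean isVerticalWhitespace(char c) {
        return VERTICAL_WS.matcher(String.valueOf(c)).matches();
    }

    public static void main(String[] args) {
        System.out.println(isVerticalWhitespace('\n'));          // true
        System.out.println(isVerticalWhitespace((char) 0x2028)); // true
        System.out.println(isVerticalWhitespace(' '));           // false
    }
}
```

The \x{h...h} escape is accepted by Pattern from Java 7 onward; on older JDKs the regex-level four-digit Unicode escape would be needed instead.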

叛逆
#3 · 2020-02-06 02:33

The Javadoc for java.util.regex.Pattern explicitly mentions \v in its "list of Perl constructs not supported by this class". So it's not that \n doesn't belong to Java's category of "vertical whitespace"; it's that Java doesn't have a category of "vertical whitespace".

Edited to add: Instead, \v stands for the vertical tab character, U+000B. This is a traditional escape sequence; there are also a few other traditional escape sequences that aren't allowed in Java string literals but that are supported by Pattern (\a for alert/bell, \cX for control-character X). Oddly, however, the Javadoc for Pattern fails to mention that it supports \v; so I'm not sure if it can be expected to be supported in all JDK implementations.
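
Because the JDK 7 Javadoc does not document \v, one cautious way to find out what it matches on a particular JVM is to probe it directly. The sketch below does only that; BackslashVProbe and matchesBackslashV are illustrative names, not JDK APIs. The only result expected on every version is that U+000B matches. The line-feed result depends on the JDK: from Java 8 onward, the Pattern documentation lists \v as a vertical-whitespace class, so newer JVMs should report a match for "\n" as well.

```java
import java.util.regex.Pattern;

// Hypothetical probe class: reports what the running JVM's \v matches.
class BackslashVProbe {
    // True if the regex \v matches the single character c on this JVM.
    static boolean matchesBackslashV(char c) {
        return Pattern.compile("\\v").matcher(String.valueOf(c)).matches();
    }

    public static void main(String[] args) {
        System.out.println("java.version = " + System.getProperty("java.version"));
        // Vertical tab: matched by \v under either interpretation.
        System.out.println("VT matches: " + matchesBackslashV((char) 0x0B));
        // Line feed: the result differs between JDK versions, so none is asserted here.
        System.out.println("LF matches: " + matchesBackslashV('\n'));
    }
}
```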
