Character class subtraction, converting from Java

Posted 2019-01-09 12:21

Which regular expression engine does Java use?

In a tool like RegexBuddy, if I use

[a-z&&[^bc]]

the expression works in Java, but RegexBuddy does not understand it.

In fact it reports:

Match a single character present in the list below [a-z&&[^bc]

  • A character in the range between a and z : a-z
  • One of the characters &[^bc : &&[^bc
  • Match the character ] literally : ]

But I want to match a character between a and z, intersected with a character that is not b or c.

3 answers
放荡不羁爱自由
Reply #2 · 2019-01-09 13:11

Java uses its own regular expression engine, whose behaviour is defined in the java.util.regex.Pattern class.
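A minimal sketch showing that java.util.regex.Pattern accepts the question's expression as written. The class and method names (IntersectionDemo, accepts) are illustrative only; the code just checks a few single letters against the class.

```java
import java.util.regex.Pattern;

class IntersectionDemo {
    // In java.util.regex, "&&" inside a character class means intersection:
    // [a-z&&[^bc]] accepts lowercase ASCII letters that are also not 'b' or 'c'.
    static final Pattern AZ_EXCEPT_BC = Pattern.compile("[a-z&&[^bc]]");

    // True if the single character c is accepted by the class.
    static boolean accepts(char c) {
        return AZ_EXCEPT_BC.matcher(String.valueOf(c)).matches();
    }

    public static void main(String[] args) {
        for (char c = 'a'; c <= 'e'; c++) {
            System.out.println(c + " -> " + accepts(c));
        }
        // a -> true, b -> false, c -> false, d -> true, e -> true
    }
}
```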

You can test patterns with an Eclipse plugin or with an online tester.

唯我独甜
Reply #3 · 2019-01-09 13:14

RegexBuddy does not yet support the character class union, intersection, and subtraction syntax that is unique to the Java regular expression flavor. This is the only part of the Java regex syntax that RegexBuddy does not yet support. We're planning to implement this in a future version of RegexBuddy. It has been postponed because no other regular expression flavor supports this syntax.

P.S.: If you have a question about RegexBuddy in particular, please add the "regexbuddy" tag to your question. Then the question automatically shows up in my RSS reader. I don't follow the "regex" tag because far too many questions use that tag, and most are already answered by the time I see them.

欢心
Reply #4 · 2019-01-09 13:17

Like most regex flavors, java.util.regex.Pattern has its own specific features with syntax that may not be fully compatible with others; this includes character class union, intersection and subtraction:

  • [a-d[m-p]] : a through d, or m through p: [a-dm-p] (union)
  • [a-z&&[def]] : d, e, or f (intersection)
  • [a-z&&[^bc]] : a through z, except for b and c: [ad-z] (subtraction)
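The three equivalences in the list above can be checked mechanically. This is a small sketch, with the hypothetical helper sameOverAscii, that compiles both forms of each pair and compares them over every ASCII character (0x00 to 0x7F). It checks ASCII only, not the full Unicode range.

```java
import java.util.regex.Pattern;

class ClassOperatorsCheck {
    // Returns true if the two single-character regexes accept exactly
    // the same set of ASCII characters (0x00..0x7F).
    static boolean sameOverAscii(String left, String right) {
        Pattern p = Pattern.compile(left);
        Pattern q = Pattern.compile(right);
        for (char c = 0; c < 0x80; c++) {
            String s = String.valueOf(c);
            if (p.matcher(s).matches() != q.matcher(s).matches()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(sameOverAscii("[a-d[m-p]]", "[a-dm-p]"));  // union: true
        System.out.println(sameOverAscii("[a-z&&[def]]", "[def]"));   // intersection: true
        System.out.println(sameOverAscii("[a-z&&[^bc]]", "[ad-z]"));  // subtraction: true
    }
}
```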

The most important "caveat" of Java regex is that matches() attempts to match the pattern against the whole input string, rather than searching for a match anywhere inside it. This is atypical of most engines, and can be a source of confusion at times.
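A short sketch contrasting the three Matcher methods on the same input. The pattern reuses the subtraction class from the question; the class name MatchesVsFind and the helper findAll are illustrative, not part of any library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchesVsFind {
    static final Pattern WORD = Pattern.compile("[a-z&&[^bc]]+");

    // Collects every substring that find() locates, scanning left to right.
    static List<String> findAll(String input) {
        List<String> hits = new ArrayList<>();
        Matcher m = WORD.matcher(input);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        Matcher m = WORD.matcher("xyz abc");
        // matches(): the whole input must match; the space, 'b' and 'c' make it fail.
        System.out.println(m.matches());        // false
        // lookingAt(): anchored at the start, but need not consume the whole input.
        System.out.println(m.lookingAt());      // true (matches "xyz")
        // find(): searches anywhere in the input, like most other engines.
        System.out.println(findAll("xyz abc")); // [xyz, a]
    }
}
```

If whole-string semantics are not wanted, find() is usually the method that behaves like a search in other flavors.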


On character class subtraction

Subtraction allows you to define for example "all consonants" in Java as [a-z&&[^aeiou]].
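As a rough illustration of the consonant class, the following sketch (class and method names are hypothetical) counts and strips lowercase consonants. Uppercase letters and non-ASCII letters fall outside [a-z] and are therefore ignored.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Consonants {
    // Java-only syntax: lowercase ASCII letters minus the vowels a, e, i, o, u.
    static final Pattern CONSONANT = Pattern.compile("[a-z&&[^aeiou]]");

    // Counts lowercase consonants; uppercase letters are not counted.
    static int countConsonants(String text) {
        Matcher m = CONSONANT.matcher(text);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    // Removes every lowercase consonant, leaving vowels, spaces, etc.
    static String stripConsonants(String text) {
        return CONSONANT.matcher(text).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(countConsonants("regular expression")); // 10
        System.out.println(stripConsonants("regular expression")); // eua eeio
    }
}
```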

This syntax is specific to Java. In XML Schema, .NET, JGSoft and RegexBuddy, it's [a-z-[aeiou]]. Other flavors may not support this feature at all.
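For flavors with no subtraction syntax at all, one commonly used workaround (not mentioned above, so treat it as an assumption to verify in your target engine) is a negative lookahead in front of the class: (?![aeiou])[a-z]. This sketch, with the illustrative class PortableSubtraction, checks in Java that the lookahead form and the && form accept the same ASCII characters.

```java
import java.util.regex.Pattern;

class PortableSubtraction {
    // Java-specific subtraction syntax.
    static final Pattern JAVA_SYNTAX = Pattern.compile("[a-z&&[^aeiou]]");
    // Lookahead workaround: reject a vowel at this position, then take any a-z letter.
    static final Pattern LOOKAHEAD = Pattern.compile("(?![aeiou])[a-z]");

    // True if both patterns accept exactly the same single ASCII characters.
    static boolean agreeOnAscii() {
        for (char c = 0; c < 0x80; c++) {
            String s = String.valueOf(c);
            if (JAVA_SYNTAX.matcher(s).matches() != LOOKAHEAD.matcher(s).matches()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(agreeOnAscii()); // true
    }
}
```

The lookahead costs an extra check per character, but it relies only on features most flavors share.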
