Splitting a sentence

Posted 2019-09-09 20:39

I'm trying to split a string. Runs of characters such as !!!, ?? or ... denote the end of a sentence, so I want anything after them to start on a new line. For example, the sentence hey.. hello split !!! example me. should be turned into:

hey..
hello split !!!
example me.

What I tried:

String myStr= "hey.. hello split !!! example me.";
String [] split = myStr.split("(?<=\\.{2,})");

This works fine when I have multiple dots but doesn't work for anything else. I can't add exclamation marks to the expression either: "(?<=[\\.{2,}!{2,}])" splits after each dot and exclamation mark. Is there any way to combine those? Ideally I wanted the app to split after a SINGLE dot too (anything that denotes the end of the sentence) but I don't think this is possible in a single pass... Thanks

Tags: java regex
3 answers
兄弟一词,经得起流年.
#2 · 2019-09-09 21:16

Just do it like this:

String [] split = myStr.split("(?<=([?!.])\\1+)");

or

String [] split = myStr.split("(?<=([?!.])\\1{1,99})");

It captures the first character from the list [?.!] and expects the same character to be present one or more times after it. If so, the split occurs right after that run.

or

String[] split = s.split("(?<=\\.{2,}+)|(?<=\\?{2,}+)|(?<=!{2,}+)");
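
The backreference idea described above (capture one of ?, ! or . and require the same character again) can also be sketched without a lookbehind, by scanning with Matcher.find and cutting after each run. This is a swapped-in technique rather than the split call from this answer. The names RepeatedMarkSplitter and splitAfterRuns are made up for illustration, and trimming the whitespace around each piece is an assumption about the desired output.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RepeatedMarkSplitter {
    // One of ? ! . followed by at least one more copy of the same character,
    // plus any whitespace that trails the run.
    private static final Pattern RUN = Pattern.compile("([?!.])\\1+\\s*");

    // Hypothetical helper: cuts the text after every run of a repeated mark.
    static List<String> splitAfterRuns(String text) {
        List<String> parts = new ArrayList<>();
        Matcher m = RUN.matcher(text);
        int start = 0;
        while (m.find()) {
            // Keep the punctuation run, drop the whitespace around the piece.
            parts.add(text.substring(start, m.end()).trim());
            start = m.end();
        }
        if (start < text.length()) {
            // Whatever follows the last run is the final piece.
            parts.add(text.substring(start).trim());
        }
        return parts;
    }

    public static void main(String[] args) {
        splitAfterRuns("hey.. hello split !!! example me.")
                .forEach(System.out::println);
    }
}
```

A single trailing dot (as in "example me.") is not a run, so it does not trigger a cut here.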


虎瘦雄心在
#3 · 2019-09-09 21:17

Ideally I wanted the app to split after a SINGLE dot too (anything that denotes the end of the sentence)

To do this, you first have to decide which cases you consider to be the end of a sentence. Runs of multiple special symbols are not a standard way of ending a sentence (as far as I know).

But if you are keeping in mind nefarious users, or casual mistakes that end up making special symbols look like the end of a sentence, then at least make a list of such cases before proceeding.

For your situation, where you want to split the string on runs of special symbols, lookbehind won't be of much help because, as Wiktor noted:

The problem is in the backreference whose length is not known from the start.

So we need to find the zero-width position where the split needs to happen. The following regex does that.

Regex:

Note the space between the two assertions in the second regex, for when you want to consume the preceding space when starting the next line.

Explanation:

  • This splits at the zero-width position that is preceded by a special symbol and not followed by one.

hey..¦ hello split !!!¦ example me. ( ¦ denotes the zero-width)
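
The regexes themselves are missing from this answer as shown here. Going only by the explanation (a position preceded by a special symbol and not followed by one), patterns of the following shape would fit. Treat both patterns and the class name ZeroWidthSplit as assumptions for illustration, not as the answer's original regex.

```java
import java.util.Arrays;

class ZeroWidthSplit {
    // Assumed pattern: a position preceded by ?, ! or . and not followed by one.
    static final String ZERO_WIDTH = "(?<=[.!?])(?![.!?])";
    // Assumed variant: a literal space between the two assertions,
    // so the split consumes that space.
    static final String WITH_SPACE = "(?<=[.!?]) (?![.!?])";

    public static void main(String[] args) {
        String myStr = "hey.. hello split !!! example me.";
        // Zero-width split: later pieces keep their leading space.
        System.out.println(Arrays.toString(myStr.split(ZERO_WIDTH)));
        // Space-consuming split: pieces start at the first word.
        System.out.println(Arrays.toString(myStr.split(WITH_SPACE)));
    }
}
```

Because neither assertion carries a quantifier, both patterns also split after a single dot, which the question asked for. The flip side is that text such as "e.g. this" would be split too.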

放我归山
#4 · 2019-09-09 21:26

A lookbehind, with a negative lookahead to prevent a split within the group:

String[] lines = s.split("(?<=[?!.]{2,3})(?![?!.])");

Some test code:

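import java.util.Arrays;

public class SplitTest { // class name is arbitrary
    public static void main(String[] args) {
        String s = "hey..hello split !!!example me.";
        // Split after a run of 2 or 3 marks, but never inside the run
        String[] lines = s.split("(?<=[?!.]{2,3})(?![?!.])");
        Arrays.stream(lines).forEach(System.out::println);
    }
}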

Output:

hey..
hello split !!!
example me.
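
The test input above has no spaces after the punctuation runs. For the question's original string, one possible tweak is to let the pattern consume any whitespace that follows the split point. This is an assumption about the desired output, and the class and method names below are invented for the sketch.

```java
import java.util.Arrays;

class TrimmedRunSplit {
    // Same lookbehind and negative lookahead as above, followed by \s*
    // so that whitespace after the run is swallowed by the split.
    static final String PATTERN = "(?<=[?!.]{2,3})(?![?!.])\\s*";

    // Hypothetical helper wrapping String.split with the pattern above.
    static String[] split(String text) {
        return text.split(PATTERN);
    }

    public static void main(String[] args) {
        // The question's original string, spaces included.
        Arrays.stream(split("hey.. hello split !!! example me."))
              .forEach(System.out::println);
    }
}
```

The lookbehind still requires two or three marks, so a single dot does not cause a split in this variant.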