Regular Expression to match word repeated twice (i

Posted 2019-08-08 13:28

I have a java regular expression given by my CS2 instructor that checks if a word is repeated:

\\b(\\w+)\\s+\\1\\b

How can I modify this to check whether a word is repeated twice (that is, appears three times), as in "hello hello hello" or "hello world hello hello"?

If possible, I'd just like to be pointed in the right direction, not an outright solution (after all, I need to learn this). I think my problem is that I don't understand word boundaries well.

2 answers
霸刀☆藐视天下
Reply #2 · 2019-08-08 13:43

Well, since you seem to want to learn this yourself, I'll give you a helpful Oracle link. Second, I'd suggest you pay attention to what exactly you're trying to achieve: a pattern with three of the same word. Hope that helps and isn't too obvious. Comment if you need more help.

Edit: Sorry I forgot the second link here. This page is also helpful.

混吃等死
Reply #3 · 2019-08-08 13:47

First, you need to figure out the anatomy of the expression you were given. It matches a non-empty sequence of word characters that begins at a word boundary and is captured by the group (\\w+), followed by a non-empty run of whitespace (\\s+, which covers tabs and newlines as well as spaces), followed by the content of the captured group, which must not be part of a longer word (that is what the \\b at the end of the expression does).
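To see that anatomy in action, here is a minimal sketch that runs the instructor's pattern against a few inputs. The class and method names (AdjacentRepeatDemo, findAdjacentRepeat) are made up for illustration; only the pattern itself comes from the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AdjacentRepeatDemo {
    // The pattern from the question: a word, whitespace, then the same word again.
    static final Pattern ADJACENT = Pattern.compile("\\b(\\w+)\\s+\\1\\b");

    // Hypothetical helper: returns the repeated word, or null if none is found.
    static String findAdjacentRepeat(String text) {
        Matcher m = ADJACENT.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Two adjacent copies: the group captures "hello".
        System.out.println(findAdjacentRepeat("hello hello world"));
        // The copies are not adjacent, so \\s+\\1 cannot match.
        System.out.println(findAdjacentRepeat("hello world hello"));
        // The trailing \\b rejects "the" followed by "theme".
        System.out.println(findAdjacentRepeat("the theme"));
    }
}
```

The second and third calls show the two things the pattern insists on: the copies must be separated only by whitespace, and the second copy must end at a word boundary.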

Next, you need to build a regular expression that describes "a possibly empty sequence of word characters and spaces". That would be (?:\\w|\\s)*.
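A small sketch of what that filler sub-pattern accepts on its own, assuming it is tested against a whole string with matches(). The names FillerDemo and isFiller are hypothetical.

```java
import java.util.regex.Pattern;

class FillerDemo {
    // Zero or more characters, each either a word character or whitespace.
    static final Pattern FILLER = Pattern.compile("(?:\\w|\\s)*");

    // Hypothetical helper: true if the whole string consists of filler characters.
    static boolean isFiller(String text) {
        return FILLER.matcher(text).matches();
    }

    public static void main(String[] args) {
        System.out.println(isFiller(""));             // the empty string is allowed
        System.out.println(isFiller(" world and "));  // only words and whitespace
        System.out.println(isFiller("hello, world")); // the comma is neither
    }
}
```

Note that punctuation is not covered, so this filler cannot cross a comma or a period. The character class [\\w\\s]* accepts the same strings and may be the safer spelling for long inputs.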

Now you are ready to make your expression. You need these parts:

  • A capture group that matches a sequence of word characters that begins and ends at a word boundary
  • A possibly empty sequence of word characters and spaces that ends at a word boundary
  • The value of your captured sequence that ends at a word boundary
  • Another possibly empty sequence of word characters and spaces that ends at a word boundary
  • The value of your captured sequence that ends at a word boundary
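If you would rather work it out yourself first, skip the following. It is one possible way to glue the listed parts together, written as a sketch rather than as the definitive answer; the names TripleRepeatDemo and hasTripleRepeat are invented for illustration.

```java
import java.util.regex.Pattern;

class TripleRepeatDemo {
    // Filler between occurrences, as described in the answer.
    static final String FILLER = "(?:\\w|\\s)*";

    // captured word + filler + same word + filler + same word,
    // each piece delimited by word boundaries.
    static final Pattern TRIPLE = Pattern.compile(
            "\\b(\\w+)\\b" + FILLER + "\\b\\1\\b" + FILLER + "\\b\\1\\b");

    // Hypothetical helper: true if some word occurs at least three times.
    static boolean hasTripleRepeat(String text) {
        return TRIPLE.matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(hasTripleRepeat("hello hello hello"));
        System.out.println(hasTripleRepeat("hello world hello hello"));
        System.out.println(hasTripleRepeat("hello world hello"));
    }
}
```

Because the filler only admits word characters and whitespace, this sketch does not look across punctuation; whether that matters depends on the inputs the assignment uses.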