Regular Expression For Consecutive Duplicate Words

I'm a regular expression newbie, and I can't quite figure out how to write a single regular expression that would "match" any duplicate consecutive words such as:

Paris in the the spring.

Not that that is related.

Why are you laughing? Are my my regular expressions THAT bad??

Is there a single regular expression that will match ALL of the bold strings above?

标签： regex duplicates capture-group

12条回答

有味是清欢

2楼-- · 2019-01-01 05:57

Try this with below RE

\b start of word word boundary
\W+ any word character
\1 same word matched already
\b end of word

()* Repeating again

public static void main(String[] args) {

    String regex = "\\b(\\w+)(\\b\\W+\\b\\1\\b)*";//  "/* Write a RegEx matching repeated words here. */";
    Pattern p = Pattern.compile(regex, Pattern.CASE_INSENSITIVE/* Insert the correct Pattern flag here.*/);

    Scanner in = new Scanner(System.in);

    int numSentences = Integer.parseInt(in.nextLine());

    while (numSentences-- > 0) {
        String input = in.nextLine();

        Matcher m = p.matcher(input);

        // Check for subsequences of input that match the compiled pattern
        while (m.find()) {
            input = input.replaceAll(m.group(0),m.group(1));
        }

        // Prints the modified sentence.
        System.out.println(input);
    }

    in.close();
}

0人赞添加讨论(0) 举报

低头抚发

3楼-- · 2019-01-01 05:59

Try this regular expression:

\b(\w+)\s+\1\b

Here \b is a word boundary and \1 references the captured match of the first group.

0人赞添加讨论(0) 举报

听够珍惜

4楼-- · 2019-01-01 05:59

This is the regex I use to remove duplicate phrases in my twitch bot:

(\S+\s*)\1{2,}

(\S+\s*) looks for any string of characters that isn't whitespace, followed whitespace.

\1{2,} then looks for more than 2 instances of that phrase in the string to match. If there are 3 phrases that are identical, it matches.

0人赞添加讨论(0) 举报

无与为乐者.

5楼-- · 2019-01-01 06:05

Regex to Strip 2+ duplicate words (consecutive/non-consecutive words)

Try this regex that can catch 2 or more duplicates words and only leave behind one single word. And the duplicate words need not even be consecutive.

/\b(\w+)\b(?=.*?\b\1\b)/ig

Here, \b is used for Word Boundary, ?= is used for positive lookahead, and \1 is used for back-referencing.

Example Source

0人赞添加讨论(0) 举报

余生请多指教

6楼-- · 2019-01-01 06:09

The widely-used PCRE library can handle such situations (you won't achieve the the same with POSIX-compliant regex engines, though):

(\b\w+\b)\W+\1

0人赞添加讨论(0) 举报

心情的温度

7楼-- · 2019-01-01 06:11

This expression (inspired from Mike, above) seems to catch all duplicates, triplicates, etc, including the ones at the end of the string, which most of the others don't:

/(^|\s+)(\S+)(($|\s+)\2)+/g, "$1$2")

I know the question asked to match duplicates only, but a triplicate is just 2 duplicates next to each other :)

First, I put (^|\s+) to make sure it starts with a full word, otherwise "child's steak" would go to "child'steak" (the "s"'s would match). Then, it matches all full words ((\b\S+\b)), followed by an end of string ($) or a number of spaces (\s+), the whole repeated more than once.

I tried it like this and it worked well:

var s = "here here here     here is ahi-ahi ahi-ahi ahi-ahi joe's joe's joe's joe's joe's the result result     result";
print( s.replace( /(\b\S+\b)(($|\s+)\1)+/g, "$1"))         
--> here is ahi-ahi joe's the result

0人赞添加讨论(0) 举报

1 2 下一页

Regular Expression For Consecutive Duplicate Words

Regex to Strip 2+ duplicate words (consecutive/non-consecutive words)

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间