Regular Expression For Consecutive Duplicate Words-第2页回答

I'm a regular expression newbie, and I can't quite figure out how to write a single regular expression that would "match" any duplicate consecutive words such as:

Paris in the the spring.

Not that that is related.

Why are you laughing? Are my my regular expressions THAT bad??

Is there a single regular expression that will match ALL of the bold strings above?

标签： regex duplicates capture-group

12条回答

闭嘴吧你

2楼-- · 2019-01-01 06:14

Since some developers are coming to this page in search of a solution which not only eliminates duplicate consecutive non-whitespace substrings, but triplicates and beyond, I'll show the adapted pattern.

Pattern: /(\b\S+)(?:\s+\1\b)+/ (Pattern Demo)
Replace: $1 (replaces the fullstring match with capture group #1)

This pattern greedily matches a "whole" non-whitespace substring, then requires one or more copies of the matched substring which may be delimited by one or more whitespace characters (space, tab, newline, etc).

Specifically:

\b (word boundary) characters are vital to ensure partial words are not matched.
The second parenthetical is a non-capturing group, because this variable width substring does not need to be captured -- only matched/absorbed.
the + (one or more quantifier) on the non-capturing group is more appropriate than * because * will "bother" the regex engine to capture and replace singleton occurrences -- this is wasteful pattern design.

*note if you are dealing with sentences or input strings with punctuation, then the pattern will need to be further refined.

0人赞添加讨论(0) 举报

唯独是你

3楼-- · 2019-01-01 06:15

The example in Javascript: The Good Parts can be adapted to do this:

var doubled_words = /([A-Za-z\u00C0-\u1FFF\u2800-\uFFFD]+)\s+\1(?:\s|$)/gi;

\b uses \w for word boundaries, where \w is equivalent to [0-9A-Z_a-z]. If you don't mind that limitation, the accepted answer is fine.

0人赞添加讨论(0) 举报

伤终究还是伤i

4楼-- · 2019-01-01 06:16

Here is one that catches multiple words multiple times:

(\b\w+\b)(\s+\1)+

0人赞添加讨论(0) 举报

浮光初槿花落

5楼-- · 2019-01-01 06:21

I believe this regex handles more situations:

/(\b\S+\b)\s+\b\1\b/

A good selection of test strings can be found here: http://callumacrae.github.com/regex-tuesday/challenge1.html

0人赞添加讨论(0) 举报

心情的温度

6楼-- · 2019-01-01 06:21

Use this in case you want case-insensitive checking for duplicate words.

(?i)\\b(\\w+)\\s+\\1\\b

0人赞添加讨论(0) 举报

倾城一夜雪

7楼-- · 2019-01-01 06:24

No. That is an irregular grammar. There may be engine-/language-specific regular expressions that you can use, but there is no universal regular expression that can do that.

0人赞添加讨论(0) 举报

上一页 1 2

Regular Expression For Consecutive Duplicate Words

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间