I'm a regular expression newbie, and I can't quite figure out how to write a single regular expression that would "match" any duplicate consecutive words such as:
Paris in the the spring.
Not that that is related.
Why are you laughing? Are my my regular expressions THAT bad??
Is there a single regular expression that will match ALL of the repeated words above ("the the", "that that", "my my")?
Try the RE below:
()* Repeating again
Try this regular expression:
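For illustration only, here is a minimal Python sketch of a pattern of this kind, built from a word boundary and a backreference to the first group. The exact pattern and the name find_doubled_words are my assumptions, not necessarily what this answer posted.

```python
import re

# Assumed pattern: a word, some whitespace, then the same word again via \1.
# The \b anchors keep it from matching inside longer words ("the then").
DOUBLED_WORD = re.compile(r"\b(\w+)\s+\1\b")

def find_doubled_words(text):
    """Return every immediately repeated word pair found in text."""
    return [m.group(0) for m in DOUBLED_WORD.finditer(text)]

print(find_doubled_words("Paris in the the spring."))
```

This version is case-sensitive, so "Are are" would not count as a repeat unless re.IGNORECASE is added.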
Here \b is a word boundary and \1 references the captured match of the first group.
This is the regex I use to remove duplicate phrases in my Twitch bot:
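As a rough sketch, the two fragments explained just below, (\S+\s*) and \1{2,}, can be joined into one pattern and tried in Python. Joining them this way is my reading of the answer, and the name collapse_spam is made up for the example.

```python
import re

# Assembled (as an assumption) from the two fragments the answer explains:
# (\S+\s*) captures a token plus trailing whitespace, \1{2,} needs 2+ more copies.
REPEATED_PHRASE = re.compile(r"(\S+\s*)\1{2,}")

def collapse_spam(message):
    """Replace a token repeated three or more times in a row with one copy."""
    return REPEATED_PHRASE.sub(r"\1", message)

print(collapse_spam("lol lol lol lol ok"))
```

Because \s* also allows zero whitespace, this sketch collapses runs of a repeated character inside a word as well, so "hmmm" becomes "hm". A plain double such as "hi hi there" is left alone, since only two copies are present.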
(\S+\s*) looks for any run of characters that isn't whitespace, followed by any whitespace. \1{2,} then looks for at least 2 more instances of that phrase directly after it. So if there are 3 or more identical phrases in a row, it matches.
Regex to Strip 2+ duplicate words (consecutive/non-consecutive words)
Try this regex, which can catch 2 or more duplicate words and leave behind only a single one. The duplicate words need not even be consecutive.
Here, \b is used for a word boundary, ?= is used for positive lookahead, and \1 is used for back-referencing.
Example Source
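A minimal Python sketch of how those three pieces (\b, a positive lookahead, and \1) could be combined to drop non-consecutive duplicates. The pattern and the name dedupe_keep_last are assumptions for illustration, not necessarily the regex this answer used.

```python
import re

# Assumed pattern: match a word (plus trailing whitespace) only if the same
# word appears again later on the line; the lookahead (?=...) checks that
# without consuming anything.
EARLIER_DUPLICATE = re.compile(r"\b(\w+)\b\s*(?=.*\b\1\b)")

def dedupe_keep_last(text):
    """Delete every occurrence of a word except its last one."""
    return EARLIER_DUPLICATE.sub("", text)

print(dedupe_keep_last("one two one three two"))
```

Note that this keeps the last occurrence of each word rather than the first, and that "." does not cross newlines unless re.DOTALL is set, so duplicates are only detected within a single line.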
The widely-used PCRE library can handle such situations (you won't achieve the same with POSIX-compliant regex engines, though):
This expression (inspired by Mike's answer, above) seems to catch all duplicates, triplicates, etc., including the ones at the end of the string, which most of the others don't:
I know the question asked to match duplicates only, but a triplicate is just 2 duplicates next to each other :)
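To illustrate the point about triplicates, here is a small Python sketch in which a repeated group with + absorbs any number of consecutive copies, including a run at the very end of the string. The pattern is a simplified stand-in rather than this answer's exact expression, and collapse_repeats is a hypothetical name.

```python
import re

# Assumed pattern: a word followed by one or more "whitespace + same word"
# groups, so doubles, triples, and longer runs are all matched in one go.
REPEATED_RUN = re.compile(r"\b(\w+)(?:\s+\1\b)+")

def collapse_repeats(text):
    """Collapse each run of a consecutively repeated word to a single copy."""
    return REPEATED_RUN.sub(r"\1", text)

print(collapse_repeats("Paris in the the the spring."))
```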
First, I put (^|\s+) to make sure the match starts at a full word; otherwise "child's steak" would turn into "child'steak" (the two "s" characters would match). Then it matches a full word, (\b\S+\b), followed by an end of string ($) or a run of whitespace (\s+), with the whole thing repeated more than once.
I tried it like this and it worked well: