How to make regex match only first occurrence of e

Posted 2020-02-14 05:43

/\b(keyword|whatever)\b/gi

How can I modify the above JavaScript regex to match only the first occurrence of each word (I believe this is called non-greedy)?

That is, the first occurrence of "keyword" and the first occurrence of "whatever", and I may put more words in there.

4 answers
家丑人穷心不美
Reply #2 · 2020-02-14 06:11

What you're trying to do can't be achieved with a single regular expression. Instead, store every word you want to find in an array, loop through the array searching the text for each word, and collect the words that are found in a second array.

Example:

var words = ["keyword","whatever"];
var text = "Whatever, keywords are like so, whatever... Unrelated, I now know " +
           "what it's like to be a tweenage girl. Go Edward.";
var matches = []; // Collects each word that appears at least once.
/* Lower-case the text so the search is case-insensitive (the entries in
   `words` must already be lower case).
 * The built-in String.prototype.indexOf(needle) is used because it avoids
   having to escape regular-expression metacharacters in the input. Note
   that it matches substrings, so "keyword" is also found inside "keywords". */

var lowerCaseText = text.toLowerCase();
for (var i = 0; i < words.length; i++) { // Loop through the `words` array
    // indexOf returns -1 if no match is found
    if (lowerCaseText.indexOf(words[i]) !== -1) {
        matches.push(words[i]); // Add to the `matches` array
    }
}
倾城 Initia
Reply #3 · 2020-02-14 06:14

Remove the g modifier from your regex. Then it will find only one match.

疯言疯语
Reply #4 · 2020-02-14 06:23

Remove the g flag from your regex:

/\b(keyword|whatever)\b/i
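
A quick way to see what dropping the g flag does, as a rough sketch (the sample string is made up for illustration). Without g, match() stops at the first hit of the whole pattern, so it returns the first occurrence of either word rather than the first occurrence of each:

```javascript
// Sample text is an assumption, chosen only to show the difference.
const sampleText = "Whatever, a keyword here, whatever there, keyword again.";

// Without the g flag, match() returns only the first match of the whole pattern.
const firstOnly = sampleText.match(/\b(keyword|whatever)\b/i);

// With the g flag, match() returns every match of either word.
const allMatches = sampleText.match(/\b(keyword|whatever)\b/gi);

console.log(firstOnly[0], firstOnly.index); // first hit and its position
console.log(allMatches);                    // every hit, in order
```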
家丑人穷心不美
Reply #5 · 2020-02-14 06:27

What you're talking about can't be done with a JavaScript regex. It might be possible with advanced regex features like .NET's unrestricted lookbehind, but JavaScript's feature set is extremely limited. And even in .NET, it would probably be simplest to create a separate regex for each word and apply them one by one; in JavaScript it's your only option.
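
The "separate regex for each word" idea might look something like this. It is only a sketch: escapeRegExp and firstOccurrences are hypothetical helper names, not built-ins, and the sample text is invented.

```javascript
// Hypothetical helper: escape regex metacharacters so each word is matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: first whole-word, case-insensitive occurrence of each word.
function firstOccurrences(text, words) {
  const results = [];
  for (const word of words) {
    // No g flag: exec() stops at the first match for this word.
    const re = new RegExp("\\b" + escapeRegExp(word) + "\\b", "i");
    const m = re.exec(text);
    if (m !== null) {
      results.push({ word: word, match: m[0], index: m.index });
    }
  }
  return results;
}

const found = firstOccurrences(
  "Whatever, keywords are like so, whatever... a keyword.",
  ["keyword", "whatever"]
);
console.log(found);
```

Because of the \b anchors, "keywords" is skipped and the first "keyword" hit is the standalone word near the end of the sample.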

Greediness only applies to regexes that employ quantifiers, like /START.*END/. The . means "any character" and the * means "zero or more". After the START is located, the .* greedily consumes the rest of the text. Then it starts backtracking, "giving back" one character at a time until the next part of the regex, END, succeeds in matching.
We call this regex "greedy" because it matches everything from the first occurrence of START to the last occurrence of END.
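
To illustrate the greedy behaviour, here is a small sketch with a made-up input that contains two START-to-END sequences:

```javascript
// Sample input is an assumption for illustration.
const greedyInput = "START a END b START c END";

// .* is greedy: the match runs from the first START to the last END.
const greedyMatch = greedyInput.match(/START.*END/);

console.log(greedyMatch[0]); // the whole input string
```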

If there may be more than one "START"-to-"END" sequence, and you want to match just the first one, you can append a ? to the * to make it non-greedy: /START.*?END/. Now, each time the . tries to consume the next character, it first checks to see if it could match END at that spot instead. Thus it matches from the first START to the first END after that. And if you want to match all the "START"-to-"END" sequences individually, you add the 'g' modifier: /START.*?END/g.
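
A short sketch of the non-greedy form, with and without the g modifier (the input string is invented for the example):

```javascript
// Sample input is an assumption for illustration.
const lazyInput = "START a END b START c END";

// .*? is non-greedy: the match stops at the first END after the first START.
const lazyFirst = lazyInput.match(/START.*?END/);

// Adding g returns each START-to-END sequence as its own match.
const lazyAll = lazyInput.match(/START.*?END/g);

console.log(lazyFirst[0]); // "START a END"
console.log(lazyAll);      // ["START a END", "START c END"]
```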

It's a bit more complicated than that, of course. For example, what if these sequences can be nested, as in START…START…END…END? If I've gotten a little carried away with this answer, it's because understanding greediness is the first important step to mastering regexes. :-/
