Match string in between two strings

Posted 2019-01-28 12:52

If I have a string like this:

var str = "play the Ukulele in Lebanon. play the Guitar in Lebanon.";

I want to get the text between each occurrence of "play" and the following "in", so basically an array with "the Ukulele" and "the Guitar".

Right now I'm doing:

var test = str.match("play(.*)in");

But that returns the text between the first "play" and the last "in", so I get "the Ukulele in Lebanon. play the Guitar" instead of 2 separate strings. Does anyone know how to search a string globally for every substring that sits between a starting string and an ending string?
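A minimal sketch reproducing the result described above (plain JavaScript, runnable in a browser console or Node.js; the variable names follow the question). When match() receives a string, it is converted to a RegExp with no flags, so only the first match is returned, and the greedy (.*) backtracks only as far as the last "in" in the string.

```javascript
const str = "play the Ukulele in Lebanon. play the Guitar in Lebanon.";

// The string argument becomes new RegExp("play(.*)in") with no g flag: first match only.
const test = str.match("play(.*)in");

// test[0] is the whole match and test[1] is the capture group.
// Greedy (.*) runs to the end of the string, then backs up to the LAST "in",
// so both phrases (plus the surrounding spaces) land in a single capture.
console.log(test[1]); // test[1] === " the Ukulele in Lebanon. play the Guitar "
```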

4 answers
等我变得足够好
Post #2 · 2019-01-28 13:29

You are so close to the right answer. There are a few things you may be overlooking:

  1. You need your match to be non-greedy; you can do this by adding the ? modifier after the quantifier (.+? or .*?).
  2. Don't rely on the String.match() method here: with the g flag it returns only the full matches and discards the capturing groups, and without the g flag it returns only the first match. An alternative is RegExp.exec() or String.replace() with a callback, but replace takes a little more work, so stick to building your own array with exec().
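A small sketch of the String.match() versus RegExp.exec() behaviour described in point 2, using the question's string and a simplified lazy pattern (no word boundaries; the variable names are illustrative only):

```javascript
const str = "play the Ukulele in Lebanon. play the Guitar in Lebanon.";

// With the g flag, match() returns every full match but drops the capture groups.
const allFull = str.match(/play (.+?) in/g);
// ["play the Ukulele in", "play the Guitar in"]

// Without g, match() behaves like exec(): first match only, capture groups included.
const firstOnly = str.match(/play (.+?) in/);
// firstOnly[0] === "play the Ukulele in", firstOnly[1] === "the Ukulele"

// exec() on a g regex resumes from re.lastIndex, so a loop visits every match.
const re = /play (.+?) in/g;
const groups = [];
let m;
while ((m = re.exec(str)) !== null) {
  groups.push(m[1]); // keep only the captured text
}
// groups: ["the Ukulele", "the Guitar"]

console.log(allFull, firstOnly[1], groups);
```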

var str     = "display the Ukulele in Lebanon. play the Guitar in Lebanon.";
var re      = /\bplay (.+?) in\b/g;
var matches = [];
var match;

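// exec() with the g flag resumes at re.lastIndex, so each call returns the next match (null when done)
while ((match = re.exec(str)) !== null) {
  matches.push(match[1]);  // match[1] is the text captured by (.+?)
}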


document.getElementById('demo').innerHTML = JSON.stringify( matches );
<pre id="demo"></pre>
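On engines that support ES2020, String.prototype.matchAll() is a shorter alternative to the exec() loop: it requires a regex with the g flag (it throws a TypeError otherwise) and yields one match array, capture groups included, per match. A sketch using the same string and regex as the answer above, assuming a modern browser or Node.js:

```javascript
const str = "display the Ukulele in Lebanon. play the Guitar in Lebanon.";
const re = /\bplay (.+?) in\b/g;

// matchAll() returns an iterator of match arrays; m[1] is the first capture group.
const matches = Array.from(str.matchAll(re), (m) => m[1]);

// "display" is skipped because there is no word boundary between "s" and "p".
console.log(JSON.stringify(matches)); // ["the Guitar"]
```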

Animai°情兽
Post #3 · 2019-01-28 13:31

A victim of greedy matching.

.* is greedy and finds the longest possible match, while .*? is lazy and finds the shortest possible match.

For the example given, str will be an array of 3 strings containing:

    the Ukulele
    the Guitar
    Lebanon
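A quick sketch contrasting the greedy and lazy quantifiers on the question's string (simplified patterns, global match for illustration):

```javascript
const str = "play the Ukulele in Lebanon. play the Guitar in Lebanon.";

// Greedy: (.*) grabs as much as it can, then backtracks to the LAST " in".
const greedy = str.match(/play (.*) in/g);

// Lazy: (.*?) grabs as little as it can, stopping at the FIRST " in" after each "play ".
const lazy = str.match(/play (.*?) in/g);

console.log(greedy); // ["play the Ukulele in Lebanon. play the Guitar in"]
console.log(lazy);   // ["play the Ukulele in", "play the Guitar in"]
```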
做自己的国王
Post #4 · 2019-01-28 13:39

You can use the regex

play\s*(.*?)\s*in

  1. Use / as the delimiters (regex literal syntax) instead of passing the pattern as a string
  2. Use a lazy quantifier (.*?) so the group matches as little as possible
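Point 1 matters because a pattern passed as a string goes through string-escape processing first: in an ordinary string literal "\s" is just "s", so every backslash has to be doubled for new RegExp(). A sketch comparing the forms (the variable names are illustrative; the third pattern is deliberately wrong):

```javascript
// Regex literal: backslashes reach the regex engine unchanged.
const literal = /play\s*(.*?)\s*in/g;

// Equivalent RegExp built from a string: each backslash must be doubled.
const fromString = new RegExp("play\\s*(.*?)\\s*in", "g");

// Deliberately wrong: the string literal consumes the single backslash, so "\s" becomes "s".
const broken = new RegExp("play\s*(.*?)\s*in", "g");

console.log(literal.source === fromString.source); // true
console.log(broken.source);                          // plays*(.*?)s*in
```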

Demo:

var str = "play the Ukulele in Lebanon. play the Guitar in Lebanon.";
var regex = /play\s*(.*?)\s*in/g;

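var matches = [];
var m;  // declared explicitly; an undeclared m becomes an implicit global (a ReferenceError in strict mode)
while ((m = regex.exec(str)) !== null) {
  matches.push(m[1]);  // m[1] is the lazily captured text between "play" and "in"
}

document.body.innerHTML = '<pre>' + JSON.stringify(matches, null, 4) + '</pre>';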

Post #5 · 2019-01-28 13:40

/\bplay\s+(.+?)\s+in\b/ig is more specific and might work better for you.

I believe there may be some issues with the regexes offered previously. For instance, /play\s*(.*?)\s*in/g will find a match within "displaying photographs in sequence", which is not what you want. One problem is that nothing specifies that "play" should be a discrete word: it needs a word boundary before it and at least one whitespace character after it (the whitespace can't be optional). Similarly, the whitespace after the capture group should not be optional.

The other expression offered at the time I added this, /play (.+?) in/g, lacks the word boundary token before "play" and after "in", so it will find a match in "display blue ink". This is not what you want either.
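A sketch that checks the counterexamples above against the three patterns. It assumes an engine with String.prototype.matchAll (ES2020); captures is a hypothetical helper written for this illustration, not part of any library.

```javascript
// Patterns discussed in this thread.
const loose = /play\s*(.*?)\s*in/g;        // optional whitespace, no word boundaries
const noBoundary = /play (.+?) in/g;       // literal spaces, no word boundaries
const strict = /\bplay\s+(.+?)\s+in\b/gi;  // word boundaries, required whitespace, case-insensitive

// Hypothetical helper: first capture group of every match (matchAll needs the g flag).
function captures(re, text) {
  return Array.from(text.matchAll(re), (m) => m[1]);
}

console.log(captures(loose, "displaying photographs in sequence"));  // [""] ("playin" inside "displaying")
console.log(captures(noBoundary, "display blue ink"));               // ["blue"]
console.log(captures(strict, "displaying photographs in sequence")); // []
console.log(captures(strict, "display blue ink"));                   // []
// The i flag also accepts a capitalised "Play":
console.log(captures(strict, "Play the Ukulele in Lebanon. play the Guitar in Lebanon."));
// ["the Ukulele", "the Guitar"]
```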

As for your own expression, it was missing the word boundary and whitespace tokens as well. And as another answer mentioned, it also needed the quantifier to be lazy. Otherwise, given your example string, your match would start at the first instance of "play" and end at the 2nd (last) instance of "in".

If you find issues with the expression I offered, I would appreciate feedback.
