Parse text between 2 words

Posted 2020-04-11 15:46

I'm sure this has already been asked by someone else, but I searched here on SO and found nothing: https://stackoverflow.com/search?q=php+parse+between+words

I have a string and want to get an array with all the words contained between 2 delimiters (2 words). I am not confident with regex, so I ended up with this solution, but it is not appropriate because I need to get all the matches that meet those requirements, not only the first one.

$start_limiter = 'First';
$end_limiter = 'Second';
$haystack = $string;

# Step 1. Find the start limiter's position

$start_pos = strpos($haystack,$start_limiter);
if ($start_pos === FALSE)
{
    die("Starting limiter ".$start_limiter." not found in ".$haystack);
}

# Step 2. Find the end limiter's position, searching after the start limiter

$end_pos = strpos($haystack, $end_limiter, $start_pos + strlen($start_limiter));

if ($end_pos === FALSE)
{
    die("Ending limiter ".$end_limiter." not found in ".$haystack);
}

# Step 3. Extract the string between the two limiters.
# The text we want starts right after the start limiter and runs up to,
# but does not include, the end limiter.
$content_start = $start_pos + strlen($start_limiter);
$needle = substr($haystack, $content_start, $end_pos - $content_start);

echo "Found $needle";

I also thought about using explode(), but I think a regex could be better and faster.

5 answers
来,给爷笑一个
Post #2 · 2020-04-11 16:10

I'm not very familiar with PHP, but it seems to me that you can use something like:

if (preg_match("/(?<=First).*?(?=Second)/s", $haystack, $result))
    print_r($result[0]);

(?<=First) looks behind for First but doesn't consume it,

.*? lazily matches everything between First and Second,

(?=Second) looks ahead for Second but doesn't consume it,

The s at the end makes the dot . match newlines, if any.


To get every piece of text between those delimiters, use preg_match_all and loop over the captured matches:

if (preg_match_all("/(?<=First)(.*?)(?=Second)/s", $haystack, $result))
    // $result[1] holds the text captured by group 1 for each match
    foreach ($result[1] as $match) {
        print_r($match);
    }
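
For illustration, here is a small sketch of what preg_match_all fills in for that pattern. The sample string is made up for this example. With the default PREG_PATTERN_ORDER flag, $result[0] should hold the full matches and $result[1] the contents of the first capture group.

```php
<?php
// Hypothetical sample input, not taken from the question.
$haystack = "First apple Second, First banana Second";

// Returns the number of full matches found.
$count = preg_match_all("/(?<=First)(.*?)(?=Second)/s", $haystack, $result);

// $result[0]: full matches; $result[1]: capture group 1. They are identical
// here because the lookarounds do not consume the delimiters.
// trim() strips the spaces that surround each word in the sample input.
$words = array_map('trim', $result[1]);

print_r($words);
```

If the surrounding whitespace matters, skip the trim step and use $result[1] directly.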
Animai°情兽
Post #3 · 2020-04-11 16:12

You can also use two explode statements.

For example, say you want to get "z" in y=mx^z+b. To get z:

$formula="y=mx^z+b";
$z=explode("+",explode("^",$formula)[1])[0];

First I get everything after ^: explode("^",$formula)[1]

Then I get everything before +: explode("+",$previousExplode)[0]
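
The same idea can be stretched to collect every segment between two delimiters, which is what the question asks for. This is only a sketch: the helper name between_all_explode and the sample string are made up here, and it assumes both delimiters are non-empty strings.

```php
<?php
// Hypothetical helper: returns every substring found between $start and $end.
function between_all_explode(string $haystack, string $start, string $end): array
{
    $found = [];
    // Every chunk after the first one begins right after an occurrence of $start.
    $chunks = array_slice(explode($start, $haystack), 1);
    foreach ($chunks as $chunk) {
        // Split once on the end delimiter; keep the chunk only if it was present.
        $parts = explode($end, $chunk, 2);
        if (count($parts) === 2) {
            $found[] = $parts[0];
        }
    }
    return $found;
}

print_r(between_all_explode("a^x+b, c^y+d, e^z", "^", "+"));
```

The trailing "z" is skipped in this sample because no "+" follows it.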

Juvenile、少年°
Post #4 · 2020-04-11 16:23

Not sure that the result will be faster than your code, but you can do it like this with regex:

$pattern = '~(?<=' . preg_quote($start, '~') 
         . ').+?(?=' . preg_quote($end, '~') . ')~si';
if (preg_match($pattern, $subject, $match))
    print_r($match[0]);

I use preg_quote to escape all characters that have a special meaning in a regex (like +*|()[]{}.? and the pattern delimiter ~)

(?<=..) is a lookbehind assertion that checks for a substring before what you want to find.
(?=..) is a lookahead assertion (the same thing, for what comes after).
.+? means any character, one or more times, but as few as possible (the question mark makes the quantifier lazy).

s allows the dot to match newlines (not the default behavior).
i makes the search case-insensitive (you can remove it if you don't need it).
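
Since the question asks for every occurrence, a similar pattern can be handed to preg_match_all. A minimal sketch, assuming the delimiters are plain strings: the function name between_all_regex and the sample text are hypothetical, and it swaps the lookarounds for a capture group and uses .*? so that empty segments are also returned.

```php
<?php
// Hypothetical helper: returns all substrings between $start and $end.
function between_all_regex(string $subject, string $start, string $end): array
{
    // preg_quote escapes regex metacharacters and the "~" pattern delimiter.
    $pattern = '~' . preg_quote($start, '~') . '(.*?)' . preg_quote($end, '~') . '~s';
    preg_match_all($pattern, $subject, $matches);
    // Group 1 holds the text between the delimiters for every match.
    return $matches[1];
}

print_r(between_all_regex("x [a+b] y [c*d] z", "[", "]"));
```

Without preg_quote, delimiters such as "[" would be read as regex syntax instead of literal text.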

别忘想泡老子
Post #5 · 2020-04-11 16:25

This function lets you reuse the same code with different parameters, so you don't have to rewrite it every time. It also uses strpos, as you did. It has been working great for me.

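function get_string_between($string, $start, $end){
    // Strict comparison tells "not found" (false) apart from a match at position 0
    $ini = strpos($string, $start);
    if ($ini === false) return "";
    $ini += strlen($start);
    // Look for the end delimiter only after the start delimiter
    $end_pos = strpos($string, $end, $ini);
    if ($end_pos === false) return "";
    return substr($string, $ini, $end_pos - $ini);
}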

$fullstring = 'This is a long set of words that I am going to use.';

$parsed = get_string_between($fullstring, 'This', "use");

echo $parsed;

Will output:

is a long set of words that I am going to
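
To collect every occurrence rather than only the first, the same strpos approach can be run in a loop. This is a sketch and not part of the function above: get_all_strings_between is a hypothetical name, the sample string is invented, and both delimiters are assumed to be non-empty.

```php
<?php
// Hypothetical helper: returns every substring found between $start and $end.
function get_all_strings_between(string $string, string $start, string $end): array
{
    $results = [];
    $offset = 0;
    // Keep searching from where the previous match ended.
    while (($ini = strpos($string, $start, $offset)) !== false) {
        $ini += strlen($start);
        $stop = strpos($string, $end, $ini);
        if ($stop === false) {
            break; // no closing delimiter left
        }
        $results[] = substr($string, $ini, $stop - $ini);
        $offset = $stop + strlen($end);
    }
    return $results;
}

print_r(get_all_strings_between("First one Second and First two Second", "First", "Second"));
```

The surrounding spaces are kept in each result; wrap the substr call in trim() if they are not wanted.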
贼婆χ
Post #6 · 2020-04-11 16:30

Here's a simple example for finding everything between the words 'mega' and 'yo' for the string $t.

PHP Example

$t = "I am super mega awesome-sauce, yo!";

$arr = [];
preg_match("/mega\ (.*?)\ yo/ims", $t, $arr);

echo $arr[1];

PHP Output

awesome-sauce,