PHP preg_split: Split string by other strings

Posted 2020-07-18 08:15

I want to split a large string by a series of words.

E.g.

$splitby = array('these','are','the','words','to','split','by');
$text = 'This is the string which needs to be split by the above words.';

Then the results would be:

$text[0]='This is';
$text[1]='string which needs';
$text[2]='be';
$text[3]='above';
$text[4]='.';

How can I do this? Is preg_split the best way, or is there a more efficient method? I'd like it to be as fast as possible, as I'll be splitting hundreds of MB of files.

4 answers
【Aperson】
#2 · 2020-07-18 08:46

I don't think a PCRE regex is necessary if all you really need is to split on words.

You could do something like this and benchmark it to see if it's faster or better:

$splitby = array('these','are','the','words','to','split','by');
$text = 'This is the string which needs to be split by the above words.';

$split = explode(' ', $text);
$result = array();
$temp = array();

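foreach ($split as $s) {
    if (in_array($s, $splitby)) {
        // A split word closes the current chunk, if there is one.
        if (count($temp) > 0) {
            $result[] = implode(' ', $temp);
            $temp = array();
        }
    } else {
        // Any other word is collected into the current chunk.
        $temp[] = $s;
    }
}

// Flush whatever is left after the last split word.
if (count($temp) > 0) {
    $result[] = implode(' ', $temp);
}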

var_dump($result);

/* output

array(4) {
  [0]=>
  string(7) "This is"
  [1]=>
  string(18) "string which needs"
  [2]=>
  string(2) "be"
  [3]=>
  string(12) "above words."
}
*/

The only difference from your expected output is the last element: "words." != "words", so it isn't treated as a split word.
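
If lookup cost matters on large inputs, one possible refinement of the loop above is to turn the word list into a hash set, so each membership test is an isset() check rather than an in_array() scan. This is only a sketch: it assumes tokens are separated by single spaces, as in the example, and the function name split_by_words is hypothetical. Whether it is actually faster on your files would need benchmarking.

```php
<?php
// Hypothetical helper: splits $text on whole words found in $splitby.
// Assumes tokens are separated by single spaces, as in the example above.
function split_by_words(string $text, array $splitby): array
{
    // array_flip turns the words into array keys, so isset() is a hash lookup.
    $lookup = array_flip($splitby);
    $result = [];
    $temp = [];

    foreach (explode(' ', $text) as $token) {
        if (isset($lookup[$token])) {
            // A split word closes the current chunk, if there is one.
            if ($temp) {
                $result[] = implode(' ', $temp);
                $temp = [];
            }
        } else {
            $temp[] = $token;
        }
    }

    // Flush whatever is left after the last split word.
    if ($temp) {
        $result[] = implode(' ', $temp);
    }

    return $result;
}

$splitby = ['these', 'are', 'the', 'words', 'to', 'split', 'by'];
$text = 'This is the string which needs to be split by the above words.';
print_r(split_by_words($text, $splitby));
```

As with the loop above, "words." stays attached to "above" because the trailing period makes it a different token.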

来,给爷笑一个
#3 · 2020-07-18 08:57

preg_split can be used as:

$pieces = preg_split('/'.implode('\s*|\s*',$splitby).'/',$text,-1,PREG_SPLIT_NO_EMPTY);
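
A hedged variation on this pattern: as written, the alternation puts no \s* before the first word or after the last one, and a word such as "to" would also match inside a longer word. The sketch below groups the alternatives, escapes them with preg_quote, and adds \b word boundaries. The word boundaries are an assumption about the intended behaviour, not something the one-liner above does.

```php
<?php
$splitby = ['these', 'are', 'the', 'words', 'to', 'split', 'by'];
$text = 'This is the string which needs to be split by the above words.';

// Escape each word so regex metacharacters (and the / delimiter) stay literal.
$quoted = array_map(function ($word) {
    return preg_quote($word, '/');
}, $splitby);

// Group the alternatives so \s* and \b apply to every word, not only the
// inner ones. \b keeps "to" from matching inside a word like "stop".
$pattern = '/\s*\b(?:' . implode('|', $quoted) . ')\b\s*/';

$pieces = preg_split($pattern, $text, -1, PREG_SPLIT_NO_EMPTY);
print_r($pieces);
```

For the sample text this gives "This is", "string which needs", "be", "above" and ".", which is the result the question asks for.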


我只想做你的唯一
#4 · 2020-07-18 08:59

Since the words in your $splitby array are not regular expressions, maybe you can use str_split.

戒情不戒烟
#5 · 2020-07-18 09:01

This should be reasonably efficient. However, you may want to test it with some of your files and report back on the performance.

$splitby = array('these','are','the','words','to','split','by');
$text = 'This is the string which needs to be split by the above words.';
// implode() takes the separator first; the reversed order fails on PHP 8.
$pattern = '/\s?'.implode('\s?|\s?', $splitby).'\s?/';
$result = preg_split($pattern, $text, -1, PREG_SPLIT_NO_EMPTY);
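
For hundreds of MB of input it may help to avoid loading a whole file into one string. One possible sketch builds the pattern once and splits line by line from a stream. The helper name split_stream is hypothetical, the demo reads from an in-memory stream instead of a real file, and it assumes no piece needs to span a line break.

```php
<?php
// Hypothetical helper: yields the pieces of each line read from $stream.
// Assumption: a piece never needs to continue across a line break.
function split_stream($stream, string $pattern): Generator
{
    while (($line = fgets($stream)) !== false) {
        $line = rtrim($line, "\r\n");
        foreach (preg_split($pattern, $line, -1, PREG_SPLIT_NO_EMPTY) as $piece) {
            yield $piece;
        }
    }
}

$splitby = ['these', 'are', 'the', 'words', 'to', 'split', 'by'];

// Build the pattern once, outside the read loop.
$pattern = '/\s?' . implode('\s?|\s?', $splitby) . '\s?/';

// Demo input held in memory; a real run would fopen() a file path instead.
$stream = fopen('php://memory', 'r+');
fwrite($stream, "This is the string which needs to be split by the above words.\n");
rewind($stream);

// false = discard generator keys and reindex from 0.
$pieces = iterator_to_array(split_stream($stream, $pattern), false);
fclose($stream);
print_r($pieces);
```

Because the generator yields one piece at a time, a caller can also consume it with foreach and keep memory use roughly bounded by the longest line.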