PHP and RegEx: Split a string by commas that are n

2020-02-07 01:22发布

Two days ago I started working on a code parser and I'm stuck.

How can I split a string by commas that are not inside brackets, let me show you what I mean:

I have this string to parse:

one, two, three, (four, (five, six), (ten)), seven

I would like to get this result:

array(
 "one"; 
 "two"; 
 "three"; 
 "(four, (five, six), (ten))"; 
 "seven"
)

but instead I get:

array(
  "one"; 
  "two"; 
  "three"; 
  "(four"; 
  "(five"; 
  "six)"; 
  "(ten))";
  "seven"
)

How can I do this in PHP RegEx.

Thank you in advance !

7条回答
在下西门庆
2楼-- · 2020-02-07 01:54

You can do that easier:

preg_match_all('/[^(,\s]+|\([^)]+\)/', $str, $matches)

But it would be better if you use a real parser. Maybe something like this:

$str = 'one, two, three, (four, (five, six), (ten)), seven';
$buffer = '';
$stack = array();
$depth = 0;
$len = strlen($str);
for ($i=0; $i<$len; $i++) {
    $char = $str[$i];
    switch ($char) {
    case '(':
        $depth++;
        break;
    case ',':
        if (!$depth) {
            if ($buffer !== '') {
                $stack[] = $buffer;
                $buffer = '';
            }
            continue 2;
        }
        break;
    case ' ':
        if (!$depth) {
            continue 2;
        }
        break;
    case ')':
        if ($depth) {
            $depth--;
        } else {
            $stack[] = $buffer.$char;
            $buffer = '';
            continue 2;
        }
        break;
    }
    $buffer .= $char;
}
if ($buffer !== '') {
    $stack[] = $buffer;
}
var_dump($stack);
查看更多
成全新的幸福
3楼-- · 2020-02-07 01:56

You can't, directly. You'd need, at minimum, variable-width lookbehind, and last I knew PHP's PCRE only has fixed-width lookbehind.

My first recommendation would be to first extract parenthesized expressions from the string. I don't know anything about your actual problem, though, so I don't know if that will be feasible.

查看更多
叛逆
4楼-- · 2020-02-07 02:01

Hm... OK already marked as answered, but since you asked for an easy solution I will try nevertheless:

<?php
  $test = "one, two, three, , , ,(four, five, six), seven, (eight, nine)";
  $split = "/([(].*?[)])|(\w)+/";
  preg_match_all($split, $test, $out);
  print_r($out[0]);              
  die();
?>

Output

Array
(
    [0] => one
    [1] => two
    [2] => three
    [3] => (four, five, six)
    [4] => seven
    [5] => (eight, nine)
)
查看更多
迷人小祖宗
5楼-- · 2020-02-07 02:02

I am afraid that it could be very difficult to parse nested brackets like one, two, (three, (four, five)) only with RegExp.

查看更多
Explosion°爆炸
6楼-- · 2020-02-07 02:09

Clumsy, but it does the job...

<?php

function split_by_commas($string) {
  preg_match_all("/\(.+?\)/", $string, $result); 
  $problem_children = $result[0];
  $i = 0;
  $temp = array();
  foreach ($problem_children as $submatch) { 
    $marker = '__'.$i++.'__';
    $temp[$marker] = $submatch;
    $string   = str_replace($submatch, $marker, $string);  
  }
  $result = explode(",", $string);
  foreach ($result as $key => $item) {
    $item = trim($item);
    $result[$key] = isset($temp[$item])?$temp[$item]:$item;
  }
  return $result;
}


$test = "one, two, three, (four, five, six), seven, (eight, nine), ten";

print_r(split_by_commas($test));

?>
查看更多
地球回转人心会变
7楼-- · 2020-02-07 02:10

I feel that its worth noting, that you should always avoid regular expressions when you possibly can. To that end, you should know that for PHP 5.3+ you could use str_getcsv(). However, if you're working with files (or file streams), such as CSV files, then the function fgetcsv() might be what you need, and its been available since PHP4.

Lastly, I'm surprised nobody used preg_split(), or did it not work as needed?

查看更多
登录 后发表回答