How to parse colon-separated key-value text

Posted 2019-02-15 13:03

I need to parse the following text:

First: 1
Second: 2
Multiline: blablablabla
bla2bla2bla2
bla3b and key: value in the middle if strting
Fourth: value

A value is either a single-line or a multiline string. A value may also contain a "key: blablabla" substring; such a substring should be ignored (not parsed as a separate key-value pair).

Please help me with a regex or some other algorithm.

Ideal result would be:

$regex = "/SOME REGEX/";
$matches = [];
preg_match_all($regex, $html, $matches);
// $matches now holds all parsed key-value pairs, including multiline values

Thank you.

I tried simple regexes, but the result is incorrect because I don't know how to handle multiline values:

$regex = "/(.+?): (.+?)/";
$regex = "/(.+?):(.+?)\n/";
...

2 Answers
Explosion°爆炸
#2 · 2019-02-15 13:45

You can do it with this pattern:

$pattern = '~(?<key>[^:\s]+): (?<value>(?>[^\n]*\R)*?[^\n]*)(?=\R\S+:|$)~';

preg_match_all($pattern, $txt, $matches, PREG_SET_ORDER);

print_r($matches);
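
For illustration, here is a minimal sketch of how the match sets from this pattern could be collapsed into a key => value array. The `$txt` sample (the question's text, assumed here to use LF line endings) and the `$pairs` variable are assumptions added for the example, not part of the original answer.

```php
<?php
// Sample input from the question; LF line endings are an assumption.
$txt = "First: 1\nSecond: 2\nMultiline: blablablabla\nbla2bla2bla2\n"
     . "bla3b and key: value in the middle if strting\nFourth: value";

// Same pattern as in the answer: a key, then a value that lazily extends
// over whole lines until the next line starts with "word:" or the text ends.
$pattern = '~(?<key>[^:\s]+): (?<value>(?>[^\n]*\R)*?[^\n]*)(?=\R\S+:|$)~';

preg_match_all($pattern, $txt, $matches, PREG_SET_ORDER);

// Each match set carries the named groups 'key' and 'value';
// array_column() turns the list of sets into a key => value map.
$pairs = array_column($matches, 'value', 'key');

print_r($pairs);
```

Note that with CRLF input the values can retain a trailing "\r", so trimming them may be needed. Also, any line inside a value that begins with a space-free token followed by a colon will be treated as a new key.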
戒情不戒烟
#3 · 2019-02-15 14:02

You can sort of do it, as long as you treat a single word followed by a colon at the start of a line as the start of a new key:

$data = 'First: 1
Second: 2
Multiline: blablablabla
bla2bla2bla2
bla3b and key: value in the middle if strting
Fourth: value';

preg_match_all('/^([a-z]+): (.*?)(?=(^[a-z]+:|\z))/ims', $data, $matches);

var_dump($matches);

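This gives the following result (the string lengths show that the input used CRLF line endings, and every value except the last keeps its trailing line break):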

array(4) {
  [0]=>
  array(4) {
    [0]=>
    string(10) "First: 1
"
    [1]=>
    string(11) "Second: 2
"
    [2]=>
    string(86) "Multiline: blablablabla
bla2bla2bla2
bla3b and key: value in the middle if strting
"
    [3]=>
    string(13) "Fourth: value"
  }
  [1]=>
  array(4) {
    [0]=>
    string(5) "First"
    [1]=>
    string(6) "Second"
    [2]=>
    string(9) "Multiline"
    [3]=>
    string(6) "Fourth"
  }
  [2]=>
  array(4) {
    [0]=>
    string(3) "1
"
    [1]=>
    string(3) "2
"
    [2]=>
    string(75) "blablablabla
bla2bla2bla2
bla3b and key: value in the middle if strting
"
    [3]=>
    string(5) "value"
  }
  [3]=>
  array(4) {
    [0]=>
    string(7) "Second:"
    [1]=>
    string(10) "Multiline:"
    [2]=>
    string(7) "Fourth:"
    [3]=>
    string(0) ""
  }
}
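
Since the question also asks for "other algorithm", the same rule (a single word followed by a colon at the start of a line begins a new key) can be applied line by line instead of with one large regex. This is only a sketch under that assumption; `parseKeyValueText` is a hypothetical helper name, not something from the answer above.

```php
<?php
/**
 * Hypothetical helper: parse "Key: value" text line by line.
 * A line starts a new pair only if it begins with a single word
 * (letters only) followed by a colon; any other line is treated
 * as a continuation of the current value.
 */
function parseKeyValueText(string $text): array
{
    $pairs = [];
    $current = null;

    // \R splits on any line-break style (\r\n, \n or \r);
    // rtrim() avoids a trailing empty line being appended to the last value.
    foreach (preg_split('/\R/', rtrim($text)) as $line) {
        if (preg_match('/^([a-z]+):\s?(.*)$/i', $line, $m)) {
            $current = $m[1];
            $pairs[$current] = $m[2];
        } elseif ($current !== null) {
            // Continuation line: join with a plain "\n".
            $pairs[$current] .= "\n" . $line;
        }
    }

    return $pairs;
}

$data = "First: 1\r\nSecond: 2\r\nMultiline: blablablabla\r\nbla2bla2bla2\r\n"
      . "bla3b and key: value in the middle if strting\r\nFourth: value";

print_r(parseKeyValueText($data));
```

Unlike the `preg_match_all` result above, the values here come back without trailing line breaks, and a later duplicate key overwrites an earlier one.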