I have a file that holds a large nested structure, similar to JSON but not close enough for me to use a JSON library.
The data looks something like this:
alpha {
beta {
charlie;
}
delta;
}
echo;
foxtrot {
golf;
hotel;
}
The regex I am trying to build (for preg_match_all) should match each top-level parent (delimited by {} braces) so that I can recurse through the matches and build up a multidimensional PHP array that represents the data.
The first regex I tried is /(?<=\{).*(?=\})/s, which greedily matches the content inside braces. This isn't quite right: when there is more than one sibling at the top level, the match is too greedy and runs from the first opening brace to the last closing brace. Example below:
Using regex /(?<=\{).*(?=\})/s
match is given as:
Match 1:
beta {
charlie;
}
delta;
}
echo;
foxtrot {
golf;
hotel;
Instead the result should be:
Match 1:
beta {
charlie;
}
delta;
Match 2:
golf;
hotel;
So regex wizards, what function am I missing here or do I need to solve this with php somehow? Any tips very welcome :)
You can't [1] do this with regular expressions.
Alternatively, if you want to match blocks deep-to-shallow, you can use \{[^\{\}]*?\} with preg_replace_callback() to store the value and return null to erase it from the string. The callback will need to take care of nesting the value accordingly.
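$hierarchicalStorage = ...;
do {
    // Each pass removes the innermost {...} blocks, so their parents
    // become the innermost blocks on the next pass.
    $string = \preg_replace_callback('#\{[^\{\}]*?\}#', function ($block) use (&$hierarchicalStorage) {
        // $block[0] holds the matched block, braces included.
        // do your magic with $hierarchicalStorage
        // in here
        return null; // erases the block from the string
    }, $string);
} while (!empty($string));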
Incomplete, not tested, and no warranty.
This approach requires that the whole string be wrapped in {} as well; otherwise the final match won't happen and you'll loop forever.
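For what it's worth, here is a more complete sketch of the same innermost-first idea. It is not the snippet above: instead of returning null, the callback swaps each innermost block for a placeholder leaf, and the loop stops via the $count argument of preg_replace_callback(), so the outer {} wrapper isn't needed. The function name parseInnermostFirst, the "#N" placeholder format, and the output shape (blocks become string keys, leaves become list entries) are all my own assumptions. It also assumes no real name starts with "#" and that block names are unique within a level.

```php
<?php
// Hypothetical helper: parses innermost blocks first, replacing each with a placeholder.
function parseInnermostFirst(string $text): array
{
    $store = []; // placeholder => [block name, parsed children]

    // Parses a brace-free body such as "#0; delta;" into one array level.
    $parseFlat = function (string $body) use (&$store): array {
        $node = [];
        foreach (preg_split('/\s*;\s*/', trim($body), -1, PREG_SPLIT_NO_EMPTY) as $item) {
            if (isset($store[$item])) {
                // A placeholder: restore the block that was parsed earlier.
                [$name, $children] = $store[$item];
                $node[$name] = $children;
            } else {
                $node[] = $item; // a plain leaf
            }
        }
        return $node;
    };

    do {
        // Matches only blocks whose body contains no further braces.
        $text = preg_replace_callback(
            '/([^\s{};]+)\s*\{([^{}]*)\}/',
            function (array $m) use (&$store, $parseFlat): string {
                $id = '#' . count($store);
                $store[$id] = [$m[1], $parseFlat($m[2])];
                return $id . ';'; // leave a placeholder leaf behind
            },
            $text,
            -1,
            $count
        );
    } while ($count > 0); // stop once a pass finds no more blocks

    return $parseFlat($text);
}

$input = "alpha {\n  beta {\n    charlie;\n  }\n  delta;\n}\necho;\nfoxtrot {\n  golf;\n  hotel;\n}\n";
$tree = parseInnermostFirst($input);
```

Each pass rescans the whole string, so this stays a lot of work compared with a single-pass parser.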
This is an awful lot of (inefficient) work for something that can just as easily be solved with a well known exchange/storage format such as JSON.
[1] I was going to put "you can, but...", however I'll just say once again, "You can't". [2]
[2] Don't.
Sure you can do this with regular expressions.
preg_match_all(
'/([^\s]+)\s*{((?:[^{}]*|(?R))*)}/',
$yourStuff,
$matches,
PREG_SET_ORDER
);
This gives me the following in matches:
[1]=>
string(5) "alpha"
[2]=>
string(46) "
beta {
charlie;
}
delta;
"
and
[1]=>
string(7) "foxtrot"
[2]=>
string(22) "
golf;
hotel;
"
Breaking it down a little bit.
([^\s]+)        # non-whitespace (block name)
\s*             # whitespace (between name and block)
{               # literal opening brace
(               # begin capture
  (?:           # non-capturing group
    [^{}]*      # everything that is not a brace
    |(?R)       # OR recurse into the whole pattern
  )*            # zero or more times
)               # end capture
}               # literal closing brace
Just for your information, this works fine on n-deep levels of braces.
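If you want the nested PHP array the question asks for, one way is to wrap a pattern like this in a recursive function. The sketch below uses a variant of the pattern, not the exact one above: it also matches `name;` leaves, and it recurses into the brace group with (?2) rather than into the whole pattern with (?R). The function name parseBlocks and the output shape (blocks become string keys, leaves become list entries) are my assumptions, and it assumes block names are unique within a level.

```php
<?php
// Hypothetical helper: turns the brace-delimited text into a nested PHP array.
function parseBlocks(string $text): array
{
    // Group 1: name. Group 2: the whole {...} block. Group 3: its inner content.
    // (?2) recurses into group 2, so nested braces stay balanced.
    $pattern = '/([^\s{};]+)\s*(?:;|(\{((?:[^{}]++|(?2))*)\}))/';
    preg_match_all($pattern, $text, $matches, PREG_SET_ORDER);

    $result = [];
    foreach ($matches as $m) {
        if (isset($m[2])) {
            // A block: parse its content one level deeper.
            $result[$m[1]] = parseBlocks($m[3]);
        } else {
            // A leaf terminated by ";" (groups 2 and 3 did not take part).
            $result[] = $m[1];
        }
    }
    return $result;
}

$input = "alpha {\n  beta {\n    charlie;\n  }\n  delta;\n}\necho;\nfoxtrot {\n  golf;\n  hotel;\n}\n";
$tree = parseBlocks($input);
// $tree['alpha']['beta'] is ['charlie'], $tree['foxtrot'] is ['golf', 'hotel']
```

Text that matches neither form is silently skipped, so this does no validation of the input.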
I think you could get somewhere using preg_split, splitting on [a-zA-Z0-9]+[[:blank:]]+\{ and \}. You can then construct your array by walking through the result: use a recursive function that goes one level deeper when it meets an opening brace and returns to the parent level when it meets a closing brace.
Otherwise, the cleanest solution would be to write an ANTLR grammar!
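A rough sketch of the preg_split idea, in case it helps. Rather than the two patterns mentioned above, it splits on the single characters {, } and ; and keeps them as tokens with PREG_SPLIT_DELIM_CAPTURE, which I found simpler. The function names parseTokens and buildTree and the output shape (blocks become string keys, leaves become list entries) are assumptions of mine, and it assumes well-formed input with unique block names per level.

```php
<?php
// Hypothetical helper: consumes tokens from $pos onward and returns one array level.
function buildTree(array $tokens, int &$pos): array
{
    $node = [];
    $name = null;
    $count = count($tokens);

    while ($pos < $count) {
        $token = $tokens[$pos++];
        if ($token === '{') {
            // Opening brace: go one level deeper for the pending name.
            $node[(string) $name] = buildTree($tokens, $pos);
            $name = null;
        } elseif ($token === '}') {
            // Closing brace: hand this level back to the parent.
            return $node;
        } elseif ($token === ';') {
            // End of a leaf.
            $node[] = $name;
            $name = null;
        } else {
            $name = $token; // remember the name until we see "{" or ";"
        }
    }
    return $node;
}

// Hypothetical entry point: tokenizes the text, then builds the tree.
function parseTokens(string $text): array
{
    // Split on { } ; (dropping surrounding whitespace) and keep them as tokens.
    $tokens = preg_split(
        '/\s*([{};])\s*/',
        trim($text),
        -1,
        PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY
    );
    $pos = 0;
    return buildTree($tokens, $pos);
}

$input = "alpha {\n  beta {\n    charlie;\n  }\n  delta;\n}\necho;\nfoxtrot {\n  golf;\n  hotel;\n}\n";
$tree = parseTokens($input);
```

This reads the input once, left to right, and the token loop is a natural place to add error reporting for unbalanced braces.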