As the title says, here is an example input:
(outer
(center
(inner)
(inner)
center)
ouer)
(outer
(inner)
ouer)
(outer
ouer)
Of course, the matched strings will then be processed recursively.
I want the first level of recursion to match:
[
(outer
(center
(inner)
(inner)
center)
ouer),
(outer
(inner)
ouer),
(outer
ouer)]
And the processing after that goes without saying...
Many regex implementations will not allow you to match an arbitrary depth of nesting. However, Perl and PHP support recursive patterns, and .NET offers balancing groups for the same purpose.
A demo in Perl:
which will print:
Or, the PHP equivalent:
which produces:
An explanation:
EDIT
Note @Goozak's comment:
Don't use regex.
Instead, a simple recursive function will suffice:
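A minimal sketch of such a function in Python, assuming parentheses are the only delimiters. The names top_level_groups and parse are hypothetical and chosen only for illustration: the first returns the top-level groups the question asks for, and the second recurses into the contents of each group.

```python
def top_level_groups(text):
    """Return the substrings of `text` enclosed by top-level parentheses."""
    groups = []
    depth = 0      # current nesting depth
    start = None   # index of the "(" that opened the current top-level group
    for i, ch in enumerate(text):
        if ch == "(":
            if depth == 0:
                start = i
            depth += 1
        elif ch == ")":
            if depth == 0:
                raise ValueError(f"unbalanced ')' at index {i}")
            depth -= 1
            if depth == 0:
                groups.append(text[start:i + 1])
    if depth != 0:
        raise ValueError("unbalanced '('")
    return groups


def parse(text):
    """Recursively build a tree: each node is (group_text, child_nodes)."""
    # g[1:-1] drops the enclosing parentheses before recursing.
    return [(g, parse(g[1:-1])) for g in top_level_groups(text)]
```

Calling top_level_groups on the example input from the question should give the three (outer ... ouer) groups, and parse repeats the same step on the inside of each group.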
I found this simple regex, which extracts all nested balanced groups using recursion, although the resulting solution is not quite as straightforward as you might expect:
Regex pattern:
(1(?:\1??[^1]*?2))+
Sample input:
1ab1cd1ef2221ab1cd1ef222
For simplicity I use 1 for the opening bracket and 2 for the closing bracket. Alphabetic characters represent some inner data. I rewrote the input this way so that it is easier to explain.
In the first iteration, the regex matches the innermost subgroup 1ef2 of the first sibling group 1ab1cd1ef222. If we remember it and its position, and remove this group, 1ab1cd22 remains. If we continue with the regex, it returns 1cd2, and finally 1ab2. Then it continues to parse the second sibling group in the same way.
As this example shows, the regex properly extracts substrings as they appear in the hierarchy defined by the brackets. The position of a particular substring in the hierarchy is determined during the next iteration: if its position in the string lies within the substring matched by the next iteration, it is a child node; otherwise it is a sibling node.
From our example:
1ab1cd1ef222 1ab1cd1ef222: the iteration matches 1ef2, with index 6.
1ab1cd22 1ab1cd1ef222: the iteration matches 1cd2, with index 3, ending with 6. Because 3 < 6 <= 6, the first substring is a child of the second substring.
1ab2 1ab1cd1ef222: the iteration matches 1ab2, with index 0, ending with 3. Because 0 < 3 <= 3, the first substring is a child of the second substring.
1ab1cd1ef222: the iteration matches 1ef2, with index 6. Because 3 < 0 <= 6 does not hold, it is a branch from another sibling, and so on.
We must iterate over and remove all siblings before we can move to the parent. Thus, we must remember all those siblings in the order in which they appear during iteration.
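The match-and-remove loop described above can be sketched in Python. This is only an illustration under stated assumptions: it uses a simpler pattern (an opening 1, a run of characters that are neither 1 nor 2, then a closing 2) rather than the regex quoted earlier, it assumes the inner data never contains 1 or 2, and extract_innermost is a hypothetical name.

```python
import re

# Simplified pattern (an assumption, not the regex quoted above):
# an opening 1, characters that are neither 1 nor 2, then a closing 2.
INNERMOST = re.compile(r"1[^12]*2")


def extract_innermost(text):
    """Repeatedly remove the leftmost innermost group.

    Returns (group, index) pairs in the order they were removed; each
    index refers to the string as it was just before that removal.
    """
    found = []
    while True:
        m = INNERMOST.search(text)
        if m is None:
            break
        found.append((m.group(), m.start()))
        # Cut the matched group out and continue on the shorter string.
        text = text[:m.start()] + text[m.end():]
    return found


for group, index in extract_innermost("1ab1cd1ef2221ab1cd1ef222"):
    print(group, index)
```

For the sample input this prints 1ef2 6, 1cd2 3 and 1ab2 0 for the first sibling group, then the same three lines again for the second one, which is the order the walkthrough describes.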
Delphi Pascal code based on nneonneo's post above:
You need a form with a button on it, named btnRun. In the source code, replace "arnolduss" in the DownLoads folder path with your own name. Note the stack level in the output created by ParseList. Obviously, brackets of the same type must open and close on the same stack level. You can then extract the so-called groups per stack level.
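The idea of extracting groups per stack level can also be sketched in Python. This is not the ParseList routine mentioned above: groups_by_level is a hypothetical name, and the sketch assumes a single bracket type, with level 0 meaning top-level groups.

```python
def groups_by_level(text, opener="(", closer=")"):
    """Map each stack level to the bracketed groups that close on it."""
    stack = []    # indices of openers that have not been closed yet
    levels = {}   # level -> list of group substrings, in closing order
    for i, ch in enumerate(text):
        if ch == opener:
            stack.append(i)
        elif ch == closer:
            if not stack:
                raise ValueError(f"unbalanced {closer!r} at index {i}")
            start = stack.pop()
            # After popping, the stack size is the level this group sits on.
            levels.setdefault(len(stack), []).append(text[start:i + 1])
    if stack:
        raise ValueError(f"unbalanced {opener!r} at index {stack[-1]}")
    return levels


for level, groups in sorted(groups_by_level("(a(b)(c(d)))(e)").items()):
    print(level, groups)
```

A bracket that opens on one level always closes on the same level here, because the level is read from the stack size at the moment the matching opener is popped.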