Given a dummy function as such:
public function handle()
{
if (isset($input['data']) {
switch($data) {
...
}
} else {
switch($data) {
...
}
}
}
My intention is to get the contents of that function, the problem is matching nested patterns of curly braces {...}
.
I've come across recursive patterns but couldn't get my head around a regex that would match the function's body.
I've tried the following (no recursion):
$pattern = "/function\shandle\([a-zA-Z0-9_\$\s,]+\)?". // match "function handle(...)"
'[\n\s]?[\t\s]*'. // regardless of the indentation preceding the {
'{([^{}]*)}/'; // find everything within braces.
preg_match($pattern, $contents, $match);
That pattern doesn't match at all. I am sure it is the last bit that is wrong '{([^{}]*)}/'
since that pattern works when there are no other braces within the body.
By replacing it with:
'{([^}]*)}/';
It matched till the closing }
of the switch inside the if
statement and stopped there (including }
of the switch but excluding that of the if
).
As well as this pattern, same result:
'{(\K[^}]*(?=)})/m';
Update #2
According to others comments
^\s*[\w\s]+\(.*\)\s*\K({((?>"(?:[^"\\]*+|\\.)*"|'(?:[^'\\]*+|\\.)*'|//.*$|/\*[\s\S]*?\*/|#.*$|<<<\s*["']?(\w+)["']?[^;]+\3;$|[^{}<'"/#]++|[^{}]++|(?1))*)})
Note: A short RegEx i.e. {((?>[^{}]++|(?R))*)}
is enough if you know your input does not contain {
or }
out of PHP syntax.
So a long RegEx, in what evil cases does it work?
- You have
[{}]
in a string between quotation marks ["']
- You have those quotation marks escaped inside one another
- You have
[{}]
in a comment block. //...
or /*...*/
or #...
- You have
[{}]
in a heredoc or nowdoc <<<STR
or <<<['"]STR['"]
Otherwise it is meant to have a pair of opening/closing braces and depth of nested braces is not important.
Do we have a case that it fails?
No unless you have a martian that lives inside your codes.
^ \s* [\w\s]+ \( .* \) \s* \K # how it matches a function definition
( # (1 start)
{ # opening brace
( # (2 start)
(?> # atomic grouping (for its non-capturing purpose only)
"(?: [^"\\]*+ | \\ . )*" # double quoted strings
| '(?: [^'\\]*+ | \\ . )*' # single quoted strings
| // .* $ # a comment block starting with //
| /\* [\s\S]*? \*/ # a multi line comment block /*...*/
| \# .* $ # a single line comment block starting with #...
| <<< \s* ["']? # heredocs and nowdocs
( \w+ ) # (3) ^
["']? [^;]+ \3 ; $ # ^
| [^{}<'"/#]++ # force engine to backtack if it encounters special characters [<'"/#] (possessive)
| [^{}]++ # default matching bahaviour (possessive)
| (?1) # recurse 1st capturing group
)* # zero to many times of atomic group
) # (2 end)
} # closing brace
) # (1 end)
Formatting is done by @sln's RegexFormatter software.
What I provided in live demo?
Laravel's Eloquent Model.php file (~3500 lines) randomly is given as input. Check it out:
Live demo
This works to output header file (.h) out of inline function blocks (.c)
Find Regular expression:
(void\s[^{};]*)\n^\{($[^}$]*)\}$
Replace with:
$1;
For input:
void bar(int var)
{
foo(var);
foo2();
}
will output:
void bar(int var);
Get the body of the function block with second matched pattern :
$2
will output:
foo(var);
foo2();