I have a requirement to parse PHP files in C#. Developers in another country upload PHP files, and once a file is uploaded we need to check it and get a list of all its classes, methods, and functions.
I thought of using a regex, but I can't work out how to tell whether a function belongs to a class, so I was wondering if there's already something 'out there' that will parse PHP files and spit out their functions (I'm trying to avoid writing a full-blown AST implementation).
Does anyone have any idea? I looked at Coco/R but I couldn't find a PHP grammar file. I'm using .NET 2.0 and C#.
You might be able to use ctags for your purpose. I'm not sure how you would integrate it with C# though, since ctags is written in C.
Alternatively, if you know your parsers, you can take a look at the grammar files in the PHP source, in particular zend_ini_parser.y and zend_language_parser.y.
Finally, while not the best solution, you could probably get away with a home-brewed handful of regular expressions. PHP's grammar is fairly strict with regard to classes and functions. You just need to keep track of a little bit of state, so you know which class a function belongs to.
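As a rough illustration of the "regular expressions plus a little state" idea, here is a minimal sketch. It is written in Python for brevity; the same patterns and brace counting should port to System.Text.RegularExpressions in C#. The names list_php_definitions, NOISE_RE and DECL_RE are made up for this example, and it assumes the input is pure PHP with no heredocs and no inline HTML.

```python
import re

# Comments and quoted strings, matched in one pass so that "//" inside a
# string is not mistaken for a comment. Heredocs are NOT handled (assumption).
NOISE_RE = re.compile(
    r"/\*.*?\*/"                 # block comments
    r"|//[^\n]*|#[^\n]*"         # line comments
    r"|'(?:\\.|[^'\\])*'"        # single-quoted strings
    r'|"(?:\\.|[^"\\])*"',       # double-quoted strings
    re.DOTALL,
)

# Class-like headers, named function headers, and braces.
DECL_RE = re.compile(
    r"\b(?:class|interface|trait)\s+(?!extends\b|implements\b)"
    r"(?P<cls>[A-Za-z_]\w*)"
    r"|\bfunction\b\s*&?\s*(?P<func>[A-Za-z_]\w*)\s*\("
    r"|(?P<brace>[{}])"
)


def list_php_definitions(source):
    """Return (functions, classes): names of functions declared outside any
    class, and a dict mapping each class name to its method names."""
    # Blank out comments and strings so their braces and keywords are ignored.
    code = NOISE_RE.sub(" ", source)
    functions, classes = [], {}
    depth = 0          # current brace nesting level
    pending = None     # class name seen, opening "{" of its body not yet seen
    current = None     # class whose body we are inside, if any
    body_depth = 0     # brace depth of the current class body
    for m in DECL_RE.finditer(code):
        if m.group("cls"):
            pending = m.group("cls")
            classes.setdefault(pending, [])
        elif m.group("func"):
            if current is None:
                functions.append(m.group("func"))
            elif depth == body_depth:
                # Only functions declared directly in the class body count
                # as methods; deeper ones are ignored in this sketch.
                classes[current].append(m.group("func"))
        elif m.group("brace") == "{":
            depth += 1
            if pending is not None:
                current, body_depth, pending = pending, depth, None
        else:  # closing brace
            if current is not None and depth == body_depth:
                current = None
            depth -= 1
    return functions, classes
```

The only state kept is the brace depth and the class whose body is currently open, which is what lets a function header be attributed to a class. Anything the comment and string pattern misses (heredocs in particular) can throw the brace count off, so treat this as a starting point rather than a parser.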
Why do this in C#? In PHP this is trivial to do. Use the token_get_all() function and it will break a PHP file into a stream of lexemes that you can use to definitively determine the list of classes and methods by writing a finite state machine.
Whatever you do, don't try to do this with regular expressions. It will be incredibly tedious and error-prone.
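To make the finite state machine concrete, here is a minimal sketch of the consuming side, in Python purely for illustration. It assumes a hypothetical export step that is not part of PHP itself: a small PHP script has already run token_get_all() and written each token out as a (name, text) pair, where name is the token_name() result for array tokens (T_CLASS, T_FUNCTION, T_STRING, ...) and the character itself for single-character tokens. The function and constant names below are made up for this example.

```python
# Tokens that open a brace level. "{$x}" and "${x}" inside strings are
# opened by special tokens but closed by a plain "}".
OPENERS = {"{", "T_CURLY_OPEN", "T_DOLLAR_OPEN_CURLY_BRACES"}
SKIP = {"T_WHITESPACE", "T_COMMENT", "T_DOC_COMMENT"}


def definitions_from_tokens(tokens):
    """tokens: iterable of (name, text) pairs in the assumed export format.
    Returns (functions, classes) like a simple outline of the file."""
    functions, classes = [], {}
    state = "scan"     # "scan", "want_class_name" or "want_func_name"
    depth = 0          # current brace nesting level
    pending = None     # class name seen, body not yet opened
    current = None     # class whose body we are inside, if any
    body_depth = 0     # brace depth of the current class body
    for name, text in tokens:
        if name in SKIP:
            continue
        # "function &name()" returns by reference; step over the ampersand.
        if state == "want_func_name" and (
            name == "&" or name.startswith("T_AMPERSAND")
        ):
            continue
        if state != "scan":
            wanted, state = state, "scan"
            if name == "T_STRING":
                if wanted == "want_class_name":
                    pending = text
                    classes.setdefault(text, [])
                elif current is None:
                    functions.append(text)
                elif depth == body_depth:
                    classes[current].append(text)
                continue
            # No name followed (closure, anonymous class, Foo::class):
            # fall through and handle this token normally.
        if name in ("T_CLASS", "T_INTERFACE", "T_TRAIT"):
            state = "want_class_name"
        elif name == "T_FUNCTION":
            state = "want_func_name"
        elif name in OPENERS:
            depth += 1
            if pending is not None and name == "{":
                current, body_depth, pending = pending, depth, None
        elif name == "}":
            if current is not None and depth == body_depth:
                current = None
            depth -= 1
    return functions, classes
```

Because the lexer has already separated strings and comments from code, the machine only needs three states plus a brace counter. Methods named after reserved words may not arrive as T_STRING, so this sketch would miss them.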
Edit: There are three basic possibilities for doing this:
Anything else will involve either writing a PHP parser (a lot of work) or using really flaky regular expressions that will be an unreliable support nightmare.
To be concerned about supposed "security flaws" of PHP has several problems: