Fast parsing of PHP in C#

Posted 2019-06-17 10:29

I've got a requirement to parse PHP files in C#. Developers in another country upload PHP files to us, and once a file is uploaded we need to check it and get a list of all the classes, methods, functions, etc. that it contains.

I thought of using a regex, but I can't work out whether a function belongs to a class, so I was wondering if there's already something 'out there' that will parse PHP files and spit out their functions (I'm trying to avoid writing a full-blown AST implementation).

Does anyone have any idea? I looked at Coco/R but I couldn't find a PHP grammar file. I'm using .NET 2.0 and C#.

Tags: c# php parsing
2 answers
Answer by Juvenile、少年° (post #2, 2019-06-17 10:51)

You might be able to use ctags for your purpose. I'm not sure how you would integrate it with C# though, since ctags is written in C.

Alternatively, if you know your parsers, you can take a look at the grammar files in the PHP source. In particular zend_ini_parser.y and zend_language_parser.y.

Finally, while not the best solution, you could probably get away with a handful of home-brewed regular expressions. PHP's grammar is fairly strict with regard to classes and functions. You just need to keep track of a little bit of state so you know which class a function belongs to.
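A minimal sketch of the regex-plus-state idea, written in Python for brevity (the same logic ports directly to C# with System.Text.RegularExpressions). It assumes one declaration per line, declarations at the start of a line, and no braces inside strings, comments, or heredocs. `scan_php`, `CLASS_RE`, and `FUNC_RE` are hypothetical names, not part of any library.

```python
import re

# Assumed, simplified patterns: declarations start a line (after indentation).
CLASS_RE = re.compile(
    r"^\s*(?:(?:abstract|final)\s+)?(?:class|interface|trait)\s+(\w+)", re.IGNORECASE)
FUNC_RE = re.compile(
    r"^\s*(?:(?:public|protected|private|static|abstract|final)\s+)*"
    r"function\s+&?\s*(\w+)", re.IGNORECASE)


def scan_php(source):
    """Return (owning class or None, function name) pairs in file order.

    Rough heuristic: every "{" and "}" on a line is counted, so braces in
    strings, comments or heredocs will throw the depth count off.
    """
    results = []
    depth = 0         # current brace nesting depth
    class_stack = []  # entries: [class name, depth outside the class, body opened?]
    for line in source.splitlines():
        m = CLASS_RE.match(line)
        if m:
            class_stack.append([m.group(1), depth, False])
        f = FUNC_RE.match(line)
        if f:
            owner = class_stack[-1][0] if class_stack else None
            results.append((owner, f.group(1)))
        for ch in line:
            if ch == "{":
                depth += 1
                if class_stack and not class_stack[-1][2]:
                    class_stack[-1][2] = True  # first "{" opens the class body
            elif ch == "}":
                depth -= 1
                if class_stack and class_stack[-1][2] and depth == class_stack[-1][1]:
                    class_stack.pop()  # back outside the class body
    return results
```

For a file that declares `helper()`, then `class Foo` with methods `__construct()` and `build()`, then `after()`, this returns `[(None, 'helper'), ('Foo', '__construct'), ('Foo', 'build'), (None, 'after')]`. Anonymous closures such as `function ($x) {...}` are skipped because the pattern requires a name.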

Answer by smile是对你的礼貌 (post #3, 2019-06-17 11:11)

Why do this in C#? In PHP this is trivial to do. Use the token_get_all() function and it will break a PHP file into a stream of lexemes that you can use to definitively determine the list of classes and methods by writing a finite state machine.
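A rough sketch of such a finite state machine, written in Python for brevity (it ports directly to C#). Tokens are modeled on `token_get_all()` output: either a single-character string such as `"{"`, or a `(token_name, text)` pair. The real function returns integer ids (plus the line number), which PHP's `token_name()` converts to names like `T_FUNCTION`. `list_members` and the token lists are hypothetical. One detail worth handling: `"{$"` inside a double-quoted string arrives as `T_CURLY_OPEN`, but its closing brace is a plain `"}"`, so it has to count as an opening brace.

```python
# Tokens that carry no structure for this purpose.
IGNORED = {"T_WHITESPACE", "T_COMMENT", "T_DOC_COMMENT"}
# "{$" and "${" in strings are closed by a plain "}", so treat them as "{".
OPENERS = {"{", "T_CURLY_OPEN", "T_DOLLAR_OPEN_CURLY_BRACES"}


def list_members(tokens):
    """Return (owning class/interface/trait or None, function name) pairs."""
    state = "idle"        # "idle", "want_class_name" or "want_func_name"
    depth = 0             # current brace depth
    pending_class = None  # class name seen, body "{" not reached yet
    class_stack = []      # (class name, depth of its body)
    found = []
    for tok in tokens:
        kind, text = tok if isinstance(tok, tuple) else (tok, tok)
        if kind in IGNORED:
            continue
        if state == "want_class_name":
            state = "idle"
            if kind == "T_STRING":
                pending_class = text
                continue
            # No name (e.g. "Foo::class" or an anonymous class): fall through.
        elif state == "want_func_name":
            if text == "&":  # "function &name()" returns by reference
                continue
            state = "idle"
            if kind == "T_STRING":
                found.append((class_stack[-1][0] if class_stack else None, text))
                continue
            # No name: a closure such as "function ($x) {...}"; fall through.
        if kind in ("T_CLASS", "T_INTERFACE", "T_TRAIT"):
            state = "want_class_name"
        elif kind == "T_FUNCTION":
            state = "want_func_name"
        elif kind in OPENERS:
            depth += 1
            if pending_class is not None:
                class_stack.append((pending_class, depth))
                pending_class = None
        elif kind == "}":
            if class_stack and class_stack[-1][1] == depth:
                class_stack.pop()  # leaving the class body
            depth -= 1
    return found
```

Because the tokenizer has already separated strings and comments from code, a `"}"` inside a string literal arrives as part of a single string token and cannot upset the brace count, which is the main advantage over the regex approach.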

Whatever you do, don't try to do this with regular expressions. It will be incredibly tedious and error-prone.

Edit: There are four basic possibilities for doing this:

  1. Do it in PHP. This will be the fastest (to develop) and simplest option;
  2. Run a command-line PHP script that either does this itself or generates a series of tokens that a C# program can interpret. This is the next easiest;
  3. Use Phalanger, a port of PHP to the .Net framework. This might be more palatable to management since it's still all .Net code; or
  4. Use Quercus, a port of PHP to the Java VM.
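Option 2 could look roughly like the following Python sketch (in C#, System.Diagnostics.Process plays the role of `subprocess`). It assumes a `php` binary on the PATH and UTF-8 source files, since `json_encode()` fails on invalid UTF-8. `PHP_DUMP_TOKENS`, `decode_tokens`, and `tokenize_with_php` are hypothetical names; the PHP side uses only `token_get_all()`, `token_name()`, and `json_encode()`.

```python
import json
import subprocess

# PHP code for `php -r`: read the source from stdin, tokenize it, print JSON.
# Array tokens become [token_name, text]; one-character tokens stay strings.
PHP_DUMP_TOKENS = (
    '$out = array();'
    'foreach (token_get_all(file_get_contents("php://stdin")) as $t) {'
    '  $out[] = is_array($t) ? array(token_name($t[0]), $t[1]) : $t;'
    '}'
    'echo json_encode($out);'
)


def decode_tokens(json_text):
    """Turn the JSON printed by the PHP side into (name, text) tuples / strings."""
    return [tuple(t) if isinstance(t, list) else t for t in json.loads(json_text)]


def tokenize_with_php(source, php_binary="php"):
    """Run the PHP CLI on `source` and return its token stream."""
    result = subprocess.run(
        [php_binary, "-r", PHP_DUMP_TOKENS],
        input=source, capture_output=True, text=True, check=True,
    )
    return decode_tokens(result.stdout)
```

The resulting token list can then be fed to a small state machine on the C# (or Python) side, so all the actual lexing stays in PHP's own tokenizer.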

Anything else will involve either writing a PHP parser (a lot of work) or using really flaky regular expressions that will be an unreliable support nightmare.

Concern about PHP's supposed "security flaws" has several problems:

  1. Any framework or technology stack can have security flaws. The fact that your sysadmin allows only .Net, and effectively under protest chose it over Java, just indicates irrational bias. I say this as a long-time Java developer: Java, .Net and PHP can all have security flaws;
  2. You can run PHP from the command line so that it doesn't serve any HTTP requests, which reduces the exposure to security flaws to basically zero;
  3. If you're worried about internal security threats (from someone with access to the box), simply restrict the PHP CLI executable so that it can only be executed by a group that contains nothing but your program's account.