Parsing PHP Doc Comments into a Data Structure

Posted 2019-03-09 18:27

Question:

I'm using the Reflection API in PHP to pull a DocComment (PHPDoc) string from a method:

$r = new ReflectionMethod($object, 'someMethod'); // 'someMethod' stands in for the method name
$comment = $r->getDocComment();

This will return a string that looks something like this (depending on how well the method was documented):

/**
* Does this great thing
*
* @param string $thing
* @return Some_Great_Thing
*/

Are there any built-in methods or functions that can parse a PHP Doc Comment String into a data structure?

$object = some_magic_function_or_method($comment_string);

echo 'Returns a: ', $object->return;

Lacking that, what part of the PHPDoc source code should I be looking at to do this myself?

Lacking and/or in addition to that, is there third-party code that's considered "better" at this than the PHPDoc code?

I realize parsing these strings isn't rocket science, or even computer science, but I'd prefer a well tested library/routine/method that's been built to deal with a lot of the janky, semi-non-correct PHP Doc code that might exist in the wild.

Answer 1:

I am surprised this wasn't mentioned yet: what about using Zend_Reflection from Zend Framework? This may come in handy, especially if you work with software built on Zend Framework, such as Magento.

See the Zend Framework Manual for some code examples and the API Documentation for the available methods.

There are different ways to do this:

  • Pass a file name to Zend_Reflection_File.
  • Pass an object to Zend_Reflection_Class.
  • Pass an object and a method name to Zend_Reflection_Method.
  • If you really only have the comment string at hand, you could even throw together the code for a small dummy class, save it to a temporary file and pass that file to Zend_Reflection_File.
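
The last option can be sketched in plain PHP. This is only an illustration: write_dummy_class_file() and the class name DocCommentHolder are made-up names, and the round trip is checked with PHP's built-in ReflectionMethod rather than Zend_Reflection_File, so the sketch runs without Zend Framework installed.

```php
<?php
// Hypothetical helper, not part of PHP or Zend Framework: wraps a bare
// doc comment in a throwaway class and writes it to a temporary file.
function write_dummy_class_file(string $docComment, string $className): string
{
    $source = "<?php\n"
        . "class {$className}\n"
        . "{\n"
        . $docComment . "\n"
        . "    public function dummy() {}\n"
        . "}\n";

    $path = tempnam(sys_get_temp_dir(), 'doc');
    if ($path === false) {
        throw new RuntimeException('Could not create a temporary file');
    }
    file_put_contents($path, $source);

    return $path;
}

$comment = "/**\n * Does this great thing\n *\n * @return Some_Great_Thing\n */";
$path = write_dummy_class_file($comment, 'DocCommentHolder');

require $path; // load the generated class

// At this point $path is the file name one would hand to a file-based
// reflection class. Here we only confirm that the comment survived.
$method = new ReflectionMethod('DocCommentHolder', 'dummy');
var_dump($method->getDocComment() === $comment);

unlink($path); // clean up the temporary file
```

The class name has to be unique per process, so real code would probably derive it from a hash of the comment or a counter.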

Let's go for the simple case and assume you have an existing class you want to inspect.

The code would be like this (untested, please forgive me):

$method = new Zend_Reflection_Method($class, 'yourMethod');
$docBlock = $method->getDocBlock(); // variable names are case-sensitive, so this must match $docBlock below

if ($docBlock->hasTag('return')) {
    $tagReturn = $docBlock->getTag('return'); // $tagReturn is an instance of Zend_Reflection_Docblock_Tag_Return
    echo "Returns a: " . $tagReturn->getType() . "<br>";
    echo "Comment for return type: " . $tagReturn->getDescription();
}


Answer 2:

You can use the "DocBlockParser" class from Sami ("Yet Another PHP API Documentation Generator"), Fabien Potencier's open-source project.
First, get Sami from GitHub.
Here is an example of how to use it:

<?php

require_once 'Sami/Parser/DocBlockParser.php';
require_once 'Sami/Parser/Node/DocBlockNode.php';

class TestClass {
    /**
     * This is the short description.
     *  
     * This is the 1st line of the long description 
     * This is the 2nd line of the long description 
     * This is the 3rd line of the long description   
     *  
     * @param bool|string $foo sometimes a boolean, sometimes a string (or, could have just used "mixed")
     * @param bool|int $bar sometimes a boolean, sometimes an int (again, could have just used "mixed") 
     * @return string de-html_entitied string (no entities at all)
     */
    public function another_test($foo, $bar) {
        return strtr($foo,array_flip(get_html_translation_table(HTML_ENTITIES)));
    }
}

use Sami\Parser\DocBlockParser;
use Sami\Parser\Node\DocBlockNode;

try {
    $method = new ReflectionMethod('TestClass', 'another_test');
    $comment = $method->getDocComment();
    if ($comment !== FALSE) {
        $dbp = new DocBlockParser();
        $doc = $dbp->parse($comment);
        echo "\n** getDesc:\n";
        print_r($doc->getDesc());
        echo "\n** getTags:\n";
        print_r($doc->getTags());
        echo "\n** getTag('param'):\n";
        print_r($doc->getTag('param'));
        echo "\n** getErrors:\n";
        print_r($doc->getErrors());
        echo "\n** getOtherTags:\n";
        print_r($doc->getOtherTags());
        echo "\n** getShortDesc:\n";
        print_r($doc->getShortDesc());
        echo "\n** getLongDesc:\n";
        print_r($doc->getLongDesc());
    }
} catch (Exception $e) {
    print_r($e);
}

?>

And here is the output of the test page:

** getDesc:
This is the short description.

This is the 1st line of the long description 
This is the 2nd line of the long description 
This is the 3rd line of the long description
** getTags:
Array
(
    [param] => Array
        (
            [0] => Array
                (
                    [0] => Array
                        (
                            [0] => Array
                                (
                                    [0] => bool
                                    [1] => 
                                )

                            [1] => Array
                                (
                                    [0] => string
                                    [1] => 
                                )

                        )

                    [1] => foo
                    [2] => sometimes a boolean, sometimes a string (or, could have just used "mixed")
                )

            [1] => Array
                (
                    [0] => Array
                        (
                            [0] => Array
                                (
                                    [0] => bool
                                    [1] => 
                                )

                            [1] => Array
                                (
                                    [0] => int
                                    [1] => 
                                )

                        )

                    [1] => bar
                    [2] => sometimes a boolean, sometimes an int (again, could have just used "mixed")
                )

        )

    [return] => Array
        (
            [0] => Array
                (
                    [0] => Array
                        (
                            [0] => Array
                                (
                                    [0] => string
                                    [1] => 
                                )

                        )

                    [1] => de-html_entitied string (no entities at all)
                )

        )

)

** getTag('param'):
Array
(
    [0] => Array
        (
            [0] => Array
                (
                    [0] => Array
                        (
                            [0] => bool
                            [1] => 
                        )

                    [1] => Array
                        (
                            [0] => string
                            [1] => 
                        )

                )

            [1] => foo
            [2] => sometimes a boolean, sometimes a string (or, could have just used "mixed")
        )

    [1] => Array
        (
            [0] => Array
                (
                    [0] => Array
                        (
                            [0] => bool
                            [1] => 
                        )

                    [1] => Array
                        (
                            [0] => int
                            [1] => 
                        )

                )

            [1] => bar
            [2] => sometimes a boolean, sometimes an int (again, could have just used "mixed")
        )

)

** getErrors:
Array
(
)

** getOtherTags:
Array
(
)

** getShortDesc:
This is the short description.
** getLongDesc:
This is the 1st line of the long description 
This is the 2nd line of the long description 
This is the 3rd line of the long description


Answer 3:

You could use DocBlox (http://github.com/mvriel/docblox) to generate an XML data structure for you. Install DocBlox using PEAR and then run the command:

docblox parse -d [FOLDER] -t [TARGET_LOCATION]

This will generate a file called structure.xml which contains all meta data about your source code, including parsed docblocks.

OR

You can use the DocBlox_Reflection_DocBlock* classes to directly parse a piece of DocBlock text.

To do this, make sure autoloading is enabled (or include all DocBlox_Reflection_DocBlock* files) and execute the following:

$parsed = new DocBlox_Reflection_DocBlock($docblock);

Afterwards you can use the getters to extract the information that you want.

Note: you do not need to remove the asterisks; the Reflection class takes care of this.



Answer 4:

Check out

http://pecl.php.net/package/docblock

The docblock_tokenize() function will get you part-way there, I think.



Answer 5:

I suggest addendum. It's pretty cool, actively maintained, and used in many PHP 5 frameworks.

http://code.google.com/p/addendum/

Check the tests for examples

http://code.google.com/p/addendum/source/browse/trunk#trunk%2Fannotations%2Ftests



Answer 6:

You can always view the source from phpDoc. The code is under LGPL so if you do decide to copy it you would need to license your software under the same license AND properly add the correct notices.

EDIT: Unless, as @Samuel Herzog noted, you use it as a library.

Thanks @Samuel Herzog for the clarification.



Answer 7:

From your description I can only suspect what you are trying to do (PHP code documentation). Since you don't state why you are trying to do this I can only speculate.

Maybe you should try another approach. To document PHP code (if that is what you are trying) I would use doxygen and from the look of your code comment, it is already formatted for doxygen.

With Graphviz, doxygen also renders nice Class diagrams and call trees.



Answer 8:

If you're trying to read in the @ tags and their values, then using preg_match would be the best solution.
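
A minimal sketch of that regex approach, assuming reasonably well-formed docblocks. The function name parse_doc_comment() and the shape of the returned array are invented for illustration; tag values are kept as raw strings rather than split into type, variable and description.

```php
<?php
// Hypothetical helper: turns a doc comment string into
// ['description' => string, 'tags' => [tagName => [rawValue, ...]]].
function parse_doc_comment(string $comment): array
{
    // Drop the opening "/**" and the closing "*/".
    $body = preg_replace('~^\s*/\*\*|\*/\s*$~', '', $comment);

    $description = [];
    $tags = [];
    $name = null; // name of the tag currently being read, if any

    foreach (preg_split('~\R~', $body) as $line) {
        // Remove the leading "*" decoration and surrounding whitespace.
        $line = trim(preg_replace('~^\s*\*?~', '', $line));

        if (preg_match('~^@([\w-]+)\s*(.*)$~', $line, $m)) {
            // A new tag starts on this line.
            $name = $m[1];
            $tags[$name][] = $m[2];
        } elseif ($name === null) {
            // Everything before the first tag belongs to the description.
            $description[] = $line;
        } elseif ($line !== '') {
            // Continuation line of a multi-line tag value.
            $last = count($tags[$name]) - 1;
            $tags[$name][$last] .= ' ' . $line;
        }
    }

    return [
        'description' => trim(implode("\n", $description)),
        'tags'        => $tags,
    ];
}

$comment = <<<'DOC'
/**
 * Does this great thing
 *
 * @param string $thing
 * @return Some_Great_Thing
 */
DOC;

$parsed = parse_doc_comment($comment);
echo 'Returns a: ', $parsed['tags']['return'][0], "\n";
```

This does not handle inline tags such as {@link ...} or a literal "@" at the start of a description line, which is the kind of edge case the libraries in the other answers exist to cover.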



Answer 9:

I suggest you take a look at http://code.google.com/p/php-annotations/

The code is simple enough to understand and modify if needed.



Answer 10:

As pointed out in one of the answers above, you can use phpDocumentor. If you use Composer, just add "phpdocumentor/reflection-docblock": "~2.0" to your "require" block.

See this for an example: https://github.com/abdulla16/decoupled-app/blob/master/composer.json

For usage examples, see: https://github.com/abdulla16/decoupled-app/blob/master/Container/Container.php



Answer 11:

Updated version of user1419445's code. The DocBlockParser::parse() method has changed and now needs a second context parameter. It also seems to be slightly coupled with phpDocumentor, so for the sake of simplicity I assume you have Sami installed via Composer. The code below works for Sami v4.0.16.

<?php

require_once 'vendor/autoload.php';

class TestClass {
    /**
     * This is the short description.
     *  
     * This is the 1st line of the long description 
     * This is the 2nd line of the long description 
     * This is the 3rd line of the long description   
     *  
     * @param bool|string $foo sometimes a boolean, sometimes a string (or, could have just used "mixed")
     * @param bool|int $bar sometimes a boolean, sometimes an int (again, could have just used "mixed") 
     * @return string de-html_entitied string (no entities at all)
     */
    public function another_test($foo, $bar) {
        return strtr($foo,array_flip(get_html_translation_table(HTML_ENTITIES)));
    }
}

use Sami\Parser\DocBlockParser;
use Sami\Parser\Filter\PublicFilter;
use Sami\Parser\ParserContext;

try {
    $method = new ReflectionMethod('TestClass', 'another_test');
    $comment = $method->getDocComment();
    if ($comment !== FALSE) {
        $dbp = new DocBlockParser();
        $filter = new PublicFilter;
        $context = new ParserContext($filter, $dbp, NULL);
        $doc = $dbp->parse($comment, $context);
        echo "\n** getDesc:\n";
        print_r($doc->getDesc());
        echo "\n** getTags:\n";
        print_r($doc->getTags());
        echo "\n** getTag('param'):\n";
        print_r($doc->getTag('param'));
        echo "\n** getErrors:\n";
        print_r($doc->getErrors());
        echo "\n** getOtherTags:\n";
        print_r($doc->getOtherTags());
        echo "\n** getShortDesc:\n";
        print_r($doc->getShortDesc());
        echo "\n** getLongDesc:\n";
        print_r($doc->getLongDesc());
    }
} catch (Exception $e) {
    print_r($e);
}

?>