PHP: preg_match() not matching as expected

Posted 2019-07-16 10:42

Question:

I have the following string:

<w:pPr>
    <w:spacing w:line="240" w:lineRule="exact"/>
    <w:ind w:left="1890" w:firstLine="360"/>
    <w:rPr>
        <w:b/>
        <w:color w:val="00000A"/>
        <w:sz w:val="24"/>
    </w:rPr>
</w:pPr>

and I am trying to parse the "w:sz w:val" value using preg_match().

So far, I've tried:

preg_match('/<w:sz w:val="(\d)"/', $p, $fonts);

but this does not match, and I'm not sure why.

Any ideas?

Thank you in advance!

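Answer 1:

Your pattern captures only a single digit: (\d) must be followed immediately by the closing quote, so a two-digit value such as "24" never matches. Add a + after \d to mean "one or more digits".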

preg_match('/<w:sz w:val="(\d+)"/', $p, $fonts);

I prefer [0-9]+ because it is easier to read and avoids any question of whether the backslash has to be doubled inside a quoted string.

preg_match('/<w:sz w:val="([0-9]+)"/', $p, $fonts);
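As a minimal sketch of how the result might be read back, the following assumes the string from the question is held in $p (the variable name used there) and uses a shortened copy of the sample XML. The $size variable is introduced here only for illustration.

```php
<?php
// Shortened copy of the XML from the question; $p is the name used there.
$p = <<<'XML'
<w:pPr>
    <w:rPr>
        <w:color w:val="00000A"/>
        <w:sz w:val="24"/>
    </w:rPr>
</w:pPr>
XML;

// preg_match() returns 1 on a match, 0 on no match, and false on error.
if (preg_match('/<w:sz w:val="([0-9]+)"/', $p, $fonts) === 1) {
    // $fonts[0] holds the whole match, $fonts[1] the first capture group.
    $size = (int) $fonts[1];
    echo $size, PHP_EOL; // 24
} else {
    $size = null; // hypothetical fallback when no w:sz element is present
    echo "no w:sz value found", PHP_EOL;
}
```

Checking the return value before touching $fonts[1] avoids an undefined-index notice when the element is missing.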


Answer 2:

Although you already have working code, there are two other options: DOMDocument and SimpleXML. The colons (namespace prefixes) make this slightly tricky, but consider the following examples. I have added a container tag to declare the namespace; your real XML will almost certainly declare one as well.

Solution 1 (the DOM way) looks up the element by namespace and reads its attribute. Solution 2 (SimpleXML) does the same, perhaps in a more intuitive and readable way.

The XML (using PHP heredoc syntax):

$xml = <<<EOF
<?xml version="1.0"?>
<container xmlns:w="http://example">
    <w:pPr>
        <w:spacing w:line="240" w:lineRule="exact"/>
        <w:ind w:left="1890" w:firstLine="360"/>
        <w:rPr>
            <w:b/>
            <w:color w:val="00000A"/>
            <w:sz w:val="24"/>
        </w:rPr>
    </w:pPr>
</container>
EOF;

Solution 1: Using DOMDocument

$dom = new DOMDocument();
$dom->loadXML($xml);

$ns = 'http://example';

$data = $dom->getElementsByTagNameNS($ns, 'sz')->item(0);
$attr = $data->getAttribute('w:val');
echo $attr; // 24
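A slightly more defensive variant of the DOM approach is sketched below. It assumes the same placeholder namespace URI, http://example, and reads the attribute by namespace with getAttributeNS() instead of by the prefixed name, so it does not depend on the document using the prefix w. In a real WordprocessingML file the w prefix is normally bound to http://schemas.openxmlformats.org/wordprocessingml/2006/main, so $ns would need to be that URI. The names $node and $size are introduced here for illustration.

```php
<?php
// Placeholder namespace URI, as in the answer; a real .docx uses a different one.
$ns = 'http://example';

$xml = <<<'XML'
<?xml version="1.0"?>
<container xmlns:w="http://example">
    <w:rPr>
        <w:sz w:val="24"/>
    </w:rPr>
</container>
XML;

$dom = new DOMDocument();
$dom->loadXML($xml);

// item(0) returns null when no matching element exists, so check before use.
$node = $dom->getElementsByTagNameNS($ns, 'sz')->item(0);
$size = $node instanceof DOMElement
    ? $node->getAttributeNS($ns, 'val') // attribute looked up by namespace + local name
    : null;

var_dump($size); // string(2) "24"
```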

Solution 2: Using SimpleXML with Namespaces

$simplexml = simplexml_load_string($xml);
$namespaces = $simplexml->getNamespaces(true);
$items = $simplexml->children($namespaces['w']);

$val = $items->pPr->rPr->sz["val"]->__toString();
echo "val: $val"; // val: 24


Answer 3:

You just need a little correction to your regex:

<w:sz w:val="(\d+)"

So it goes:

preg_match('/<w:sz w:val="(\d+)"/', $p, $fonts);

Why? Because with just \d you are checking for 1 digit, but with \d+ you are checking for 1 or more.
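The difference can be checked with a short script. This is only a sketch using a one-line sample string; the labels and the $results array are made up for the comparison. It also shows that the position of the + matters: placed outside the group, the pattern matches, but the group keeps only the last digit.

```php
<?php
$p = '<w:sz w:val="24"/>';

// Three variants of the pattern: the original, + outside the group, + inside it.
$patterns = [
    'single digit'   => '/<w:sz w:val="(\d)"/',
    'repeated group' => '/<w:sz w:val="(\d)+"/',
    'one or more'    => '/<w:sz w:val="(\d+)"/',
];

$results = [];
foreach ($patterns as $label => $pattern) {
    // Store the first capture group, or null when the pattern does not match.
    $results[$label] = preg_match($pattern, $p, $m) === 1 ? $m[1] : null;
}

var_export($results);
// 'single digit'   => NULL  (exactly one digit is required before the quote)
// 'repeated group' => '4'   (the group is overwritten on each repetition)
// 'one or more'    => '24'
```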

EDIT:

In case you need it, there are some great online regex testing tools, such as https://regex101.com/. Try your expressions there before using them. You never know ;)