I have the following string:
<w:pPr>
<w:spacing w:line="240" w:lineRule="exact"/>
<w:ind w:left="1890" w:firstLine="360"/>
<w:rPr>
<w:b/>
<w:color w:val="00000A"/>
<w:sz w:val="24"/>
</w:rPr>
</w:pPr>
and I am trying to parse the "w:sz w:val" value using preg_match().
So far, I've tried:
preg_match('/<w:sz w:val="(\d)"/', $p, $fonts);
but this has not worked, and I'm unsure why.
Any ideas?
Thank you in advance!
Your pattern only allows a single digit: (\d) must be followed immediately by the closing quote, so it cannot match "24". Add a + to make it "one or more" digits.
preg_match('/<w:sz w:val="(\d+)"/', $p, $fonts);
I prefer [0-9]+ because it is easier to read and it avoids any question of whether the \ needs to be doubled inside a PHP string literal.
preg_match('/<w:sz w:val="([0-9]+)"/', $p, $fonts);
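To make the difference concrete, here is a minimal sketch that runs both patterns against a shortened copy of the string from the question. It reuses the $p and $fonts names from the question; $single, $multi, $m and $size are just illustrative names.

```php
<?php
// Shortened copy of the markup from the question.
$p = '<w:rPr><w:b/><w:color w:val="00000A"/><w:sz w:val="24"/></w:rPr>';

// Original pattern: exactly one digit is allowed between the quotes,
// so "24" does not match and preg_match() returns 0.
$single = preg_match('/<w:sz w:val="(\d)"/', $p, $m);

// Corrected pattern: one or more digits between the quotes.
$multi = preg_match('/<w:sz w:val="([0-9]+)"/', $p, $fonts);

// $fonts[0] is the whole match, $fonts[1] is the first capture group.
$size = ($multi === 1) ? (int) $fonts[1] : null;

var_dump($single); // int(0)
var_dump($size);   // int(24)
```

Checking the return value before reading $fonts[1] avoids an undefined-index notice when the tag is missing.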
While you already have working code at hand, there are two other possibilities, namely DOMDocument and SimpleXML. This is somewhat tricky because of the colons (the w: is a namespace prefix), but consider the following examples. I have added a container tag to declare the namespace; your real XML will certainly declare it as well.
Solution 1 (the DOM way) searches the document by namespace and local element name and then reads the attribute. Solution 2 (with SimpleXML) does the same, perhaps in a more intuitive and comprehensible way.
The XML (using PHP heredoc syntax):
$xml = <<<EOF
<?xml version="1.0"?>
<container xmlns:w="http://example">
<w:pPr>
<w:spacing w:line="240" w:lineRule="exact"/>
<w:ind w:left="1890" w:firstLine="360"/>
<w:rPr>
<w:b/>
<w:color w:val="00000A"/>
<w:sz w:val="24"/>
</w:rPr>
</w:pPr>
</container>
EOF;
Solution 1: Using DOMDocument
$dom = new DOMDocument();
$dom->loadXML($xml);
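$ns = 'http://example'; // must match the URI declared by xmlns:w in the XML
$data = $dom->getElementsByTagNameNS($ns, 'sz')->item(0); // first <w:sz> element, or null if there is none
$attr = $data->getAttribute('w:val'); // read the attribute by its prefixed name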
echo $attr; // 24
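If you would rather not chain calls on a node that might be missing, a variation on the DOM approach is to go through DOMXPath. This is only a sketch: fontSizeFromXml is a hypothetical helper name, and 'http://example' is the placeholder namespace URI from the sample XML above, not the URI of a real Word document.

```php
<?php
$xml = <<<EOF
<?xml version="1.0"?>
<container xmlns:w="http://example">
<w:pPr>
<w:rPr>
<w:b/>
<w:sz w:val="24"/>
</w:rPr>
</w:pPr>
</container>
EOF;

// Hypothetical helper: returns the first w:sz value as an int, or null if absent.
function fontSizeFromXml(string $xml, string $ns): ?int
{
    $dom = new DOMDocument();
    if (!$dom->loadXML($xml)) {
        return null; // the input was not well-formed XML
    }

    $xpath = new DOMXPath($dom);
    // Bind the prefix used in the XPath expression to the namespace URI.
    $xpath->registerNamespace('w', $ns);

    // string() yields '' when no matching attribute exists.
    $value = $xpath->evaluate('string(//w:sz/@w:val)');

    return ctype_digit($value) ? (int) $value : null;
}

$size = fontSizeFromXml($xml, 'http://example');
var_dump($size); // int(24)
```

The prefix registered with registerNamespace() only has to agree with the XPath expression; what must match the document is the namespace URI.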
Solution 2: Using SimpleXML with Namespaces
$simplexml = simplexml_load_string($xml);
$namespaces = $simplexml->getNamespaces(true);
$items = $simplexml->children($namespaces['w']);
$val = $items->pPr->rPr->sz["val"]->__toString();
echo "val: $val"; // val: 24
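A slightly more explicit SimpleXML variant names the namespace when reading the attribute instead of relying on the chained property access. Again a sketch under the same assumptions: 'http://example' is the placeholder URI from the sample XML, and $nodes and $val are illustrative names.

```php
<?php
$xml = <<<EOF
<?xml version="1.0"?>
<container xmlns:w="http://example">
<w:pPr>
<w:rPr>
<w:b/>
<w:sz w:val="24"/>
</w:rPr>
</w:pPr>
</container>
EOF;

$ns = 'http://example'; // placeholder URI, must match xmlns:w in the XML

$simplexml = simplexml_load_string($xml);
// Make the w: prefix usable inside XPath expressions.
$simplexml->registerXPathNamespace('w', $ns);

// xpath() returns an array of matching elements (empty if nothing matched).
$nodes = $simplexml->xpath('//w:sz');

// attributes($ns) selects the attributes that live in the w: namespace.
$val = $nodes ? (string) $nodes[0]->attributes($ns)->val : null;

echo "val: $val"; // val: 24
```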
You just need a small correction to your regex:
<w:sz w:val="(\d+)"
So it goes:
preg_match('/<w:sz w:val="(\d+)"/', $p, $fonts);
Why? Because with just \d you are checking for 1 digit, but with \d+ you are checking for 1 or more.
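Note that where the + sits relative to the parentheses matters. Both forms below match the tag, but a repeated capture group keeps only its last repetition, so (\d)+ would hand you just the final digit. A small sketch ($outside and $inside are illustrative names):

```php
<?php
$p = '<w:sz w:val="24"/>';

// Quantifier outside the group: the one-digit group is repeated,
// and the capture holds only the last repetition.
preg_match('/<w:sz w:val="(\d)+"/', $p, $outside);

// Quantifier inside the group: the capture holds the whole run of digits.
preg_match('/<w:sz w:val="(\d+)"/', $p, $inside);

echo $outside[1], "\n"; // 4
echo $inside[1], "\n";  // 24
```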
EDIT:
In case you need it, there are some great regex online testing tools, like https://regex101.com/. Try your expressions there before using them, just in case. You never know ;)