I have the following string:
<w:pPr>
<w:spacing w:line="240" w:lineRule="exact"/>
<w:ind w:left="1890" w:firstLine="360"/>
<w:rPr>
<w:b/>
<w:color w:val="00000A"/>
<w:sz w:val="24"/>
</w:rPr>
</w:pPr>
and I am trying to parse the "w:sz w:val" value using preg_match().
So far, I've tried:
preg_match('/<w:sz w:val="(\d)"/', $p, $fonts);
but this has not worked, and I'm unsure why.
Any ideas?
Thank you in advance!
Your pattern captures only a single digit, so it cannot match "24". Add a + after \d to make it mean "one or more" digits: (\d+).
I prefer [0-9]+ because it is easier to read and avoids any need to double up on backslashes inside a quoted string.
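As a minimal sketch of that fix, here is the call from the question with the character class quantified. The shortened sample string and the $size variable are only for illustration; $p and $fonts are the names used in the question.

```php
<?php
// Shortened sample input based on the string in the question.
$p = '<w:rPr><w:b/><w:color w:val="00000A"/><w:sz w:val="24"/></w:rPr>';

// [0-9]+ matches one or more digits. A lone (\d) matches exactly one digit,
// so the original pattern fails because "2" is not followed by a quote.
$size = null;
if (preg_match('/<w:sz w:val="([0-9]+)"/', $p, $fonts)) {
    // $fonts[0] is the whole match, $fonts[1] is the first capture group.
    $size = (int) $fonts[1];
}

echo $size, PHP_EOL;
```

Checking the return value of preg_match() before reading $fonts[1] avoids an undefined index notice when the element is missing.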
While you already have working code at hand, there are two other possibilities, namely DomDocument and SimpleXML. This is somewhat tricky because of the colons (that is, namespaces), but consider the following examples. I have added a container tag to define the namespace, but you will definitely have one in your XML as well. Solution 1 (the DOM way) searches the DOM with a namespace prefix and reads the attributes. Solution 2 (with SimpleXML) does the same, perhaps in a more intuitive and comprehensible way.
The XML: (using PHP HEREDOC Syntax)
Solution 1: Using DomDocument
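The original code for this solution did not survive, so what follows is only a sketch of the approach described above. The <container> element and the namespace URI (the usual WordprocessingML one) are assumptions; use whatever your real document declares for the w prefix.

```php
<?php
// Assumed namespace URI for the "w" prefix; take the real one from your document.
$ns = 'http://schemas.openxmlformats.org/wordprocessingml/2006/main';

// Hypothetical container element that only exists to declare the namespace.
$xml = <<<XML
<container xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:pPr>
    <w:spacing w:line="240" w:lineRule="exact"/>
    <w:ind w:left="1890" w:firstLine="360"/>
    <w:rPr>
      <w:b/>
      <w:color w:val="00000A"/>
      <w:sz w:val="24"/>
    </w:rPr>
  </w:pPr>
</container>
XML;

$dom = new DOMDocument();
$dom->loadXML($xml);

// Bind the prefix used in the XPath expression to the namespace URI.
$xpath = new DOMXPath($dom);
$xpath->registerNamespace('w', $ns);

$sizes = [];
foreach ($xpath->query('//w:sz') as $node) {
    // The attribute is namespaced as well, so read it with getAttributeNS().
    $sizes[] = $node->getAttributeNS($ns, 'val');
}

print_r($sizes);
```

Unlike the regex, this does not depend on attribute order or whitespace inside the tag.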
Solution 2: Using SimpleXML with Namespaces
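Again, the original snippet is missing, so this is a sketch under the same assumptions as before: a hypothetical <container> wrapper and the usual WordprocessingML namespace URI for the w prefix.

```php
<?php
// Assumed namespace URI for the "w" prefix; take the real one from your document.
$ns = 'http://schemas.openxmlformats.org/wordprocessingml/2006/main';

// Hypothetical container element that only exists to declare the namespace.
$xml = <<<XML
<container xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:pPr>
    <w:rPr>
      <w:b/>
      <w:color w:val="00000A"/>
      <w:sz w:val="24"/>
    </w:rPr>
  </w:pPr>
</container>
XML;

$doc = simplexml_load_string($xml);

// Make the "w" prefix usable in XPath expressions on this element.
$doc->registerXPathNamespace('w', $ns);

$fontSizes = [];
foreach ($doc->xpath('//w:sz') as $sz) {
    // attributes($ns) returns only the attributes in that namespace.
    $attrs = $sz->attributes($ns);
    $fontSizes[] = (string) $attrs['val'];
}

print_r($fontSizes);
```

The (string) cast matters: without it, $attrs['val'] is a SimpleXMLElement rather than plain text.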
You just need a small correction to your regex: replace (\d) with (\d+).
So it goes:
preg_match('/<w:sz w:val="(\d+)"/', $p, $fonts);
Why? Because with just \d you are checking for 1 digit, but with \d+ you are checking for 1 or more.
EDIT:
In case you need it, there are some great regex online testing tools, like https://regex101.com/. Try your expressions there before using them, just in case. You never know ;)