I have a Google Shopping feed like this (extract):
<?xml version="1.0" encoding="utf-8" ?>
<rss version="2.0" xmlns:g="http://base.google.com/ns/1.0">
...
<g:id><![CDATA[Blah]]></g:id>
<title><![CDATA[Blah]]></title>
<description><![CDATA[Blah]]></description>
<g:product_type><![CDATA[Blah]]></g:product_type>
Now, SimpleXML can read the "title" and "description" tags, but it can't read the tags with the "g:" prefix.
There are solutions on Stack Overflow for this specific case, using the "children" function. But I don't only want to read Google Shopping XMLs. I need it to be independent of structure and namespace, because I don't know anything about the file in advance (I recursively loop through the nodes as a multidimensional array).
Is there a way to do it with SimpleXML? I could replace the colons, but I want to be able to store the array and reassemble the XML (in this case specifically for Google Shopping), so I do not want to lose information.
You want to use SimpleXMLElement to extract data from XML and convert it into an array.
This is generally possible but comes with some caveats. Before getting to XML namespaces: your XML contains CDATA. For an XML-to-array conversion with SimpleXML you need to convert CDATA to text when you load the XML string. This is done with the LIBXML_NOCDATA flag. Example:
This gives you the following output:
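A minimal sketch of such a load, assuming a cut-down version of the feed from the question (the `channel`/`item` nesting is my assumption about what the "..." hides) and a JSON round trip to get a plain nested array:

```php
<?php
// Cut-down feed modelled on the question; the nesting is an assumption.
$xml = <<<XML
<rss version="2.0" xmlns:g="http://base.google.com/ns/1.0">
  <channel>
    <item>
      <g:id><![CDATA[Blah]]></g:id>
      <title><![CDATA[Blah]]></title>
      <description><![CDATA[Blah]]></description>
      <g:product_type><![CDATA[Blah]]></g:product_type>
    </item>
  </channel>
</rss>
XML;

// LIBXML_NOCDATA merges CDATA sections into ordinary text nodes while parsing.
$feed = simplexml_load_string($xml, 'SimpleXMLElement', LIBXML_NOCDATA);

// json_encode() follows the same rules as the array conversion;
// decoding with true yields nested arrays instead of objects.
$array = json_decode(json_encode($feed), true);

print_r($array);
```

Only `title` and `description` show up under `item` here; the `g:` elements are skipped because no namespace was given.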
As you can already see, there is no nice form to present the attributes in an array, therefore SimpleXML by convention puts these into the @attributes key.
The other problem you have is handling those multiple XML namespaces. In the previous example no specific namespace was used, which means the default namespace. When you convert a SimpleXMLElement to an array, the namespace of the SimpleXMLElement is used. As none was explicitly specified, the default namespace has been taken.
But if you specify a namespace when you create the SimpleXMLElement, that namespace is taken.
Example:
This gives you the following output:
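As a sketch of what that can look like, assuming a single flat `item` element as the document root (my simplification, so that the prefixed elements are direct children of the root) and the namespace passed as the fourth constructor argument:

```php
<?php
// Flat stand-in for one feed item; an assumption for illustration.
$xml = <<<XML
<item xmlns:g="http://base.google.com/ns/1.0">
  <g:id><![CDATA[Blah]]></g:id>
  <title><![CDATA[Blah]]></title>
  <description><![CDATA[Blah]]></description>
  <g:product_type><![CDATA[Blah]]></g:product_type>
</item>
XML;

$googleNs = 'http://base.google.com/ns/1.0';

// Fourth argument: the namespace URI this element should work in.
$item = new SimpleXMLElement($xml, LIBXML_NOCDATA, false, $googleNs);

// The conversion now only sees children in that namespace.
$array = json_decode(json_encode($item), true);

print_r($array);
```

Note that the keys are the local names (`id`, `product_type`); the prefix itself is not part of them.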
As you can see, this time the namespace that was specified when the SimpleXMLElement was created is used in the array conversion: http://base.google.com/ns/1.0.
Since you write that you want to take all namespaces from the document into account, you need to obtain those first, including the default one:
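One way to collect them, sketched under the assumption that an empty string stands in for "no namespace given" and with the same flat `item` stand-in as input:

```php
<?php
// Flat stand-in for one feed item; an assumption for illustration.
$xml = <<<XML
<item xmlns:g="http://base.google.com/ns/1.0">
  <g:id><![CDATA[Blah]]></g:id>
  <title><![CDATA[Blah]]></title>
</item>
XML;

$item = simplexml_load_string($xml);

// getDocNamespaces(true) returns prefix => URI for every namespace
// declared anywhere in the document. The '' entry is added by hand
// to represent the default case (elements without a prefix).
$namespaces = ['' => ''] + $item->getDocNamespaces(true);

print_r($namespaces);
```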
Then you can iterate over all namespaces and recursively merge the results into the same array:
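A rough sketch of that loop, again assuming the flat `item` stand-in. It re-parses the string once per namespace and merges the partial arrays with `array_merge_recursive`; for deeper documents the prefixed elements would have to be direct children of elements you reach this way, so treat it as an illustration rather than a general solution:

```php
<?php
// Flat stand-in for one feed item; an assumption for illustration.
$xml = <<<XML
<item xmlns:g="http://base.google.com/ns/1.0">
  <g:id><![CDATA[Blah]]></g:id>
  <title><![CDATA[Blah]]></title>
  <description><![CDATA[Blah]]></description>
  <g:product_type><![CDATA[Blah]]></g:product_type>
</item>
XML;

// Default case first, then every namespace declared in the document.
$probe = new SimpleXMLElement($xml);
$namespaces = ['' => ''] + $probe->getDocNamespaces(true);

$array = [];
foreach ($namespaces as $uri) {
    // One element per namespace; each conversion sees only "its" children.
    $element = new SimpleXMLElement($xml, LIBXML_NOCDATA, false, $uri);
    $part = json_decode(json_encode($element), true);
    $array = array_merge_recursive($array, $part);
}

print_r($array);
```

The merged array holds `title`, `description`, `id` and `product_type`, but the information about which namespace a key came from is gone. If you need to reassemble the XML later, you would have to keep the prefix or URI in the key yourself.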
This then finally should create and output the array of your choice:
As you can see, this is perfectly possible with SimpleXMLElement. However, it's important that you understand how SimpleXMLElement converts into an array (or serializes to JSON, which follows the same rules). To simulate the SimpleXMLElement-to-array conversion, you can use print_r for a quick output.
Note that not all XML constructs can be converted into an array equally well. That's not specifically a limitation of SimpleXML but lies in the nature of which structures XML can represent and which structures an array can represent.
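A small illustration of one such mismatch, using made-up `items`/`item` element names: the shape of the resulting array depends on how often an element occurs, so code reading the array has to handle both cases.

```php
<?php
// Element names are invented for this illustration.
$one = simplexml_load_string('<items><item>a</item></items>');
$two = simplexml_load_string('<items><item>a</item><item>b</item></items>');

$oneArray = json_decode(json_encode($one), true);
$twoArray = json_decode(json_encode($two), true);

var_dump($oneArray['item']); // a plain string when the element occurs once
var_dump($twoArray['item']); // a list of strings when it occurs twice
```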
Therefore it is most often better to keep the XML inside an object like SimpleXMLElement (or DOMDocument) and work with the data there, not in an array.
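For comparison, a sketch of reading the prefixed elements directly from the object, assuming the same `channel`/`item` nesting as a typical feed. It shows both `children()` with a namespace URI and an XPath query with a registered prefix:

```php
<?php
// Cut-down feed modelled on the question; the nesting is an assumption.
$xml = <<<XML
<rss version="2.0" xmlns:g="http://base.google.com/ns/1.0">
  <channel>
    <item>
      <g:id><![CDATA[Blah]]></g:id>
      <title><![CDATA[Blah]]></title>
      <g:product_type><![CDATA[Blah]]></g:product_type>
    </item>
  </channel>
</rss>
XML;

$googleNs = 'http://base.google.com/ns/1.0';
$feed = simplexml_load_string($xml);

// children() with a namespace URI gives access to the prefixed elements.
$item = $feed->channel->item;
$productType = (string) $item->children($googleNs)->product_type;

// For XPath the prefix has to be registered first.
$feed->registerXPathNamespace('g', $googleNs);
$ids = $feed->xpath('//item/g:id');
$firstId = (string) $ids[0];

echo $productType, "\n", $firstId, "\n";
```

Casting to string returns the text of the element, CDATA included, so LIBXML_NOCDATA is not needed for this kind of access.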
However, it's perfectly fine to convert data into an array as long as you know what you are doing and you don't need to write much code to access members deeper down the tree. Otherwise SimpleXMLElement is to be favored over an array, because it gives you dedicated access not only to many of the XML features but also lets you query the document like a database with the SimpleXMLElement::xpath method. You would need to write many lines of your own code to access data inside the XML tree that comfortably on an array.
To get the best of both worlds, you can extend SimpleXMLElement for your specific conversion needs:
Which does output:
For the underlying implementation:
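A sketch of how such an extension could look, assuming a hypothetical class name `NamespacedXml`, prefixed keys such as `g:id`, and an `@text` key for text next to attributes or children (both key conventions are my own choice, not something SimpleXML defines). Namespaced attributes are left out to keep it short:

```php
<?php
// Hypothetical subclass: json_encode() calls jsonSerialize() instead of
// using SimpleXML's built-in conversion rules.
class NamespacedXml extends SimpleXMLElement implements JsonSerializable
{
    #[\ReturnTypeWillChange]
    public function jsonSerialize()
    {
        $data = [];

        // Attributes without a namespace go under the conventional key.
        foreach ($this->attributes() as $name => $value) {
            $data['@attributes'][$name] = (string) $value;
        }

        // Default case first, then every namespace declared in the document.
        $namespaces = ['' => ''] + $this->getDocNamespaces(true);

        $grouped = [];
        foreach ($namespaces as $prefix => $uri) {
            $children = $this->children($uri === '' ? null : $uri);
            foreach ($children as $name => $child) {
                // Keep the prefix in the key so no information is lost.
                $key = $prefix === '' ? $name : "$prefix:$name";
                $grouped[$key][] = $child->jsonSerialize();
            }
        }

        // A single occurrence stays a value, repeated elements become a list.
        foreach ($grouped as $key => $values) {
            $data[$key] = count($values) === 1 ? $values[0] : $values;
        }

        $text = trim((string) $this);
        if ($data === []) {
            return $text;           // leaf element: just its text
        }
        if ($text !== '') {
            $data['@text'] = $text; // text next to attributes or children
        }
        return $data;
    }
}

$xml = <<<XML
<rss version="2.0" xmlns:g="http://base.google.com/ns/1.0">
  <channel>
    <item>
      <g:id><![CDATA[Blah]]></g:id>
      <title><![CDATA[Blah]]></title>
      <description><![CDATA[Blah]]></description>
      <g:product_type><![CDATA[Blah]]></g:product_type>
    </item>
  </channel>
</rss>
XML;

// The second argument makes every node an instance of the subclass.
$feed = simplexml_load_string($xml, NamespacedXml::class);
$array = json_decode(json_encode($feed), true);

print_r($array);
```

Because the prefix stays in the key, the array can be stored and later turned back into prefixed elements, provided you also keep the prefix-to-URI map from `getDocNamespaces()`.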
This is an adaptation, with namespaces added, of the Changing JSON Encoding Rules example given in SimpleXML and JSON Encode in PHP – Part III and End.