The code below is tested and working. It prints the contents of a feed that has this structure:
<rss>
<channel>
<item>
<pubDate/>
<title/>
<description/>
<link/>
<author/>
</item>
</channel>
</rss>
What I didn't manage to do successfully is print feeds that follow the structure below (the difference is <feed><entry><published>), even though I changed the XPath to /feed//entry.
You can see the structure in the page source.
<feed>
<entry>
<published/>
<title/>
<description/>
<link/>
<author/>
</entry>
</feed>
I should mention that the code sorts all item elements by their pubDate. In the second feed structure, I guess it should sort all entry elements by their published value.
I have probably made a mistake in the XPath that I can't find. However, if I do manage to print that feed correctly, how can I modify the code to handle different structures all at once?
Is there any service that allows me to create and host my own feeds based on those feeds, so that they all have the same structure? I hope I made myself clear. Thank you.
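<?php
// Feed URLs go here
$feeds = array();
// Get all feed entries
$entries = array();
foreach ($feeds as $feed) {
    $xml = simplexml_load_file($feed);
    if ($xml === false) {
        continue; // skip feeds that fail to load or parse
    }
    // Collect every <item> of an RSS feed
    $entries = array_merge($entries, $xml->xpath('/rss//item'));
}
?>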
This question is really two questions, "How to handle multiple xpath at once" and "[How to] create my own feeds with the same structure".
The second one has been brilliantly answered by Dimitre Novatchev. If you want to "merge" or transform one or several XML documents, that's definitely what I'd recommend.
Meanwhile, I'll take the easy path and address the first question, "How to handle multiple xpath at once". It's easy, there's an operator for that: |. If you want to query all nodes that match /feed//entry or /rss//item then you can use /feed//entry | /rss//item.
The main contribution of this answer is a solution (at the end) that can be used with any number of formats, just by specifying all "entry" alternative names in the external (global) parameter $postElements and all "published-date" alternative names in the external (global) parameter $pub-dateElements.
Besides this, here is how to specify an XPath expression that selects all /rss//item and all /feed//entry elements.
In the simple case of just two possible document formats, this XPath expression (as proposed by @Josh Davis) works correctly:
/feed//entry | /rss//item
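As a rough illustration of the union operator from PHP, here is a minimal sketch. The two inline feeds are made-up sample data, and the Atom sample deliberately omits the default namespace that real Atom feeds usually declare.

```php
<?php
// Two tiny in-memory feeds standing in for real ones (hypothetical sample data).
$rss  = '<rss><channel><item><title>From RSS</title></item></channel></rss>';
$atom = '<feed><entry><title>From Atom</title></entry></feed>';

$titles = array();
foreach (array($rss, $atom) as $source) {
    $xml = simplexml_load_string($source);
    // The union operator lets one expression cover both layouts.
    foreach ($xml->xpath('/rss//item | /feed//entry') as $node) {
        $titles[] = (string) $node->title;
    }
}
print_r($titles);
```

Note that a real Atom feed normally has xmlns="http://www.w3.org/2005/Atom" on its root element, in which case /feed//entry matches nothing until the namespace is dealt with.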
A more general XPath expression allows the selection of the wanted elements from an unlimited number of document formats:
where the variable $topElements should be substituted by a pipe-delimited string of all possible names for a top element, and $postElements should be substituted by a pipe-delimited string of all possible names for an "entry" element. We also allow the "entry" elements to be at different depths in the different document formats.
In particular, for this concrete case the XPath expression will be:
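One way to write such a name-list test in XPath 1.0 is with contains() and concat(). The sketch below is an assumption about the general shape, not necessarily the exact expression meant above. The helper buildEntryXPath is hypothetical; it pastes the pipe-delimited lists into the expression as string literals, because SimpleXML's xpath() cannot bind XPath variables.

```php
<?php
// Hypothetical helper: builds the name-list XPath from plain PHP arrays.
function buildEntryXPath(array $topElements, array $postElements)
{
    // Wrap each list in pipes so that only whole names can match.
    $top  = '|' . implode('|', $topElements) . '|';
    $post = '|' . implode('|', $postElements) . '|';
    return "/*[contains('$top', concat('|', name(), '|'))]"
         . "//*[contains('$post', concat('|', name(), '|'))]";
}

$expr = buildEntryXPath(array('rss', 'feed'), array('item', 'entry'));
echo $expr, "\n";

// Made-up sample document in the second format.
$xml   = simplexml_load_string('<feed><entry><title>Hello</title></entry></feed>');
$found = $xml->xpath($expr);
echo count($found), "\n";
```

Surrounding each name with pipes keeps a name such as "it" from matching inside "item". The "//" step is what allows the "entry" elements to sit at different depths.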
The rest of this post shows how the complete wanted processing can be done entirely in XSLT -- easily and with elegance.
I. A gentle introduction
Such processing is easy and simple with XSLT:
when this transformation is applied to this XML document (in format 1):
and when it is applied on this equivalent document (in format 2):
in both cases the same wanted, correct result is produced:
II. The full solution
This can be generalized to a parameterized solution:
This transformation can be used with any number of formats, just by specifying all "entry" alternative names in the external (global) parameter $postElements and all "published-date" alternative names in the external (global) parameter $pub-dateElements.
Anyone can try this transformation to verify that when applied to the two XML documents above it again produces the same wanted, correct result.
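For readers staying in PHP, the same parameterization idea can be sketched without XSLT. This is not the transformation described above, only a rough PHP analogue: the arrays $postElements and $pubDateElements play the role of the two parameters, and the sample documents and dates are made up.

```php
<?php
// Alternative element names (the PHP variable names are hypothetical).
$postElements    = array('item', 'entry');
$pubDateElements = array('pubDate', 'published');

// Hypothetical sample documents, one per format.
$sources = array(
    '<rss><channel><item><pubDate>Tue, 02 Mar 2010 10:00:00 GMT</pubDate><title>B</title></item></channel></rss>',
    '<feed><entry><published>2010-03-01T10:00:00Z</published><title>A</title></entry></feed>',
);

$posts = array();
foreach ($sources as $source) {
    $xml = simplexml_load_string($source);
    foreach ($postElements as $postName) {
        foreach ($xml->xpath('//' . $postName) as $post) {
            // Use the first date element that is present on this post.
            foreach ($pubDateElements as $dateName) {
                if (isset($post->$dateName)) {
                    $posts[] = array(
                        'time'  => strtotime((string) $post->$dateName),
                        'title' => (string) $post->title,
                    );
                    break;
                }
            }
        }
    }
}

// Sort newest first.
usort($posts, function ($a, $b) {
    return $b['time'] - $a['time'];
});

foreach ($posts as $post) {
    echo date('Y-m-d', $post['time']), ' ', $post['title'], "\n";
}
```

Supporting another format then only means appending its element names to the two arrays. Namespaced feeds would still need separate handling.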
Here's a solution.
The problem is that many RSS or Atom feeds have namespaces defined which don't play nicely with SimpleXML. In the example below, I'm using str_replace to replace xmlns= with ns=. I'm then using the name of the root element to determine the type of feed (whether it's RSS or Atom).
The array_push call takes care of adding all of the entries to the $entries array, which you can then use later.
Another solution could be to use Google Reader to aggregate all of your feeds and use that feed instead of all of your separate ones.
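The namespace workaround described above (str_replace, root element name, array_push) might look roughly like the following sketch. The inline feeds are hypothetical sample data; real code would load each feed's contents from its URL instead.

```php
<?php
// Hypothetical sample feeds; the Atom one declares its usual default namespace.
$sources = array(
    '<rss version="2.0"><channel><item><title>RSS post</title></item></channel></rss>',
    '<feed xmlns="http://www.w3.org/2005/Atom"><entry><title>Atom post</title></entry></feed>',
);

$entries = array();
foreach ($sources as $source) {
    // Rename the default namespace declaration so XPath sees plain element
    // names. This is a blunt string hack: it would also rewrite "xmlns="
    // if that text appeared inside the feed's content.
    $xml = simplexml_load_string(str_replace('xmlns=', 'ns=', $source));

    // The root element name tells the two formats apart.
    $path = ($xml->getName() === 'feed') ? '/feed//entry' : '/rss//item';

    foreach ($xml->xpath($path) as $entry) {
        array_push($entries, $entry);
    }
}

echo count($entries), "\n";
```

Prefixed declarations such as xmlns:dc= are left untouched by this replacement, so only the default namespace is neutralized.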