I am trying to display XML content in a table. Everything works, but one tag contains content I don't want to display. I want only the image, not the text
November 2012 calendar from 5.10 The Test
In the XML it looks like this:
<content:encoded><![CDATA[<p>November 2012 calendar from 5.10 The Test</p>
<p><a class="shutterset_" href='http://trance-gemini.com/wordpress/wp-content/gallery/calendars/laura-bertram-trance-gemini-145-1080.jpg' title='<br>November 2012 calendar from 5.10 The Test<br> <a href="</a></p>]]>
</content:encoded>
I want to display the image, but not the text "November 2012 calendar from 5.10 The Test".
<?php
// load the feed with SimpleXML (third argument true: the first argument is a path or URL)
$item = new SimpleXMLElement('test1.xml', 0, true);
echo <<<EOF
<table border="1px">
<tr>
</tr>
EOF;
foreach ($item->channel->item as $boo) // loop through the feed items
{
echo <<<EOF
<tr>
<td rowspan="3">{$boo->children('content', true)->encoded}</td>
<td>{$boo->title}</td>
</tr>
<tr>
<td>{$boo->description}</td>
</tr>
<tr>
<td>{$boo->comments}</td>
</tr>
EOF;
}
echo '</table>';
?>
I answered this once before, but I can no longer find that answer.
If you take a look at the string (simplified/beautified):
<content:encoded><![CDATA[
<p>Lorem Ipsum</p>
<p>
<a href='laura-bertram-trance-gemini-145-1080.jpg'
title='<br>November 2012 calendar from 5.10 The Test<br> <a href="</a>
</p>]]>
</content:encoded>
You can see that HTML is embedded in the node value of the <content:encoded> element. So first you need to obtain that HTML string, which you already do:
$html = $boo->children('content', true)->encoded;
Then you need to parse the HTML inside $html. The libraries available for parsing HTML in PHP are outlined in:
- How to parse and process HTML/XML with PHP?
If you decide to use the more or less recommended DOMDocument for the job, you only need to get the attribute value of a certain element:
- PHP DOMDocument getting Attribute of Tag
Or, for its sister library SimpleXML, which you already use (so this is the more natural choice; see also the next section):
- How to get an attribute with SimpleXML?
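The DOMDocument-only route can be sketched as follows. This is a minimal illustration, not the code from the linked answers: the sample string in `$htmlString` is a well-formed stand-in for the feed content, and taking the first `<a>` in the document is an assumption about where the image link sits.

```php
<?php
// Assumed sample input: a simplified, well-formed version of the feed's HTML chunk.
$htmlString = "<p>Lorem Ipsum</p>\n"
    . "<p><a href='laura-bertram-trance-gemini-145-1080.jpg' title='calendar'>link</a></p>";

$doc = new DOMDocument();

// The chunk is not a complete HTML document, so keep parser warnings internal.
libxml_use_internal_errors(true);
$doc->loadHTML($htmlString);
libxml_clear_errors();

// Take the first <a> element in document order and read its href attribute.
$anchor = $doc->getElementsByTagName('a')->item(0);
$href = $anchor !== null ? $anchor->getAttribute('href') : '';

echo $href, "\n";
```

If the feed can contain several links per item, the index passed to `item()` or an XPath query would need to be adapted to pick the right one.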
In the context of your question, here is a tip:
You're using SimpleXML. DOMDocument is a sister library, meaning you can switch between the two, so you don't need to learn a whole new library.
For example, you can use only the HTML parsing feature of DOMDocument and then import the result into SimpleXML. This is useful because SimpleXML does not support HTML parsing.
That works via simplexml_import_dom().
A simplified step-by-step example:
// get the HTML string out of the feed:
$htmlString = $boo->children('content', true)->encoded;
// create DOMDocument for HTML parsing:
$htmlParser = new DOMDocument();
// load the HTML:
$htmlParser->loadHTML($htmlString);
// import it into simplexml:
$html = simplexml_import_dom($htmlParser);
Now you can use $html as a new SimpleXMLElement that represents the HTML document. As your HTML chunks did not have any <body> tags, the parser places them inside a <body> element, in line with the HTML specification. This allows you, for example, to access the href attribute of the first <a> inside the second <p> element in your example:
// access the element you're looking for:
$href = $html->body->p[1]->a['href'];
Here is the full example from above (Online Demo):
// get the HTML string out of the feed:
$htmlString = $boo->children('content', true)->encoded;
// create DOMDocument for HTML parsing:
$htmlParser = new DOMDocument();
// your HTML gives parser warnings, keep them internal:
libxml_use_internal_errors(true);
// load the HTML:
$htmlParser->loadHTML($htmlString);
// import it into simplexml:
$html = simplexml_import_dom($htmlParser);
// access the element you're looking for:
$href = $html->body->p[1]->a['href'];
// output it
echo $href, "\n";
And this is what it outputs:
laura-bertram-trance-gemini-145-1080.jpg
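To tie this back to the table in the question, the steps above could be wrapped in a small function whose result is placed in the cell as an image. This is only a sketch: `firstLinkHref()` is a hypothetical helper name, the `$encoded` string is a cleaned-up stand-in for one feed item, and it assumes that the first link in each item points at the image.

```php
<?php
// Hypothetical helper (not part of the original answer): returns the href of
// the first <a> element in an HTML fragment, or null if there is none.
function firstLinkHref(string $htmlString): ?string
{
    if (trim($htmlString) === '') {
        return null; // nothing to parse
    }

    $parser = new DOMDocument();

    // Keep warnings about sloppy HTML internal, then restore the old setting.
    $previous = libxml_use_internal_errors(true);
    $loaded = $parser->loadHTML($htmlString);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    if (!$loaded) {
        return null;
    }

    // Switch to SimpleXML and look for links anywhere in the document.
    $html = simplexml_import_dom($parser);
    if (!$html instanceof SimpleXMLElement) {
        return null;
    }
    $links = $html->xpath('//a[@href]');

    return $links ? (string) $links[0]['href'] : null;
}

// Assumed sample input standing in for one item's <content:encoded> value.
$encoded = "<p>November 2012 calendar from 5.10 The Test</p>\n"
    . "<p><a class=\"shutterset_\" href='http://trance-gemini.com/wordpress/wp-content/gallery/calendars/laura-bertram-trance-gemini-145-1080.jpg'>calendar</a></p>";

$src = firstLinkHref($encoded);

// Build the cell content: only the image, none of the surrounding text.
$cell = $src !== null
    ? '<img src="' . htmlspecialchars($src, ENT_QUOTES) . '" alt="">'
    : '';

echo '<td rowspan="3">', $cell, "</td>\n";
```

Inside the loop from the question, the argument would be `(string) $boo->children('content', true)->encoded` instead of the sample string.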
You would need to extract the image URL, e.g. via preg_match with this regex: '(http://(?:[^']*))'
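A minimal sketch of that approach, assuming the URL is wrapped in single quotes as in the feed shown in the question. The sample string and the `~` delimiters are additions for illustration, and the redundant non-capturing group is dropped, which does not change what is matched.

```php
<?php
// Assumed sample input, shortened from the feed in the question.
$htmlString = "<p><a class=\"shutterset_\" "
    . "href='http://trance-gemini.com/wordpress/wp-content/gallery/calendars/laura-bertram-trance-gemini-145-1080.jpg' "
    . "title='November 2012 calendar'>link</a></p>";

// '~' is the delimiter, so the slashes in "http://" need no escaping.
// The capture group takes everything between the single quotes.
$pattern = "~'(http://[^']*)'~";

if (preg_match($pattern, $htmlString, $matches) === 1) {
    $imageUrl = $matches[1];
} else {
    $imageUrl = null; // no single-quoted http:// URL found
}

echo $imageUrl ?? 'no match', "\n";
```

Note that this pattern only finds the first single-quoted `http://` URL. An `https://` link or a double-quoted attribute would not match, which is one reason an HTML parser tends to be more robust here.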