Question:
This is the HTML from the website. I want to grab only this text:
1,000 Places To See Before You Die
<ul class="listings">
<li>
<a href="http://watchseries.eu/serie/1,000_places_to_see_before_you_die" title="1,000 Places To See Before You Die">
1,000 Places To See Before You Die
<span class="epnum">2009</span>
</a>
</li>
I used code like this:
foreach($html->find('ul.listings li a') as $e)
echo $e->innertext. '<br/>';
The output I am getting looks like this:
999: Whats Your Emergency<span class="epnum">2012</span>
It includes the span, which I don't want. Please help me with this.
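Answer 1:
You can use strip_tags() for that. Note that it removes only the tags themselves, so the text inside the span (the year) stays in the output: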
echo trim(strip_tags($e->innertext));
Or use preg_replace() to remove the unwanted tag together with its content:
echo preg_replace('/<span[^>]*>([\s\S]*?)<\/span[^>]*>/', '', $e->innertext);
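To see how the two suggestions differ, here is a minimal sketch that runs both on a hard-coded copy of the anchor's inner HTML from the question. The variable `$inner` is an assumption standing in for `$e->innertext`, so the example runs without the Simple HTML DOM library.

```php
<?php
// Assumed stand-in for $e->innertext from the question.
$inner = "\n1,000 Places To See Before You Die\n<span class=\"epnum\">2009</span>\n";

// strip_tags() drops the <span> tags but keeps the text they wrapped,
// so the year is still part of the result.
$stripped = trim(strip_tags($inner));

// Removing the whole <span>...</span> element drops the year as well.
$withoutSpan = trim(preg_replace('/<span[^>]*>[\s\S]*?<\/span[^>]*>/', '', $inner));

echo $stripped, PHP_EOL;
echo $withoutSpan, PHP_EOL;
```

If only the title is wanted, the second form is probably the one to use; the first still needs the year trimmed off afterwards.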
Answer 2:
Why not use DOMDocument and read the title attribute?
$string = '<ul class="listings">
<li>
<a href="http://watchseries.eu/serie/1,000_places_to_see_before_you_die" title="1,000 Places To See Before You Die">
1,000 Places To See Before You Die
<span class="epnum">2009</span>
</a>
</li>';
$dom = new DOMDocument;
$dom->loadHTML($string);
$xpath = new DOMXPath($dom);
$text = $xpath->query('//ul[@class="listings"]/li/a/@title')->item(0)->nodeValue;
echo $text;
Or take the first line of the anchor's text content:
$text = explode("\n", trim($xpath->query('//ul[@class="listings"]/li/a')->item(0)->nodeValue));
echo $text[0];
Codepad Example
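A possible variation on the same idea, sketched here as an assumption rather than as part of the original answer: ask XPath for only the text nodes that are direct children of each anchor, which skips the span without relying on line breaks in the markup. The closing `</ul>` and the `$titles` array are additions for this example.

```php
<?php
$string = '<ul class="listings">
<li>
<a href="http://watchseries.eu/serie/1,000_places_to_see_before_you_die" title="1,000 Places To See Before You Die">
1,000 Places To See Before You Die
<span class="epnum">2009</span>
</a>
</li>
</ul>';

$dom = new DOMDocument;
$dom->loadHTML($string);
$xpath = new DOMXPath($dom);

$titles = [];
foreach ($xpath->query('//ul[@class="listings"]/li/a') as $a) {
    $text = '';
    // text() relative to $a selects only its direct text-node children,
    // so the text inside <span class="epnum"> is not included.
    foreach ($xpath->query('text()', $a) as $node) {
        $text .= $node->nodeValue;
    }
    $titles[] = trim($text);
}

print_r($titles);
```

This loops over every matching anchor, so it should also work when the list contains more than one entry.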
Answer 3:
There are two ways I can think of to solve this. One is to grab the title attribute from the anchor tag. Of course, not every site sets a title attribute on its anchors, and its value may differ from the link text. The other is to take the innertext property and replace every child of the anchor tag with an empty string.
So, either do this:
echo $e->title;
or this:
$text = $e->innertext;
foreach ($e->children() as $child)
{
// Remove each child element's full markup (tags and content) from the text
$text = str_replace($child->outertext, '', $text);
}
That said, it might be a better idea to use DOMDocument for this.
Answer 4:
Use plaintext instead:
echo $e->plaintext;
The year will still be present, but you can trim it off with a regular expression.
Example from the documentation:
$html = str_get_html("<div>foo <b>bar</b></div>");
$e = $html->find("div", 0);
echo $e->tag; // Returns: "div"
echo $e->outertext; // Returns: "<div>foo <b>bar</b></div>"
echo $e->innertext; // Returns: "foo <b>bar</b>"
echo $e->plaintext; // Returns: "foo bar"
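The regular-expression step mentioned above could look roughly like this. It is a sketch under assumptions: `$plain` is a hard-coded stand-in for `$e->plaintext` (the exact whitespace the library returns may differ), and the pattern assumes the year is always a four-digit number at the very end of the text.

```php
<?php
// Assumed stand-in for $e->plaintext; real whitespace may differ.
$plain = " 1,000 Places To See Before You Die 2009 ";

// Drop a trailing four-digit year along with the whitespace around it.
$title = trim(preg_replace('/\s*\d{4}\s*$/', '', $plain));

echo $title, PHP_EOL;
```

One caveat: a title that itself ends in a four-digit number, on an entry with no year span, would lose that number too.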
Answer 5:
First of all, check your HTML. Right now it looks like this:
$string = '<ul class="listings">
<li>
<a href="http://watchseries.eu/serie/1,000_places_to_see_before_you_die" title="1,000 Places To See Before You Die">
1,000 Places To See Before You Die
<span class="epnum">2009</span>
</a>
</li>';
There is no closing tag for ul; perhaps you missed it. With the tag added:
$string = '<ul class="listings">
<li>
<a href="http://watchseries.eu/serie/1,000_places_to_see_before_you_die" title="1,000 Places To See Before You Die">
1,000 Places To See Before You Die
<span class="epnum">2009</span>
</a>
</li>
</ul>';
Then try this (simplexml_load_string() needs well-formed XML, which is why the closing tag matters):
$xml = simplexml_load_string($string);
echo $xml->li->a['title'];
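A slightly more defensive sketch of the same approach, assuming the list may contain several entries and that the markup might not parse as XML. The `$titles` array and the error message are illustrative additions, not part of the original answer.

```php
<?php
$string = '<ul class="listings">
<li>
<a href="http://watchseries.eu/serie/1,000_places_to_see_before_you_die" title="1,000 Places To See Before You Die">
1,000 Places To See Before You Die
<span class="epnum">2009</span>
</a>
</li>
</ul>';

// Collect parse errors internally instead of emitting PHP warnings.
libxml_use_internal_errors(true);
$xml = simplexml_load_string($string);

$titles = [];
if ($xml === false) {
    // simplexml_load_string() returns false when the input is not well-formed.
    echo 'Markup is not well-formed XML', PHP_EOL;
} else {
    foreach ($xml->li as $li) {
        // Cast the attribute to string; it is a SimpleXMLElement otherwise.
        $titles[] = (string) $li->a['title'];
    }
}

print_r($titles);
```

Real-world HTML is often not valid XML, so for scraped pages the DOMDocument route is likely to be more forgiving.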