Get the inner text using cURL in PHP

This is the HTML on the website. I want to grab:

1,000 Places To See Before You Die

<ul class="listings">
<li>
<a href="http://watchseries.eu/serie/1,000_places_to_see_before_you_die" title="1,000 Places To See Before You Die">
1,000 Places To See Before You Die
<span class="epnum">2009</span>
</a>
</li>

I used code like this:

foreach($html->find('ul.listings li a') as $e)
echo $e->innertext. '<br/>';

The output I am getting looks like this:

 999: Whats Your Emergency<span class="epnum">2012</span>

The output includes the span. Please help me with this.

Tags: php simple-html-dom

5 answers

傲

#2 · 2020-05-06 10:46

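You can use strip_tags() for that. Note that it removes only the tags, not the text inside them, so the year from the <span> will still be in the output: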

echo trim(strip_tags($e->innertext));

Or use preg_replace() to remove the unwanted tag together with its content:

echo preg_replace('/<span[^>]*>([\s\S]*?)<\/span[^>]*>/', '', $e->innertext);
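To see the difference between the two calls, here is a minimal sketch run on a hard-coded string. The `$innertext` value is an assumption: it stands in for the `$e->innertext` of the link in the question rather than being fetched from the page.

```php
<?php
// Assumed sample: the innertext of the <a> element from the question.
$innertext = "\n1,000 Places To See Before You Die\n<span class=\"epnum\">2009</span>\n";

// strip_tags() drops the <span> tags but keeps the text inside them.
$stripped = trim(strip_tags($innertext));

// preg_replace() removes the whole <span>...</span>, including its text.
$withoutSpan = trim(preg_replace('/<span[^>]*>([\s\S]*?)<\/span[^>]*>/', '', $innertext));

echo $stripped, PHP_EOL;    // the title, then "2009" on the next line
echo $withoutSpan, PHP_EOL; // 1,000 Places To See Before You Die
```

So strip_tags() alone still needs extra work to drop the year, while the preg_replace() version removes it directly.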


傲

#3 · 2020-05-06 10:52

There are two ways I can think of to solve this. One is to grab the title attribute from the anchor tag. Of course, not every page sets a title attribute on its anchor tags, and where it is set, its value may differ from the link text. The other solution is to take the innertext and then replace every child element of the anchor tag with an empty string.

So, either do this:

$e->title;

or this:

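$text = $e->innertext;
// Remove the HTML of every child element (here the <span class="epnum">)
foreach ($e->children() as $child)
{
    $text = str_replace($child->outertext, '', $text);
}
$text = trim($text);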

That said, it might be a better idea to use PHP's built-in DOMDocument for this.
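Following that suggestion, here is a minimal sketch of the same "keep the text, drop the children" idea using DOMDocument instead of Simple HTML DOM. The helper name `directText()` and the inline sample markup are assumptions for illustration; the helper keeps only the text nodes that are direct children of each `<a>`, so the `<span>` and its year are skipped.

```php
<?php
// Hypothetical helper: collect only the text nodes that are direct children
// of an element, skipping child elements such as <span class="epnum">.
function directText(DOMNode $node): string
{
    $text = '';
    foreach ($node->childNodes as $child) {
        if ($child->nodeType === XML_TEXT_NODE) {
            $text .= $child->nodeValue;
        }
    }
    return trim($text);
}

// Assumed sample markup, copied from the question (with the closing </ul> added).
$html = '<ul class="listings"><li>'
    . '<a href="http://watchseries.eu/serie/1,000_places_to_see_before_you_die" title="1,000 Places To See Before You Die">'
    . "\n1,000 Places To See Before You Die\n"
    . '<span class="epnum">2009</span></a></li></ul>';

$dom = new DOMDocument();
libxml_use_internal_errors(true); // suppress warnings about imperfect HTML
$dom->loadHTML($html);
libxml_clear_errors();

$titles = [];
foreach ($dom->getElementsByTagName('a') as $a) {
    $titles[] = directText($a);
}
echo implode(PHP_EOL, $titles), PHP_EOL;
```

On a real page you would probably narrow the loop to links inside `ul.listings` rather than every `<a>`.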


我欲成王，谁敢阻挡

#4 · 2020-05-06 10:53

Why not use DOMDocument and read the title attribute?

$string = '<ul class="listings">
<li>
<a href="http://watchseries.eu/serie/1,000_places_to_see_before_you_die" title="1,000 Places To See Before You Die">
1,000 Places To See Before You Die
<span class="epnum">2009</span>
</a>
</li>';

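$dom = new DOMDocument;
libxml_use_internal_errors(true); // suppress warnings libxml may emit about imperfect HTML
$dom->loadHTML($string);
$xpath = new DOMXPath($dom);

// Option 1: read the title attribute of the first matching link
$text = $xpath->query('//ul[@class="listings"]/li/a/@title')->item(0)->nodeValue;
echo $text, PHP_EOL;

// Option 2: take the link's full text and keep only its first line (the year is on the next line)
$text = explode("\n", trim($xpath->query('//ul[@class="listings"]/li/a')->item(0)->nodeValue));
echo $text[0];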
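A variation on the XPath approach, offered as a sketch: selecting `text()` under the `<a>` returns only the link's own text nodes, so the year in the `<span>` never appears and no splitting on newlines is needed. The sample markup is assumed (copied from the question with `</ul>` added).

```php
<?php
// Assumed sample markup from the question, with the closing </ul> added.
$string = '<ul class="listings"><li>'
    . '<a href="http://watchseries.eu/serie/1,000_places_to_see_before_you_die" title="1,000 Places To See Before You Die">'
    . "\n1,000 Places To See Before You Die\n"
    . '<span class="epnum">2009</span></a></li></ul>';

$dom = new DOMDocument();
libxml_use_internal_errors(true); // suppress warnings about imperfect HTML
$dom->loadHTML($string);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

// text() selects only the text nodes directly inside <a>, so the <span> is skipped.
$names = [];
foreach ($xpath->query('//ul[@class="listings"]/li/a/text()') as $node) {
    $name = trim($node->nodeValue);
    if ($name !== '') { // ignore whitespace-only text nodes
        $names[] = $name;
    }
}
echo implode(PHP_EOL, $names), PHP_EOL;
```

Note that `@class="listings"` matches only when the class attribute is exactly `listings`; if the real list carries several classes, the predicate would need adjusting.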



狗以群分

#5 · 2020-05-06 10:56

First of all, check your HTML. Right now it looks like this:

  $string = '<ul class="listings">
               <li>
                  <a href="http://watchseries.eu/serie/1,000_places_to_see_before_you_die" title="1,000 Places To See Before You Die">
 1,000 Places To See Before You Die
                    <span class="epnum">2009</span>
                 </a>
             </li>';

There is no closing </ul> tag; perhaps you missed it. It should look like this:

  $string = '<ul class="listings">
               <li>
                  <a href="http://watchseries.eu/serie/1,000_places_to_see_before_you_die" title="1,000 Places To See Before You Die">
 1,000 Places To See Before You Die
                    <span class="epnum">2009</span>
                 </a>
             </li>
            </ul>';

Then try this:

 $xml = simplexml_load_string($string);
 echo $xml->li->a['title'];
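Extending this a little, as a sketch under the same assumption that the markup is well-formed XML: SimpleXML can also give the link text without the year, because casting an element to string returns only its own text nodes and ignores the text of child elements. Looping over every `<li>` collects each show.

```php
<?php
// Assumed sample markup from the answer above (well-formed, with </ul>).
$string = '<ul class="listings">
  <li>
    <a href="http://watchseries.eu/serie/1,000_places_to_see_before_you_die" title="1,000 Places To See Before You Die">
1,000 Places To See Before You Die
      <span class="epnum">2009</span>
    </a>
  </li>
</ul>';

$xml = simplexml_load_string($string);

$rows = [];
foreach ($xml->li as $li) {
    $a = $li->a;
    $rows[] = [
        'title' => (string) $a['title'],
        // Casting an element to string returns only its own text,
        // not the text of child elements such as <span>.
        'text'  => trim((string) $a),
        'year'  => (string) $a->span,
    ];
}
print_r($rows);
```

A full page fetched with cURL is rarely valid XML, so simplexml_load_string() may fail on it; in that case the DOMDocument answers are the safer route.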


不美不萌又怎样

#6 · 2020-05-06 11:02

Use plaintext instead.

echo $e->plaintext;

The year will still be present, though; you can trim it off with a regular expression.
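A minimal sketch of that regular-expression step. It assumes the plaintext ends with a four-digit year; `stripTrailingYear()` is a hypothetical helper name, and the sample string only stands in for whatever `$e->plaintext` actually returns.

```php
<?php
// Hypothetical helper: drop a trailing four-digit year (and the whitespace
// around it) from a string such as the plaintext of the <a> element.
function stripTrailingYear(string $text): string
{
    return trim(preg_replace('/\s*\b\d{4}\s*$/', '', $text));
}

// Assumed plaintext value; the exact whitespace depends on the page.
$plaintext = " 1,000 Places To See Before You Die 2009 ";
echo stripTrailingYear($plaintext), PHP_EOL;
```

A title that legitimately ends in a four-digit number would also be cut, so reading the title attribute is safer when it is available.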

Here is the example from the documentation:

$html = str_get_html("<div>foo <b>bar</b></div>");
$e = $html->find("div", 0);

echo $e->tag; // Returns: "div"
echo $e->outertext; // Returns: "<div>foo <b>bar</b></div>"
echo $e->innertext; // Returns: "foo <b>bar</b>"
echo $e->plaintext; // Returns: "foo bar"

