Using DomCrawler to get only an element's own text (without the text of its child tags).
use Symfony\Component\DomCrawler\Crawler;

$html = <<<EOT
<div class="coucu">
Get Description <span>Coucu</span>
</div>
EOT;
$crawler = new Crawler($html);
$text = $crawler->filter('.coucu')->first()->text();
Output: Get Description Coucu
What I want to output (only): Get Description
UPDATE:
I found a solution, but it's a really bad one:
...
$html = $crawler->filter('.coucu')->html();
// strip_tags_content() is not built into PHP; it is the helper posted in the user notes at https://php.net/strip_tags
$html = strip_tags_content($html,'span');
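For comparison, here is a minimal sketch that reads only the element's own text nodes, with no regex and no string splitting. It swaps in PHP's built-in DOM extension (DOMDocument and DOMXPath) instead of DomCrawler, and assumes ext-dom is enabled; ownText() is a hypothetical helper name, and the XPath `/text()` step is what restricts the result to text that sits directly inside the matched element.

```php
<?php
// Sketch using PHP's DOM extension (assumption: ext-dom is available),
// not Symfony DomCrawler. ownText() is a hypothetical helper.

/**
 * Returns the text that belongs directly to the elements matched by
 * $xpathQuery, skipping any text nested inside their child elements.
 */
function ownText(string $html, string $xpathQuery): string
{
    $doc = new DOMDocument();
    // Collect libxml warnings (e.g. unknown HTML5 tags) instead of printing them
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $parts = [];
    // "/text()" selects only the text-node children of the matched element
    foreach ($xpath->query($xpathQuery . '/text()') ?: [] as $textNode) {
        $parts[] = $textNode->nodeValue;
    }
    // Join the pieces, collapse runs of whitespace and trim the ends
    return trim(preg_replace('/\s+/', ' ', implode(' ', $parts)));
}

$html = <<<EOT
<div class="coucu">
Get Description <span>Coucu</span>
</div>
EOT;

// XPath equivalent of the CSS class selector ".coucu"
$query = '//div[contains(concat(" ", normalize-space(@class), " "), " coucu ")]';
echo ownText($html, $query), "\n"; // Get Description
```

Unlike splitting on "<span", this also keeps own text that appears after a child element.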
I ran into the same situation and ended up going with:
$html = $crawler->filter('.coucu')->html();
$html = explode("<span", $html);
echo trim($html[0]);
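The explode approach keeps only what comes before the first "<span", so it depends on the own text always preceding the child tag. A small illustration with a hypothetical beforeFirstSpan() wrapper around the same two lines, fed hand-written inner-HTML strings standing in for what ->html() would return:

```php
<?php
// Hypothetical wrapper around the explode("<span", ...) approach above.
function beforeFirstSpan(string $innerHtml): string
{
    $parts = explode('<span', $innerHtml);
    return trim($parts[0]);
}

// Works: all of the element's own text comes before the <span>
echo beforeFirstSpan("\nGet Description <span>Coucu</span>\n"), "\n"; // Get Description
// Loses own text that follows the <span>
echo beforeFirstSpan("Get <span>Coucu</span> Description"), "\n";     // Get
// Leaves other child tags (and their text) in place
echo beforeFirstSpan("Get <b>Bold</b> Description"), "\n";            // Get <b>Bold</b> Description
```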
Based on the criteria in your question, I think you would be best served by changing your CSS selector to: $crawler = $crawler->filter('div.coucu > span');
From there you can call $span_text = $crawler->text();
or, to simplify things: $text = $crawler->filter('div.coucu > span')->text();
The text() method returns the text content of the first node in the list. Note that this selector targets the nested <span>, so it returns "Coucu" rather than "Get Description".
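The HTML-stripping solution relies on regexes to remove the HTML (a bad idea; see "Using regular expressions to parse HTML: why not?"), and the explode solution is limited: it only keeps what comes before the first "<span", so own text after the span is lost and other child tags are left in place.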
I ended up working by difference: instead of stripping tags from the HTML string, copy the node into a fresh Crawler, remove all of its child elements, and return whatever text is left.
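use Symfony\Component\DomCrawler\Crawler;

// Returns only the text that belongs directly to the first node of $crawler,
// ignoring any text inside its child elements.
function extractCurrentText(Crawler $crawler)
{
    // Copy the node's inner HTML into a fresh document, inside a known wrapper <div>
    $clone = new Crawler();
    $clone->addHtmlContent("<body><div>" . $crawler->html() . "</div></body>", "UTF-8");
    $wrapper = $clone->filter("body > div");
    // Detach every child element; only the wrapper's own text nodes remain
    $wrapper->children()->each(function (Crawler $child) {
        $node = $child->getNode(0);
        $node->parentNode->removeChild($node);
    });
    return $wrapper->text();
}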
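If pulling in DomCrawler (and the CssSelector component that filter() needs) is not an option, the same "remove the children, read what is left" idea can be sketched with PHP's DOM extension alone. This is an adaptation, not the function above: extractOwnText() is a hypothetical name, and it works on a deep copy so the original document is left untouched.

```php
<?php
// Dependency-free sketch of the same technique using ext-dom (assumed enabled).
function extractOwnText(DOMElement $element): string
{
    // Work on a deep copy so the original document is not modified
    $copy = $element->cloneNode(true);
    // Collect child elements first: removing nodes while iterating the
    // live childNodes list would skip some of them
    $children = [];
    foreach ($copy->childNodes as $child) {
        if ($child instanceof DOMElement) {
            $children[] = $child;
        }
    }
    foreach ($children as $child) {
        $copy->removeChild($child);
    }
    // Only the element's own text nodes are left
    return trim($copy->textContent);
}

$html = <<<'HTML'
<div class="coucu">
Get Description <span>Coucu</span>
</div>
HTML;

$doc = new DOMDocument();
$doc->loadHTML($html);
$div = $doc->getElementsByTagName('div')->item(0);
echo extractOwnText($div), "\n"; // Get Description
```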