PHP DOMDocument how to get element?

2020-04-19 06:34发布

I am trying to read a website's content but i have a problem i want to get images, links these elements but i want to get elements them selves not the element content for instance i want to get that: i want to get that entire element.

How can i do this..

<?php

    $ch = curl_init();

    curl_setopt($ch, CURLOPT_URL, "http://www.link.com");
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);

    $output = curl_exec($ch);

    $dom = new DOMDocument;
    @$dom->loadHTML($output);

    $items = $dom->getElementsByTagName('a');

    for($i = 0; $i < $items->length; $i++) {
        echo $items->item($i)->nodeValue . "<br />";
    }

    curl_close($ch);;
?>

2条回答
爷的心禁止访问
2楼-- · 2020-04-19 07:02

I'm assuming you just copy-pasted some example code and didn't bother trying to learn how it actually works...

Anyway, the ->nodeValue part takes the element and returns the text content (because the element has a single text node child - if it had anything else, I don't know what nodeValue would give).

So, just remove the ->nodeValue and you have your element.

查看更多
对你真心纯属浪费
3楼-- · 2020-04-19 07:04

You appear to be asking for the serialized html of a DOMElement? E.g. you want a string containing <a href="http://example.org">link text</a>? (Please make your question clearer.)

$url = 'http://example.com';
$dom = new DOMDocument();
$dom->loadHTMLFile($url);

$anchors = $dom->getElementsByTagName('a');

foreach ($anchors as $a) {
    // Best solution, but only works with PHP >= 5.3.6
    $htmlstring = $dom->saveHTML($a);

    // Otherwise you need to serialize to XML and then fix the self-closing elements
    $htmlstring = saveHTMLFragment($a);
    echo $htmlstring, "\n";
}


function saveHTMLFragment(DOMElement $e) {
    $selfclosingelements = array('></area>', '></base>', '></basefont>',
        '></br>', '></col>', '></frame>', '></hr>', '></img>', '></input>',
        '></isindex>', '></link>', '></meta>', '></param>', '></source>',
    );
    // This is not 100% reliable because it may output namespace declarations.
    // But otherwise it is extra-paranoid to work down to at least PHP 5.1
    $html = $e->ownerDocument->saveXML($e, LIBXML_NOEMPTYTAG);
    // in case any empty elements are expanded, collapse them again:
    $html = str_ireplace($selfclosingelements, '>', $html);
    return $html;
}

However, note that what you are doing is dangerous because it could potentially mix encodings. It is better to have your output as another DOMDocument and use importNode() to copy the nodes you want. Alternatively, use an XSL stylesheet.

查看更多
登录 后发表回答