I'm using PHP and XPath to crawl a website I own (just the HTML, without going into the server), but I get this error:
Catchable fatal error: Object of class DOMNodeList could not be converted to string in C:\wamp\www\crawler.php on line 46
I already tried echoing just that line to see what I was getting, but I got the same error. I also tried googling the error and ended up in the PHP documentation, where I found that my code is exactly like the documentation's example, except that I'm working with HTML instead of XML. So I have no idea what's wrong. Here's my code:
<?php
$html = file_get_contents('http://miurl.com/mipagina#0');
// create document object model
$dom = new DOMDocument();
// load html into document object model
@$dom->loadHTML($html);
// create domxpath instance
$xPath = new DOMXPath($dom);
// get all elements with a particular id and then loop through and print the href attribute
$elements = $xPath->query("//*[@class='nombrecomplejo']");
if ($elements != null) {
foreach ($elements as $e) {
echo parse_str($e);
}
}
?>
Edit
Actually, yes, sorry: that line was there for testing while I had other code commented out. I deleted it here, but I still get the error.
According to the documentation, the $elements != null check is unnecessary. DOMXPath::query() always returns a DOMNodeList for a well-formed expression (it returns false only if the expression is malformed). The list may be of zero length, but that won't confuse the foreach loop.
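A minimal sketch of that behaviour, assuming a small inline document (the markup is made up for illustration) in which nothing matches the query:

```php
<?php
// Hypothetical inline markup, standing in for the fetched page.
$dom = new DOMDocument();
$dom->loadHTML('<html><body><p class="other">text</p></body></html>');
$xPath = new DOMXPath($dom);

// No element carries this class, so the result is an empty list, not null.
$elements = $xPath->query("//*[@class='nombrecomplejo']");

$isList = $elements instanceof DOMNodeList;
$count = $elements->length;

$iterations = 0;
foreach ($elements as $e) {
    $iterations++; // never reached when the list is empty
}

var_dump($isList, $count, $iterations);
```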
Also, note the use of the nodeValue property to get the element's textual representation:
$elements = $xPath->query("//*[@class='nombrecomplejo']");
foreach ($elements as $e) {
echo $e->nodeValue;
}
The reason for the error you got is that you can't feed anything other than a string to parse_str(); you tried passing in a DOMElement.
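Since the comment in the question mentions printing the href attribute, here is a rough sketch of reading both the text and an attribute from each matched element. The markup and the link target are invented for the example; only the class name comes from the question.

```php
<?php
// Hypothetical markup; the href value is made up for illustration.
$html = '<html><body><a class="nombrecomplejo" href="/complejo/1">Complejo Uno</a></body></html>';
$dom = new DOMDocument();
$dom->loadHTML($html);
$xPath = new DOMXPath($dom);

$texts = [];
$links = [];
foreach ($xPath->query("//*[@class='nombrecomplejo']") as $e) {
    $texts[] = $e->nodeValue;            // text content of the element
    $links[] = $e->getAttribute('href'); // attribute value, '' if absent
}

echo implode("\n", $texts), "\n";
echo implode("\n", $links), "\n";
```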
Just a wild guess, but echo $elements; is line 46, right? I believe the echo command expects something that is a string or convertible to a string, which $elements is not. Try removing that line.
No specific answers here, just debugging tips.
First, remove the @ from
@$dom->loadHTML($html);
It may be that there's a warning you're suppressing here that would help you debug the problem. The loadHTML method can't always deal with poorly formed HTML. With the example you posted, I got the following:
PHP Warning: DOMDocument::loadHTML(): htmlParseEntityRef: expecting ';' in Entity, line: 109 in /Users/alanstorm/Desktop/foo.php on line 7
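If the warnings are too noisy to leave on screen, one alternative to @ is to let libxml collect them so they can be inspected afterwards. This is only a sketch: loadHtmlWithErrors is a hypothetical helper name, and the sample markup is made up.

```php
<?php
// Hypothetical helper: parse HTML and return the parser's complaints
// instead of printing or suppressing them.
function loadHtmlWithErrors(string $html): array
{
    $previous = libxml_use_internal_errors(true); // collect errors internally
    $dom = new DOMDocument();
    $dom->loadHTML($html);

    $messages = [];
    foreach (libxml_get_errors() as $error) {
        $messages[] = sprintf('line %d: %s', $error->line, trim($error->message));
    }
    libxml_clear_errors();
    libxml_use_internal_errors($previous); // restore the earlier setting

    return [$dom, $messages];
}

[$dom, $messages] = loadHtmlWithErrors('<html><body><p>Tom &amp; Jerry</p></body></html>');
foreach ($messages as $m) {
    echo $m, "\n";
}
```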
If you have the power to do so, install the tidy extension and use it to get a clean document.
Also, make sure that there's actually a string in $html. Since you're requesting a page over HTTP, it may be that your IP is being blocked for some reason.
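A quick way to check this might look like the sketch below. fetchHtml is a hypothetical helper, not something from your code; it relies on file_get_contents() returning false on failure.

```php
<?php
// Hypothetical helper: returns the page body, or null when nothing usable came back.
function fetchHtml(string $url): ?string
{
    // On failure file_get_contents() emits a warning and returns false.
    $html = file_get_contents($url);
    if ($html === false || trim($html) === '') {
        return null;
    }
    return $html;
}
```

Calling fetchHtml() with your page's URL and testing the result for null before handing it to loadHTML() should tell you whether the request itself is the problem.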
The DOMDocument family of classes can be tricky to work with if you're not used to dealing with fully "hard-core" object-oriented interfaces.
The two things you need to keep in mind here are:
Almost everything returned by a DOMDocument method is an object
Most of these objects can't be converted to a string
So, it looks like your code errors out when you try to convert a DOMNodeList to a string, which means $e is a NodeList instead of a node for some reason.
Try echoing out $e->length instead to see if you have a node list of a particular length, or iterating over $e to figure out what's inside of it. You could also add an echo '.'; to your loop and then count the dots to ensure your XPath query is returning something of non-zero length.
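As a rough illustration of that kind of inspection, the sketch below prints the match count and then the class and tag name of each item. The markup is invented, with two elements carrying the class from the question.

```php
<?php
// Hypothetical markup with two matching elements.
$dom = new DOMDocument();
$dom->loadHTML('<html><body><div class="nombrecomplejo">A</div><span class="nombrecomplejo">B</span></body></html>');
$xPath = new DOMXPath($dom);
$elements = $xPath->query("//*[@class='nombrecomplejo']");

// How many nodes did the query match?
echo 'matches: ', $elements->length, "\n";

// What kind of object is each item, and which tag does it represent?
$seen = [];
foreach ($elements as $e) {
    $seen[] = get_class($e) . ' <' . $e->nodeName . '>';
}
echo implode("\n", $seen), "\n";
```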
My guess is that your XPath query is returning an empty node list here. Download the Firefox XPath Checker and use it to run your XPath query on your HTML document. That will let you be confident you have the right XPath, and then you can concentrate on figuring out the PHP part. When I checked using your example page/code, I got an empty result.
Good luck!