I'm using QueryPath and PHP.
This finds the .eventdate okay, but doesn't return anything for .dtstart:
$qp = htmlqp($url);
foreach ($qp->find('table#schedule')->find('tr') as $tr) {
    echo 'date: ';
    echo $tr->find('.eventdate')->text();
    echo ' time: ';
    echo $tr->find('.dtstart')->text();
    echo '<br>';
}
If I swap the two, .dtstart works okay, but .eventdate doesn't return anything. So it seems that find() in QueryPath destroys the element and only returns the value it needs, which makes it impossible to search the same $tr for multiple items.
Here's example HTML for a TR I'm dealing with:
<tr class="event"><th class="date first" scope="row"><abbr class="eventdate" title="Thursday, February 01, 2011" >02/01</abbr><span class="eventtime" ><abbr class="dtstart" title="2012-02-01T19:00:00" >7:00 PM</abbr><abbr class="dtend" title="2012-02-01T21:00:00" >9:00 PM</abbr></span></th><td class="opponent summary"><ul><li class="first">@ <a class="team" href="/high-schools/ridge-wolves/basketball-winter-11-12/schedule.htm" >Ridge </a> <span class="game-note">*</span></li><li class="location" title="Details: Ridge High School">Details: Ridge High School</li><li class="last"><a class="" href="/local/stats/pregame.aspx?contestid=4255-4c6c-906d&ssid=381d-49f5-9f6d" >Preview Game</a></li></ul></td><td class="result last"><a class="pregame" href="/local/stats/pregame.aspx?contestid=4255-4c6c-906d&ssid=381d-49f5-9f6d">Preview</a></td></tr>
I tried copying the $tr before the first find and replacing it before the second, but that didn't work.
How can I search within each $tr for several elements?
FYI, beyond .eventdate and .dtstart, I also want the .opponent, the href of the a under the opponent, and that a's anchor text.
QueryPath maintains its state internally (unlike jQuery) for performance reasons. So branch() is the way to go.

As a modification to the proposed solution, though, I would suggest minimizing the number of find() calls by doing this:
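The snippet that originally followed did not survive, so what follows is only a sketch of what such a modification might look like, not the answerer's exact code. It assumes QueryPath is installed through Composer (the `vendor/autoload.php` path is an assumption), uses an inline `$html` string as a stand-in for the page at `$url`, and collects the results into a hypothetical `$rows` array. The selectors `table#schedule tr` and `.opponent a.team` are guesses based on the sample row in the question.

```php
<?php
// Assumption: QueryPath was installed with Composer, which makes htmlqp() available.
require 'vendor/autoload.php';

// Stand-in for the page at $url: a trimmed copy of the row from the question.
$html = '<html><body><table id="schedule">'
      . '<tr class="event"><th class="date first" scope="row">'
      . '<abbr class="eventdate" title="Thursday, February 01, 2011">02/01</abbr>'
      . '<span class="eventtime"><abbr class="dtstart" title="2012-02-01T19:00:00">7:00 PM</abbr></span></th>'
      . '<td class="opponent summary"><ul><li class="first">@ '
      . '<a class="team" href="/high-schools/ridge-wolves/basketball-winter-11-12/schedule.htm">Ridge </a>'
      . '</li></ul></td></tr>'
      . '</table></body></html>';

$rows = array(); // hypothetical container for the extracted values

// One combined selector replaces the chained find('table#schedule')->find('tr').
foreach (htmlqp($html)->find('table#schedule tr') as $tr) {
    // Every lookup runs on a branch(), so $tr itself keeps pointing at the row.
    $team = $tr->branch()->find('.opponent a.team');

    $rows[] = array(
        'date'     => $tr->branch()->find('.eventdate')->text(),
        'time'     => $tr->branch()->find('.dtstart')->text(),
        'opponent' => trim($team->text()),
        'href'     => $team->attr('href'),
    );
}

print_r($rows);
```

Branching once for the opponent link and reusing `$team` for both the anchor text and the `href` saves one more find() per row.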
Finally, any time you do a "destructive" action (like a find()), you can always go back one step using end(). So the above could also be done by calling end() after each find() instead of branching. This is a VERY VERY minor performance improvement, but I prefer the branch() method unless I'm working with documents larger than 1M.

In QueryPath 3.x, which has a whole bunch of new performance enhancements, I am toying with the idea of going with the jQuery way of creating a new object for each function. Unfortunately, this method will use a LOT more memory, so I may not keep it. While branch() takes a little while to learn, it does have its advantages.

I'm just learning QueryPath myself, but I think you should branch the row object. Otherwise the $tr->find('.eventdate') will take you to the abbr element contained in the row, and each following find() will try to find elements beneath the abbr, resulting in no matches. branch() (see documentation) creates a copy of the QueryPath object, leaving the original object (in this case $tr) intact.

So your code would be:
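The code that belonged here was lost, so this is a reconstruction under assumptions rather than the answerer's original: it is the loop from the question with branch() added before each find(). The Composer autoload path and the inline `$html` string (standing in for the page at `$url`) are assumptions made so the sketch can run on its own.

```php
<?php
// Assumption: QueryPath was installed with Composer, which makes htmlqp() available.
require 'vendor/autoload.php';

// Stand-in for the page at $url, reduced to the two elements being read.
$html = '<html><body><table id="schedule"><tr class="event"><th>'
      . '<abbr class="eventdate">02/01</abbr>'
      . '<abbr class="dtstart">7:00 PM</abbr>'
      . '</th></tr></table></body></html>';

$qp = htmlqp($html);
foreach ($qp->find('table#schedule')->find('tr') as $tr) {
    echo 'date: ';
    // branch() hands find() a copy, so $tr still points at the <tr> afterwards.
    echo $tr->branch()->find('.eventdate')->text();
    echo ' time: ';
    echo $tr->branch()->find('.dtstart')->text();
    echo '<br>';
}
```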
I don't know if this is the preferred way to achieve what you want, but it seems to work.
Yeah, you are right. I actually had this problem today. In jQuery you just query, query, query with no problems, but in QueryPath a query changes the internal "state" of the object, so a second query is applied against the current state.
So if you want to query multiple "separate" locations in the document, you have to branch before each query:
$q = qp("something.html");
$a = $q->branch()->find("tr");
$b = $q->branch()->find("a");
that seems to work in my code, so I suppose it will work in yours.