Using multiple find in foreach with QueryPath

Posted 2020-07-23 05:51

I'm using QueryPath and PHP.

This finds the .eventdate okay, but doesn't return anything for .dtstart:

$qp = htmlqp($url);
foreach ($qp->find('table#schedule')->find('tr') as $tr){
    echo 'date: ';
    echo $tr->find('.eventdate')->text();
    echo ' time: ';
    echo $tr->find('.dtstart')->text();
    echo '<br>';
}

If I swap the two, .dtstart works okay, but .eventdate doesn't return anything. So it seems that find() in QueryPath consumes the element and returns only the value it matched, which makes it impossible to search the same $tr for more than one item.

Here's example HTML for a TR I'm dealing with:

<tr class="event"><th class="date first" scope="row"><abbr class="eventdate" title="Thursday, February 01, 2011" >02/01</abbr><span class="eventtime" ><abbr class="dtstart" title="2012-02-01T19:00:00" >7:00 PM</abbr><abbr class="dtend" title="2012-02-01T21:00:00" >9:00 PM</abbr></span></th><td class="opponent summary"><ul><li class="first">@ <a class="team" href="/high-schools/ridge-wolves/basketball-winter-11-12/schedule.htm" >Ridge </a> <span class="game-note">*</span></li><li class="location" title="Details: Ridge High School">Details: Ridge High School</li><li class="last"><a class="" href="/local/stats/pregame.aspx?contestid=4255-4c6c-906d&amp;ssid=381d-49f5-9f6d" >Preview Game</a></li></ul></td><td class="result last"><a class="pregame" href="/local/stats/pregame.aspx?contestid=4255-4c6c-906d&amp;ssid=381d-49f5-9f6d">Preview</a></td></tr>

I tried copying the $tr before the first find and replacing it before the second, but that didn't work.

How can I search each $tr for several different elements?

FYI, beyond .eventdate and .dtstart, I also want the .opponent, the href of the opponent's a element, and that link's anchor text.

Tags: php querypath
3 answers
神经病院院长
#2 · 2020-07-23 06:27

QueryPath maintains its state internally (unlike jQuery) for performance reasons. So branch() is the way to go.

As a modification to the proposed solution, though, I would suggest minimizing the number of find() calls by doing this:

$qp = htmlqp($url);
foreach ($qp->find('table#schedule tr') as $tr){
    echo 'date: ';
    echo $tr->branch('.eventdate')->text();
    echo ' time: ';
    echo $tr->branch('.dtstart')->text();
    echo '<br>';
}

Finally, any time you do a "destructive" action (like a find()), you can always go back one step using end(). So the above could also be done like this:

$qp = htmlqp($url);
foreach ($qp->find('table#schedule tr') as $tr){
    echo 'date: ';
    echo $tr->find('.eventdate')->text();
    echo ' time: ';
    echo $tr->end()->find('.dtstart')->text();
    echo '<br>';
}

Using end() is a very, very minor performance improvement, but I prefer the branch() method unless I'm working with documents larger than 1M.

In QueryPath 3.x, which has a whole bunch of new performance enhancements, I am toying with the idea of going with the jQuery way of creating a new object for each function. Unfortunately, this method will use a LOT more memory, so I may not keep it. While branch() takes a little while to learn, it does have its advantages.
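For the other fields mentioned in the question (.opponent, the link's href, and its anchor text), the same branch() pattern should extend naturally. Here is a minimal sketch, assuming QueryPath is installed through Composer (the autoload path is an assumption) and that the opponent link is the a.team element inside .opponent, as in the sample row. The $html string is a trimmed copy of that row, and the array keys are made up for illustration:

```php
<?php
// Assumption: QueryPath was installed with Composer; adjust the path if not.
require 'vendor/autoload.php';

// Trimmed copy of the sample row from the question, wrapped in its table.
$html = '<table id="schedule"><tr class="event">'
      . '<th class="date first" scope="row">'
      . '<abbr class="eventdate" title="Thursday, February 01, 2011">02/01</abbr>'
      . '<span class="eventtime">'
      . '<abbr class="dtstart" title="2012-02-01T19:00:00">7:00 PM</abbr>'
      . '</span></th>'
      . '<td class="opponent summary"><ul><li class="first">@ '
      . '<a class="team" href="/high-schools/ridge-wolves/basketball-winter-11-12/schedule.htm">Ridge </a>'
      . '</li></ul></td>'
      . '</tr></table>';

$rows = array();
foreach (htmlqp($html)->find('table#schedule tr') as $tr) {
    // Every lookup starts from a fresh branch, so $tr keeps pointing at the row.
    $rows[] = array(
        'date'     => $tr->branch('.eventdate')->text(),
        'time'     => $tr->branch('.dtstart')->text(),
        'opponent' => trim($tr->branch('.opponent a.team')->text()),
        'href'     => $tr->branch('.opponent a.team')->attr('href'),
    );
}

print_r($rows);
```

Collecting the values into an array instead of echoing them keeps the extraction separate from the output, which makes it easier to check each field on its own.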

贼婆χ
#3 · 2020-07-23 06:30

I'm just learning QueryPath myself, but I think you should branch the row object. Otherwise the $tr->find('.eventdate') will take you to the abbr element contained in the row, and each following find() will try to find elements beneath the abbr, resulting in no matches. branch() (see documentation) creates a copy of the QueryPath object, leaving the original object (in this case $tr) intact.

So your code would be:

$qp = htmlqp($url);
foreach ($qp->find('table#schedule')->find('tr') as $tr){
    echo 'date: ';
    echo $tr->branch()->find('.eventdate')->text();
    echo ' time: ';
    echo $tr->branch()->find('.dtstart')->text();
    echo '<br>';
}

I don't know if this is the preferred way to achieve what you want, but it seems to work.
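To make the mechanism concrete, here is a toy model in plain PHP. It is not QueryPath's actual code: Cursor is a hypothetical class, and its find() takes a bare class name rather than a CSS selector. It only illustrates the difference between a find() that narrows the object's own matches and a branch() that copies them first:

```php
<?php
// Toy model only: a hypothetical Cursor class, not QueryPath's implementation.
class Cursor
{
    /** @var DOMElement[] the elements this cursor currently points at */
    private $matches;

    public function __construct(array $matches)
    {
        $this->matches = $matches;
    }

    // Narrows this object's own matches to descendants with the given class.
    public function find($className)
    {
        $found = array();
        foreach ($this->matches as $node) {
            foreach ($node->getElementsByTagName('*') as $child) {
                $classes = preg_split('/\s+/', trim($child->getAttribute('class')));
                if (in_array($className, $classes, true)) {
                    $found[] = $child;
                }
            }
        }
        $this->matches = $found;
        return $this;
    }

    // Returns a copy, so the original cursor keeps its matches.
    public function branch()
    {
        return new Cursor($this->matches);
    }

    // Concatenated text of all current matches.
    public function text()
    {
        $out = '';
        foreach ($this->matches as $node) {
            $out .= $node->textContent;
        }
        return $out;
    }
}

$doc = new DOMDocument();
$doc->loadXML('<tr><abbr class="eventdate">02/01</abbr><abbr class="dtstart">7:00 PM</abbr></tr>');

// Branching first: $tr still points at the row after each lookup.
$tr = new Cursor(array($doc->documentElement));
$date = $tr->branch()->find('eventdate')->text();
$time = $tr->branch()->find('dtstart')->text();

// No branching: the first find() moves the cursor onto the abbr,
// so the second find() searches beneath the abbr and matches nothing.
$moved = new Cursor(array($doc->documentElement));
$moved->find('eventdate');
$missing = $moved->find('dtstart')->text();

var_dump($date, $time, $missing);
```

In this model the second unbranched lookup comes back empty for the same reason described above: the search starts from the abbr instead of from the row.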

The star\"
#4 · 2020-07-23 06:47

Yeah, you are right. I actually had this problem today. In jQuery you can just query, query, query with no problems, but in QueryPath a query changes the internal "state" of the object, so a second query is applied against the current state.

So if you want to query multiple "separate" locations in the document, you have to branch first:

$q = qp("something.html");
$a = $q->branch()->find("tr");
$b = $q->branch()->find("a");

That seems to work in my code, so I suppose it will work in yours.
