PHP Simple HTML DOM Parser: Select only DIVs with

Posted 2019-02-09 06:24

Question:

I was searching like mad and found no solution. The problem is simple.

Let's say I have 3 DIVs:

<div class="class1">
  <div class="subclass"> TEXT1 </div>
</div>

<div class="class2">
  <div class="subclass"> TEXT2 </div>
</div>

<div class="class1 class2">
  <div class="subclass"> TEXT3 </div>
</div>

So, very simple. I just want to find TEXT3, whose parent DIV has BOTH class1 and class2. Using Simple HTML DOM Parser, I can't seem to get it to work.

Here's what I tried:

foreach($html->find("[class=class1], [class=class2]") as $item) {
$items[] =  $item->find('.subclass', 0)->plaintext;
}

The problem is, with

find("[class=class1], [class=class2]")

it finds all of them, because the comma acts like an OR. If I leave out the comma, it looks for a class2 element nested inside class1. I am just looking for an AND...

EDIT

Thanks to 19greg96 I found out that

div[class=class1 class2]

works, the problem is that it looks for exactly those two in that order. Let's say I have

<div class="class1 class2">
  <div class="subclass"> TEXT3 </div>
</div>

then it works, and if I have

<div class="class1 class2 class3">
  <div class="subclass"> TEXT3 </div>
</div>

it works when I add an asterisk, as it then looks for the substring:

div[class*=class1 class2]

PROBLEM

I only know that class1 and class3 are there, but there may be others, and in random order. That still doesn't work. Any idea how to look for A & B in any order? So that

div[class=class1 class3]

works with that example?

Answer 1:

EDIT2: As this is a bug in the dom parser (tested on version 1.5), there is no simple way of doing this. Solution I could think of:

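// Find every element that has class1 ...
$find = $html->find(".class1");
$ret = array();
foreach ($find as $element) {
    // ... and keep those whose class list also contains class3 as a whole
    // token (a plain strpos() check would also accept e.g. "class33").
    $classes = preg_split('/\s+/', trim($element->class));
    if (in_array('class3', $classes, true)) {
        $ret[] = $element;
    }
}
$find = $ret;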

Basically, you find all the elements with class one, then iterate through those elements to keep the ones that also have class two (in this case, three).
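
The find-then-filter idea above can be sketched as a small reusable helper. This is only a sketch: `hasAllClasses` and `filterByClasses` are hypothetical names, not part of Simple HTML DOM, and the `$nodes` array holds plain stand-in objects so the example runs without the library. It assumes each element exposes its class attribute as `$element->class`, as in the loop above.

```php
<?php
// Hypothetical helper: true if the class attribute contains every class in
// $required, compared as whole whitespace-separated tokens, in any order.
function hasAllClasses($classAttr, array $required)
{
    $tokens = preg_split('/\s+/', trim((string) $classAttr), -1, PREG_SPLIT_NO_EMPTY);
    return count(array_diff($required, $tokens)) === 0;
}

// Hypothetical helper: keep only the elements whose `class` property
// carries all of the required classes.
function filterByClasses(array $elements, array $required)
{
    return array_values(array_filter(
        $elements,
        function ($el) use ($required) {
            return hasAllClasses(isset($el->class) ? $el->class : '', $required);
        }
    ));
}

// Stand-ins for parser nodes (assumption: real nodes expose ->class too).
$nodes = array(
    (object) array('class' => 'class1', 'text' => 'TEXT1'),
    (object) array('class' => 'class2', 'text' => 'TEXT2'),
    (object) array('class' => 'class3 other class1', 'text' => 'TEXT3'),
    (object) array('class' => 'class10 class3', 'text' => 'TEXT4'),
);

$matches = filterByClasses($nodes, array('class1', 'class3'));
foreach ($matches as $m) {
    echo $m->text, "\n";
}
```

With the parser itself, something like `filterByClasses($html->find('div'), array('class1', 'class3'))` should presumably behave the same way, since only the `class` property is read.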


Previous answer:

Simple answer (should work according to the CSS selector spec):

find(".class1.class2")

This will look for any type of element (div, img, a, etc.) that has both class1 and class2. If you want to restrict the element type, add it to the beginning without a dot, like:

find("div.class1.class2")

A space between the two classes is the descendant combinator: it matches elements with the second class that are nested inside an element with the first class:

find(".class1 .class2")

will match the inner div in

<div class="class1">
  <div class="class2">this will be returned</div>
</div>

but, under standard CSS selector rules, not a single element that merely carries both classes:

<div class="class1 class2">this will not be returned</div>

Edit: I tried your code and found that the solutions above do not work. The solution that does work, however, is as follows:

$html->find("div[class=class1 class2]")


Answer 2:

You can also try this:

test.html

<h1 class="first second last">
    <p>Paragraph</p>
</h1>

Solution:

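include "simple_html_dom.php";

$html = file_get_html('test.html');
$ar = array();
// Use a distinct loop variable so the list of matches is not overwritten.
foreach ($html->find('h1') as $heading) {
    $item = array();
    // Exact comparison: succeeds only for this precise class string, in this order.
    if ($heading->class == 'first second last') {
        $item['test'] = 'success';
    } else {
        $item['test'] = 'fail';
    }
    $ar[] = $item;
}
echo "<pre>";
print_r($ar);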


Answer 3:

I had thought Simple HTML DOM let you do:

$html->find(".class1.class2")

But I guess not. You can switch to this library if you want that.
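
For what it's worth, one alternative that needs no extra download is PHP's bundled DOM extension (DOMDocument and DOMXPath). That is not necessarily the library meant above, just a sketch of an AND match on classes in any order. `classTest` is a hypothetical helper name, and the markup is the question's example with the class order shuffled.

```php
<?php
// Uses PHP's built-in DOM extension, not Simple HTML DOM.
$markup = <<<HTML
<div class="class1"><div class="subclass"> TEXT1 </div></div>
<div class="class2"><div class="subclass"> TEXT2 </div></div>
<div class="class3 other class1"><div class="subclass"> TEXT3 </div></div>
HTML;

// Hypothetical helper: XPath test that @class contains $name as a whole token.
function classTest($name)
{
    return "contains(concat(' ', normalize-space(@class), ' '), ' $name ')";
}

libxml_use_internal_errors(true);   // keep parser warnings quiet
$doc = new DOMDocument();
$doc->loadHTML($markup);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// A div with class1 AND class3 (any order, other classes allowed),
// then any descendant div with class "subclass".
$query = '//div[' . classTest('class1') . ' and ' . classTest('class3') . ']'
       . '//div[' . classTest('subclass') . ']';

$items = array();
foreach ($xpath->query($query) as $node) {
    $items[] = trim($node->textContent);
}
print_r($items);
```

Padding the attribute with spaces before calling `contains()` is what keeps `class1` from also matching `class10`.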



Answer 4:

$html->find("div[class=classname1], div[class=classname2]");

or

$html->find("div.classname1, div.classname2");