Regex PHP: find everything in a div

I'm trying to find everything inside a div using a regex. I know there is probably a smarter way to do this, but I've chosen regex.

So at the moment my regex pattern looks like this:

$gallery_pattern = '/<div class="gallery">([\s\S]*)<\/div>/';

And it does the trick, sort of.

The problem is when I have two divs right after each other, like this:

<div class="gallery">text to extract here</div>
<div class="gallery">text to extract from here as well</div>

I want to extract the text from both divs, but when I test it I don't get them as two separate results. Instead I get:

"text to extract here </div>  
<div class="gallery">text to extract from here as well"

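So, to sum up: it skips the first closing </div> and carries on to the next one. The text inside the div can contain <, / and line breaks, just so you know!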

Does anyone have a simple solution to this problem? I'm still a regex newbie.

Answer 1:

What about something like this:

$str = <<<HTML
<div class="gallery">text to extract here</div>
<div class="gallery">text to extract from here as well</div>
HTML;

$matches = array();
// "s" lets "." match line breaks as well; "?" makes ".*" non-greedy
preg_match_all('#<div[^>]*>(.*?)</div>#s', $str, $matches);

var_dump($matches[1]);

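Note the "?" in the regex: it makes ".*" non-greedy (lazy), so each match stops at the first </div> it reaches instead of the last one.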
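To make the greedy/lazy difference concrete, here is a minimal sketch that runs the question's own pattern and the same pattern with "?" added, side by side. The sample input (with a "<", a "/" and a line break inside the second div, as the question says the real content may have) and the variable names are made up for illustration.

```php
<?php
// Two sibling gallery divs; the second contains "<", "/" and a line break.
$html = "<div class=\"gallery\">text to extract here</div>\n"
      . "<div class=\"gallery\">a < b /\nstill inside</div>";

// The question's pattern: [\s\S]* is greedy, so it backtracks only as far as the LAST </div>.
$greedy = '/<div class="gallery">([\s\S]*)<\/div>/';
// Same pattern with "?": [\s\S]*? is lazy, so it stops at the FIRST </div>.
$lazy = '/<div class="gallery">([\s\S]*?)<\/div>/';

preg_match_all($greedy, $html, $greedyMatches);
preg_match_all($lazy, $html, $lazyMatches);

// One match that swallows both divs, including the </div><div ...> in between.
var_dump($greedyMatches[1]);
// Two matches, one per div.
var_dump($lazyMatches[1]);
```

Because [\s\S] already matches newlines, the lazy variant needs no "s" modifier; with "." (as in this answer) the "s" modifier is what lets a match span several lines.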

This will get you:

array
  0 => string 'text to extract here' (length=20)
  1 => string 'text to extract from here as well' (length=33)

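This should work pretty well as long as you don't have nested divs; if you do... well... really: are you sure you want to use regular expressions to parse HTML, which is itself anything but regular?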
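The nested-div caveat is easy to reproduce. In this minimal sketch (the nested markup is a made-up example), a lazy pattern like the one in this answer ends its match at the inner </div>, so the trailing text is lost and the inner opening tag leaks into the result.

```php
<?php
// A gallery div that contains another div.
$html = '<div class="gallery">outer <div>inner</div> tail</div>';

// Non-greedy pattern as in this answer, with "s" so "." can cross newlines.
preg_match_all('#<div[^>]*>(.*?)</div>#s', $html, $matches);

// The lazy match ends at the FIRST </div>, i.e. the inner one:
// " tail" is dropped and "<div>" shows up in the captured text.
var_dump($matches[1]);
```

A regex cannot count how many divs are still open, which is why a real parser is the safer choice once nesting is possible.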

Answer 2:

You shouldn't use regex to parse HTML when there is a handy DOM library:

$str = '<div class="gallery">text to extract here</div>
<div class="gallery">text to extract from here as well</div>';

$doc = new DOMDocument();
$doc->loadHTML($str);

$divs = $doc->getElementsByTagName('div');
// ->length is the node count (DOMNodeList is only Countable since PHP 7.2)
if ($divs->length) {
    foreach ($divs as $div) {
        // nodeValue of an element is all the text inside it
        echo $div->nodeValue . '<br>';
    }
}
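getElementsByTagName('div') returns every div, not only the gallery ones. A minimal sketch of narrowing it down with DOMXPath is below; the sample markup (including the "other" div and the nested div) is made up, and the query matches class="gallery" exactly, so a div with several classes (e.g. class="gallery big") would need a contains()-based query instead.

```php
<?php
$html = '<div class="gallery">text to extract here</div>'
      . '<div class="other">skip me</div>'
      . '<div class="gallery">outer <div>inner</div> tail</div>';

$doc = new DOMDocument();
// Collect libxml parse warnings internally instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$texts = [];
// Only divs whose class attribute is exactly "gallery".
foreach ($xpath->query('//div[@class="gallery"]') as $div) {
    // textContent also includes the text of nested elements.
    $texts[] = $div->textContent;
}

print_r($texts);
```

Unlike the regex versions, the nested div is handled correctly here: the parser knows where the outer div really ends.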