Regex: how to match table cells?

I would like to extract the strings from this table:

<tr class="rowodd" onclick="window.location.href='/portal/offers/show/entityId/32114';">
  <td>01.10.2009</td>
   <td>AN09551</td>
     <td>[2009132] Ich bin Un.&nbsp;<a href="/portal/clients/show/entityId/762350"><myimsrc="/img/bullet_go.pngs" alt="" title="Kundenakte aufrufen"></a></td>
   <td class="number" title="7.500,00&nbsp;€">7.500,00&nbsp;</td>
    <td>Entwurf</td>
     </tr>

I also tried this:

#<tr>.*?<t.*?>(.*?)</t.*?>.*?<t.*?>(.*?)</t.*?>.*?<t.*?>(.*?)</t.*?>.*?</tr>#s

Can anyone help?

Tags: php html regex parsing

6 answers

We Are One

#2 · 2019-06-10 01:53

Try:

// http://simplehtmldom.sourceforge.net/
include('simple_html_dom.php');
$str = '<tr class="rowodd" onclick="window.location.href=\'/portal/offers/show/entityId/32114\';">
  <td>
    01.10.2009
  </td>
  <td>
    AN09551
  </td>
  <td>
    [2009132] Ich bin Un. <a href="/portal/clients/show/entityId/762350">
    <myimsrc="/img/bullet_go.pngs" alt="" title="Kundenakte aufrufen"></a>
  </td>
  <td class="number" title="7.500,00">
    7.500,00
  </td>
  <td>
    Entwurf
  </td>
</tr>';
$html = str_get_html($str);
foreach($html->find('td') as $element) {
  echo trim($element->innertext) . "\n";
}

Output:

01.10.2009
AN09551
[2009132] Ich bin Un. <a href="/portal/clients/show/entityId/762350">
    <myimsrc="/img/bullet_go.pngs" alt="" title="Kundenakte aufrufen"></a>
7.500,00
Entwurf

仙女界的扛把子

#3 · 2019-06-10 01:54

Otherwise with a regexp you could use this (with multi-line option):

(?:\<td[^\>]*?\>([^\<]*?)\</td\>)+

But as pointed out by @Brian Agnew, this is nowhere near as good as an XML/HTML parser...
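One thing to watch with the pattern above: a repeated capturing group only keeps its last capture, so in PHP it is usually easier to match one cell at a time with preg_match_all. Here is a minimal sketch of that idea, assuming a simplified copy of the row from the question (the link text "go" is made up for illustration). It also shows the limitation: a cell that contains a nested tag is skipped, because [^<]* cannot cross it.

```php
<?php
// Simplified copy of the row from the question; the link text is hypothetical.
$html = '<tr class="rowodd">
  <td>01.10.2009</td>
  <td>AN09551</td>
  <td>[2009132] Ich bin Un. <a href="/portal/clients/show/entityId/762350">go</a></td>
  <td class="number">7.500,00</td>
  <td>Entwurf</td>
</tr>';

// One match per <td>; group 1 holds the cell text.
// [^<]* stops at the first "<", so the third cell (with its <a>) does not match.
preg_match_all('~<td[^>]*>([^<]*)</td>~', $html, $matches);

$cells = $matches[1];
print_r($cells);
```

The third cell is missing from the result, which is exactly the kind of edge case a parser handles for you.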

祖国的老花朵

#4 · 2019-06-10 01:54

In PHP there is preg_match_all, which makes this much easier than doing it in JavaScript.

// Capture the text that follows each opening <td ...> up to the next tag.
$ptn = '/<\s*td[^>]*>([^<>]*)</';
preg_match_all($ptn, $str, $matches);
print_r($matches);

Test the result in Preg Tester

Emotional °昔

#5 · 2019-06-10 02:02

Don’t use so many loosely specified non-greedy expressions like .*?. Although they do what you want, they cause a lot of backtracking and make the whole expression inefficient, especially when you use this many of them.

Try to be as explicit as possible:

#<tr\b(?:[^"'>]*|"[^"]*"|'[^']*')*>\s*
    <td\b(?:[^"'>]*|"[^"]*"|'[^']*')*>((?:[^<]|(?!</td\s*>)<)*)</td\s*>\s*
    <td\b(?:[^"'>]*|"[^"]*"|'[^']*')*>((?:[^<]|(?!</td\s*>)<)*)</td\s*>\s*
    <td\b(?:[^"'>]*|"[^"]*"|'[^']*')*>((?:[^<]|(?!</td\s*>)<)*)</td\s*>\s*
    <td\b(?:[^"'>]*|"[^"]*"|'[^']*')*>((?:[^<]|(?!</td\s*>)<)*)</td\s*>\s*
    <td\b(?:[^"'>]*|"[^"]*"|'[^']*')*>((?:[^<]|(?!</td\s*>)<)*)</td\s*>\s*
</tr\s*>#sx

But as you see, this is a mess.

You would be better off using an HTML parser such as DOMDocument. You can then query the elements with XPath, as Brian Agnew suggested. That is far more reliable and convenient than regular expressions.
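For reference, a rough sketch of the DOMDocument route, collecting the cells of each row into an array. The sample markup is a simplified copy of the row from the question, and wrapping it in a table element is my own assumption so that the fragment parses predictably.

```php
<?php
// Simplified sample; the surrounding <table> is an assumption, not in the question.
$html = '<table>
<tr class="rowodd">
  <td>01.10.2009</td>
  <td>AN09551</td>
  <td>[2009132] Ich bin Un. <a href="/portal/clients/show/entityId/762350"></a></td>
  <td class="number" title="7.500,00">7.500,00</td>
  <td>Entwurf</td>
</tr>
</table>';

$doc = new DOMDocument();
libxml_use_internal_errors(true);   // collect parse warnings instead of printing them
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$rows = [];
foreach ($xpath->query('//tr') as $tr) {
    $cells = [];
    // Relative query: only the <td> children of the current row.
    foreach ($xpath->query('td', $tr) as $td) {
        $cells[] = trim($td->textContent);
    }
    $rows[] = $cells;
}
print_r($rows);
```

Unlike the five-capture regex, this does not care how many cells a row has or which attributes they carry.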

家丑人穷心不美

#6 · 2019-06-10 02:10

As numerous people will/have pointed out, you're much better off using an HTML/XML parser for the above (like this one). HTML isn't regular and there are numerous edge cases to code around if you use a regular expression.

Given that you just want to extract the text, perhaps XPath will help. An expression such as:

/tr/td/text()

may do the trick.
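A small sketch of what text() selects, using PHP's DOMXPath. Two assumptions here: the query is written as //tr/td/text() rather than /tr/td/text(), because loadHTML wraps the fragment in html and body elements so tr is no longer the root, and the link text "Kundenakte aufrufen" is added purely to show that text() returns only the text nodes sitting directly inside each td.

```php
<?php
// Simplified sample; the <table> wrapper and the link text are assumptions.
$html = '<table><tr>
  <td>01.10.2009</td>
  <td>AN09551</td>
  <td>[2009132] Ich bin Un. <a href="/portal/clients/show/entityId/762350">Kundenakte aufrufen</a></td>
  <td class="number">7.500,00</td>
  <td>Entwurf</td>
</tr></table>';

$doc = new DOMDocument();
libxml_use_internal_errors(true);   // keep parser warnings out of the output
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$texts = [];
foreach ($xpath->query('//tr/td/text()') as $node) {
    $value = trim($node->nodeValue);
    if ($value !== '') {            // ignore whitespace-only text nodes
        $texts[] = $value;
    }
}
print_r($texts);
```

The link text does not appear in the result, since it belongs to the a element and not to the td itself.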

beautiful°

#7 · 2019-06-10 02:17

Isn’t strip_tags an option?

It strips all tags and leaves only the text between them. Note that it strips the attributes too.

In your case this would result in:

  01.10.2009
   AN09551
     [2009132] Ich bin Un. 
   7.500,00 € 
    Entwurf
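A quick sketch of how that could look in code, assuming every cell sits on its own line as in the question (strip_tags itself keeps no cell boundaries, so the split on newlines is the fragile part). Entities such as &nbsp; survive strip_tags, so they are decoded afterwards.

```php
<?php
// Simplified copy of the row from the question.
$html = '<tr class="rowodd">
  <td>01.10.2009</td>
  <td>AN09551</td>
  <td>[2009132] Ich bin Un.&nbsp;<a href="/portal/clients/show/entityId/762350"></a></td>
  <td class="number" title="7.500,00">7.500,00</td>
  <td>Entwurf</td>
</tr>';

$text = strip_tags($html);                                       // drop tags and attributes
$text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');
$text = str_replace("\u{00A0}", ' ', $text);                     // trim() ignores NBSP by default

// One cell per line is an assumption about the input layout.
$lines = array_values(array_filter(array_map('trim', explode("\n", $text)), 'strlen'));
print_r($lines);
```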
