I would like to extract the strings from this table row:
<tr class="rowodd" onclick="window.location.href='/portal/offers/show/entityId/32114';">
<td>01.10.2009</td>
<td>AN09551</td>
<td>[2009132] Ich bin Un. <a href="/portal/clients/show/entityId/762350"><myimsrc="/img/bullet_go.pngs" alt="" title="Kundenakte aufrufen"></a></td>
<td class="number" title="7.500,00 €">7.500,00 </td>
<td>Entwurf</td>
</tr>
I also tried this:
#<tr>.*?<t.*?>(.*?)</t.*?>.*?<t.*?>(.*?)</t.*?>.*?<t.*?>(.*?)</t.*?>.*?</tr>#s
Can anyone help?
Try:
Output:
Otherwise with a regexp you could use this (with multi-line option):
But as pointed out by @Brian Agnew, this is nowhere near as good as an XML/HTML parser...
In the PHP world, there's preg_match_all, which makes this much easier than doing it in JS.
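As a rough sketch of the preg_match_all approach, the snippet below pulls the content of every cell out of a shortened copy of the row from the question. The variable names and the pattern are illustrative assumptions, not the code originally posted with this answer.

```php
<?php
// Shortened copy of the row from the question; $html is an illustrative name.
$html = <<<'HTML'
<tr class="rowodd">
<td>01.10.2009</td>
<td>AN09551</td>
<td class="number" title="7.500,00 €">7.500,00 </td>
<td>Entwurf</td>
</tr>
HTML;

// Capture whatever sits between each <td ...> and its </td>.
// The s flag lets . match newlines as well.
preg_match_all('#<td[^>]*>(.*?)</td>#s', $html, $matches);

// $matches[1] holds the first capture group of every match.
$cells = array_map('trim', $matches[1]);
print_r($cells);
```

Cells that contain nested markup (such as the link in the third cell of the original row) would come back with their inner tags still in place, so this only suits simple rows.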
Test the result in Preg Tester
Don’t use that many inexplicit non-greedy expressions like .*? in one pattern. Though they do what you want, they come with a lot of backtracking and thus make your whole expression inefficient, especially when you use so many of them. Try to be as explicit as possible:
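One way to read "explicit", sketched under the assumption that each cell holds plain text only: replace every .*? with a negated character class, so a match can never run past the next tag. The pattern and variable names below are illustrative.

```php
<?php
// Simplified three-cell row; the real row has more cells and nested tags.
$html = '<tr><td>01.10.2009</td><td>AN09551</td><td class="number">7.500,00 </td></tr>';

// [^>]* stays inside the opening tag, [^<]* stays inside the cell text.
// Assumption: cells contain no nested elements, otherwise nothing matches.
$pattern = '#<tr[^>]*>\s*'
         . '<td[^>]*>([^<]*)</td>\s*'
         . '<td[^>]*>([^<]*)</td>\s*'
         . '<td[^>]*>([^<]*)</td>\s*'
         . '</tr>#';

$cells = [];
if (preg_match($pattern, $html, $m)) {
    // Drop the full match at index 0 and keep the three captured cells.
    $cells = array_map('trim', array_slice($m, 1));
}
print_r($cells);
```

A cell with a link inside it, like the third one in the question, would need yet another alternative in the pattern.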
But as you see, this is a mess.
You would be better off using an HTML parser such as DOMDocument. Then you can query the elements with XPath, as Brian Agnew suggested. That’s far more reliable and comfortable than regular expressions.
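A minimal sketch of the parser route, assuming the dom extension is available and the row sits inside a table. The markup is a slightly simplified copy of the row from the question (the malformed image tag and the euro sign are left out), and the variable names are illustrative.

```php
<?php
$html = <<<'HTML'
<table>
<tr class="rowodd">
<td>01.10.2009</td>
<td>AN09551</td>
<td>[2009132] Ich bin Un. <a href="/portal/clients/show/entityId/762350"></a></td>
<td class="number" title="7.500,00">7.500,00 </td>
<td>Entwurf</td>
</tr>
</table>
HTML;

// Real-world HTML is rarely valid; collect parser warnings instead of printing them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$rows = [];
foreach ($xpath->query('//tr') as $tr) {
    $cells = [];
    // Query the td children relative to the current row.
    foreach ($xpath->query('td', $tr) as $td) {
        // textContent is the cell's text with all nested tags removed.
        $cells[] = trim($td->textContent);
    }
    $rows[] = $cells;
}
print_r($rows);
```

Unlike the regex variants, the nested link in the third cell needs no special handling here.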
As numerous people will/have pointed out, you're much better off using an HTML/XML parser for the above (like this one). HTML isn't regular and there are numerous edge cases to code around if you use a regular expression.
Given that you just want to extract the text, perhaps XPath will help. An expression such as:
may do the trick.
Isn’t strip_tags an option? It strips all tags and leaves only the text between them. It strips attributes too, though.
In your case this would result in:
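A quick way to check this is to run strip_tags on the row yourself. The sketch below uses a shortened copy of the row and assumes each cell sits on its own line, which is what makes the split on line breaks work; the variable names are illustrative.

```php
<?php
$row = <<<'HTML'
<tr class="rowodd">
<td>01.10.2009</td>
<td>AN09551</td>
<td class="number" title="7.500,00 €">7.500,00 </td>
<td>Entwurf</td>
</tr>
HTML;

// Remove every tag, attributes included; only the text nodes remain.
$text = strip_tags($row);

// One cell per line: split on line breaks, trim, and drop the empty pieces
// left behind by the <tr> and </tr> lines.
$cells = array_values(array_filter(array_map('trim', explode("\n", $text)), 'strlen'));
print_r($cells);
```

If several cells share one line, their text runs together after stripping, so this only works when the markup is laid out as in the question.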