I have looked at many questions here (and many more websites) and some provided hints but none gave me a definitive answer. I know regular expressions but I am far from being a guru. This particular question deals with regex in PHP.
I need to locate words in a text that are not surrounded by a hyperlink of a given class. For example, I might have
This <a href="blabblah" class="no_check">elephant</a> is green and this elephant is blue while this <a href="blahblah">elephant</a> is red.
I would need to match the second and third elephants but not the first (identified by the class "no_check"). Note that there could be more attributes than just href and class within hyperlinks. I came up with
((?<!<a .*class="no_check".*>)\belephant\b)
which works beautifully in regex test software but not in PHP.
Any help is greatly appreciated. If you cannot provide a regular expression but can find some sort of PHP code logic that would circumvent the need for it, I would be equally grateful.
I think the simplest approach would be to match either a complete <a> element with a "no_check" attribute, or the word you're searching for. If it was the word you matched, it will be in capture group #1; if not, that group should be empty or null.
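The original example did not survive here, so what follows is only a minimal sketch of the alternation idea, not the answerer's exact pattern. It assumes the class attribute is written exactly as class="no_check" with double quotes, and that the element contains no nested links; the variable names are made up for illustration.

```php
<?php
// Sample input taken from the question.
$html = 'This <a href="blabblah" class="no_check">elephant</a> is green and this elephant is blue while this <a href="blahblah">elephant</a> is red.';

// Alternative 1 consumes a whole <a ... class="no_check" ...>...</a> element.
// Alternative 2 captures the bare word in group 1.
$pattern = '~<a\b[^>]*\bclass="no_check"[^>]*>.*?</a>|\b(elephant)\b~is';

preg_match_all($pattern, $html, $matches, PREG_SET_ORDER | PREG_OFFSET_CAPTURE);

// Collect the byte offsets of the words that are NOT inside a "no_check" link.
$offsets = [];
foreach ($matches as $m) {
    // Group 1 is missing (or has offset -1) when the first alternative matched.
    if (isset($m[1]) && $m[1][1] !== -1) {
        $offsets[] = $m[1][1];
    }
}
```

Because the regex engine consumes the whole "no_check" element before it ever sees the word inside it, no look-behind is needed at all.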
Of course, by "simplest approach" I really meant the simplest regex approach. Even simpler would be to use an HTML parser.
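For the parser route, here is a rough sketch using PHP's bundled DOM extension (DOMDocument and DOMXPath). It assumes the input is an HTML fragment in a single-byte or ASCII-compatible encoding and simply counts the occurrences; the wrapper div and the variable names are my own additions for illustration.

```php
<?php
// Sample input taken from the question.
$html = 'This <a href="blabblah" class="no_check">elephant</a> is green and this elephant is blue while this <a href="blahblah">elephant</a> is red.';

// Silence libxml warnings about imperfect markup.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
// Wrap the fragment in a div so it has a single root element.
$doc->loadHTML('<div>' . $html . '</div>');

$xpath = new DOMXPath($doc);

// Select every text node that has no <a> ancestor carrying the class "no_check".
// The concat/normalize-space trick matches the class even among several classes.
$query = '//text()[not(ancestor::a[contains(concat(" ", normalize-space(@class), " "), " no_check ")])]';

$count = 0;
foreach ($xpath->query($query) as $textNode) {
    // Count whole-word occurrences inside this text node.
    $count += preg_match_all('/\belephant\b/i', $textNode->nodeValue);
}
```

Unlike the regex-only version, this does not care about attribute order, quoting style, or extra classes on the link.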
I ended up using a mixed solution. It turns out that I had to parse a text for specific keywords and check if they were already part of a link and if not add them to a hyperlink. The solutions provided here were very interesting but not exactly tailored enough for what I needed.
The idea of using an HTML parser was a good one though and I am currently using one in another project. So hats off to both Alan Moore and Eric Strom for suggesting that solution.
If variable-width negative look-behind is not available, a quick and dirty solution is to reverse the string in memory, use a variable-width negative look-ahead instead, and then reverse the string again.
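Something like the following shows the reversal trick; treat it as a sketch. It assumes single-byte text (strrev works on bytes, so it would mangle multibyte UTF-8), a double-quoted class="no_check" attribute, and, like the look-behind in the question, it only handles a word that directly follows the opening tag. Instead of reversing the string back, it translates the match offsets back to the original string.

```php
<?php
// Sample input taken from the question.
$html = 'This <a href="blabblah" class="no_check">elephant</a> is green and this elephant is blue while this <a href="blahblah">elephant</a> is red.';
$word = 'elephant';

// Reverse the haystack; the opening tag now comes AFTER the (reversed) word.
$reversed = strrev($html);

// Reversed word, not followed by a reversed <a ... class="no_check" ...> tag.
// '"kcehc_on"=ssalc' is 'class="no_check"' backwards and 'a<' is '<a' backwards.
$pattern = '~\b' . preg_quote(strrev($word), '~')
         . '\b(?!>[^<>]*"kcehc_on"=ssalc[^<>]*\ba<)~';

preg_match_all($pattern, $reversed, $m, PREG_OFFSET_CAPTURE);

// Map each offset in the reversed string back to an offset in the original.
$offsets = [];
foreach ($m[0] as [$text, $revOffset]) {
    $offsets[] = strlen($html) - $revOffset - strlen($word);
}
sort($offsets);
```

The look-ahead may be of variable width in PCRE, which is why this works where the original look-behind does not.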
But you may be better off using an HTML parser.