Word Boundary Regular Expression Unless Inside HTM

Posted 2019-08-10 03:52

Question:

I have a regular expression using word boundaries that works exceedingly well...

~\b('.$value.')\b~i

...save for the fact that it also matches text inside HTML tags (e.g. title="This is blue!"). That's a problem because I run a text substitution on everything the regex matches, and the substituted markup shows tooltips through those title attributes. So, as you can imagine, a later substitution rewrites the text inside a title and breaks the tooltip's HTML. For example, what should be:

<span class="blue" title="This is blue!">Aqua</span>

...ends up becoming...

<span class="blue" title="This is <span class=" blue"="">Royal Blue</span>">Aqua</span>

My use of strip_tags didn't solve the issue. I think what I need is a better regular expression, one that simply will not match content ending in blue"> ('blue' in this case being a placeholder for any other color in the array I'm comparing against).

Can anyone append what I need to the regular expression? Or do you have a better solution?

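Answer 1:

Regex replacement often looks like the obvious solution for HTML, but it can have nasty side effects (such as the broken attribute above) and still not accomplish what you want. Look into DOMDocument instead (as some commenters have suggested), so that only text nodes are touched and attributes are left alone.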
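As a rough illustration of the DOMDocument route, here is a minimal sketch, not the answerer's own code. The helper name wrapWordInText, the container id wrap-root, and the class name match are made up for this example, and it assumes plain ASCII input (other encodings need extra handling when loading). It applies the question's word-boundary pattern to text nodes only, so title attributes are never touched.

```php
<?php
// Hypothetical helper (not from the original answer): wraps whole-word,
// case-insensitive matches of $word that occur in text nodes only.
function wrapWordInText(string $html, string $word, string $class): string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);   // keep parser warnings quiet
    // Wrap the fragment in a container with a made-up id so it can be found again.
    $doc->loadHTML('<div id="wrap-root">' . $html . '</div>');
    libxml_clear_errors();

    $xpath   = new DOMXPath($doc);
    $root    = $xpath->query('//div[@id="wrap-root"]')->item(0);
    $pattern = '~\b(' . preg_quote($word, '~') . ')\b~i';

    // Copy the node list first, because the loop below changes the tree.
    $textNodes = iterator_to_array($xpath->query('.//text()', $root));
    foreach ($textNodes as $node) {
        $parts = preg_split($pattern, $node->nodeValue, -1, PREG_SPLIT_DELIM_CAPTURE);
        if (count($parts) < 2) {
            continue; // no match in this text node
        }
        $fragment = $doc->createDocumentFragment();
        foreach ($parts as $i => $part) {
            if ($i % 2 === 1) { // odd indexes hold the captured matches
                $span = $doc->createElement('span');
                $span->setAttribute('class', $class);
                $span->appendChild($doc->createTextNode($part));
                $fragment->appendChild($span);
            } elseif ($part !== '') {
                $fragment->appendChild($doc->createTextNode($part));
            }
        }
        $node->parentNode->replaceChild($fragment, $node);
    }

    // Serialize only the children of the container, not the wrapper itself.
    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

echo wrapWordInText('<span class="blue" title="This is blue!">Aqua</span> is blue.', 'blue', 'match'), PHP_EOL;
```

The "blue" inside title="This is blue!" is left alone because attribute values are not text nodes. Only the trailing "blue" in the running text gets wrapped.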

But if you insist on using regex, here's a good post on SO. It uses two passes to accomplish what you want.



Answer 2:

Davey, resurrecting this question because, apart from the DOM solution, there is a better regex solution than the one mentioned so far. It's simple and needs only a single pass.

The general solution is

<[^>]*>(*SKIP)(*F)|blue

Here's a demo

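The left side of the alternation matches a complete <...> tag, and (*SKIP)(*F) then forces that match to fail while telling the engine not to retry anywhere inside the tag. The net effect is that any content within <> tags is simply skipped, while content between tags, such as blue, is matched, which sounds like it fits your needs. Note that (*SKIP) and (*F) are PCRE backtracking verbs, so this works with PHP's preg_* functions but not in every regex flavor.

In the expression, replace "blue" with whatever you like.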
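Combined with the word boundaries and $value from the question, the pattern might be used in PHP roughly like this. It is a sketch under assumptions: the sample $html string and the <b>...</b> replacement are invented for illustration, and the simple <[^>]*> tag matcher assumes no attribute value contains a literal ">".

```php
<?php
// Hypothetical input reusing the question's markup.
$html  = '<span class="blue" title="This is blue!">Aqua</span> is a shade of blue.';
$value = 'blue'; // the word to match, as with $value in the question

// Left alternative: consume a whole tag, then (*SKIP)(*F) discards it and
// prevents the engine from retrying inside it.
// Right alternative: the word itself, on word boundaries, case-insensitive.
$pattern = '~<[^>]*>(*SKIP)(*F)|\b(' . preg_quote($value, '~') . ')\b~i';

// Illustrative replacement only; swap in whatever markup you need.
$result = preg_replace($pattern, '<b>$1</b>', $html);

echo $result, PHP_EOL;
```

Only the "blue" in the running text is replaced. The class and title attributes sit inside a tag, so they are skipped. preg_quote is there so that a $value containing regex metacharacters or the ~ delimiter cannot alter the pattern.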

Reference

  1. How to match pattern except in situations s1, s2, s3
  2. How to match a pattern unless...