Regex: replace text outside HTML tags

Posted 2019-01-03 03:24

I have this HTML:

"This is simple html text <span class='simple'>simple simple text text</span> text"

I need to match only the words that are outside any HTML tag. For example, if I want to match “simple” and “text”, I should only get results from “This is simple html text” and the final “text”. The result would be 1 match for “simple” and 2 matches for “text”. Could anyone help me with this? I’m using jQuery.

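var pattern = new RegExp("(\\b" + value + "\\b)", 'gi');

// Wraps every whole-word match of value, including the ones inside
// tags and inside the <span>, which is what I want to avoid
if (pattern.test(text)) {
    text = text.replace(pattern, "<span class='notranslate'>$1</span>");
}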
  • value is the word I want to match (in this case “simple”)
  • text is "This is simple html text <span class='simple'>simple simple text text</span> text"
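A quick way to see why this pattern is not enough (a sketch for a browser console or Node.js; `countMatches` is a hypothetical helper name): run it on the sample string. `\b` knows nothing about tags, so “simple” matches four times, including the `class='simple'` attribute value, and the replacement ends up corrupting that attribute.

```javascript
// Sample input from the question
const html = "This is simple html text <span class='simple'>simple simple text text</span> text";
const word = "simple";

// Same pattern as in the question: whole word, global, case-insensitive
const naive = new RegExp("(\\b" + word + "\\b)", "gi");

// Hypothetical helper: how many times does the pattern match?
function countMatches(re, s) {
  return (s.match(re) || []).length;
}

const naiveCount = countMatches(naive, html); // 4, not the desired 1
const naiveResult = html.replace(naive, "<span class='notranslate'>$1</span>");
// naiveResult now contains class='<span class='notranslate'>simple</span>',
// i.e. the attribute value inside the existing tag was rewritten too.
console.log(naiveCount, naiveResult);
```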

I need to wrap every selected word (in this example, “simple” and “text”) in a <span>, but only where the word is outside any HTML tag. For this example the result should be:

This is <span class='notranslate'>simple</span> html <span class='notranslate'>text</span> <span class='simple'>simple simple text text</span> <span class='notranslate'>text</span>

I do not want to replace any text inside

<span class='simple'>simple simple text text</span>

It should stay exactly as it was before the replacement.

2 answers
姐就是有狂的资本
#2 · 2019-01-03 03:36

Okay, try using this regex:

(text|simple)(?![^<]*>|[^<>]*</)

I tested it against the example above on regex101.

Breakdown:

(         # Open capture group
  text    # Match 'text'
|         # Or
  simple  # Match 'simple'
)         # End capture group
(?!       # Negative lookahead start (the whole match fails if its contents match)
  [^<]*   # Any number of non-'<' characters
  >       # A > character
|         # Or
  [^<>]*  # Any number of non-'<' and non-'>' characters
  </      # The characters < and /
)         # End negative lookahead.

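The negative lookahead prevents a match when text or simple sits inside a tag (for example in an attribute value, where a > comes before any <) or is element content directly followed by a closing tag (</).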
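A minimal sketch of plugging this lookahead into `String.prototype.replace` to get the wrapped output from the question. `wrapOutsideTags` and `escapeRegExp` are hypothetical helper names, and the `\b` word boundaries are my addition (they are not part of the answer's regex) so that e.g. “text” inside “context” is left alone.

```javascript
// Escape characters that have special meaning in a RegExp (common idiom)
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: wrap each listed word unless it is inside a tag
// or is followed (before any other tag) by a closing tag.
function wrapOutsideTags(htmlText, words) {
  const alternation = words.map(escapeRegExp).join("|");
  // \b...\b added (assumption) so partial words are not matched
  const pattern = new RegExp(
    "\\b(" + alternation + ")\\b(?![^<]*>|[^<>]*</)",
    "gi"
  );
  return htmlText.replace(pattern, "<span class='notranslate'>$1</span>");
}

const input = "This is simple html text <span class='simple'>simple simple text text</span> text";
const output = wrapOutsideTags(input, ["simple", "text"]);
console.log(output);
```

With the sample string this gives the expected result from the question. One caveat: the lookahead only looks as far as the next `<`, so a word inside an element that is followed by a nested child tag, as in `<span>simple <b>x</b></span>`, still gets wrapped. For arbitrary HTML a DOM-based approach is safer.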

Anthone
#3 · 2019-01-03 03:58
^([^<]*)<\w+.*/\w+>([^<]*)$

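This captures the text before the first tag (group 1) and the text after the last closing tag (group 2). However, it is a very naive expression; it would be better to use a DOM parser.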
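Applied to the counting part of the question, the expression can be used roughly like this. This is a sketch: `countWord` is a hypothetical helper, and it assumes the string has a single top-level element as in the example, because the greedy `.*` runs from the first tag to the last closing tag and everything in between is discarded.

```javascript
const source = "This is simple html text <span class='simple'>simple simple text text</span> text";

// The expression from this answer: group 1 = text before the first tag,
// group 2 = text after the last closing tag
const outer = /^([^<]*)<\w+.*\/\w+>([^<]*)$/;

const m = source.match(outer);
// If there is no tag at all, the whole string counts as "outside"
const outside = m ? m[1] + " " + m[2] : source;

// Hypothetical helper: count whole-word, case-insensitive occurrences
// (assumes w is a plain word without regex metacharacters)
function countWord(s, w) {
  const re = new RegExp("\\b" + w + "\\b", "gi");
  return (s.match(re) || []).length;
}

console.log(countWord(outside, "simple"), countWord(outside, "text")); // 1 2
```

This only reports the counts; wrapping the words would mean rebuilding the string from the two groups and the untouched middle part.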
