JavaScript Regex Replace Width Attribute Matching

Posted 2019-08-08 20:50

I am using a regex to process a restricted subset of TinyMCE-generated HTML from a textarea. Some widths are too large and cause the content to run off the page, so I am writing test code in JavaScript.

My question: why does replacing with $3 produce not only "1000px" but also the rest of the document after the table tag?

<script language="javascript">
  // change table width
  function adjustTable(elem0,elem1) {
    // debugging, place results in div
    elem1.innerHTML = elem0.innerHTML.replace(/^(.*)(\u003Ctable.*?\s*?\w*?width\u003D[\u0022\u0027])(\d+px)([\u0022\u0027].*?\u003E)(.*)$/img,"$3");
  }
</script>

<button type="button" onclick="adjustTable(document.getElementById('myTable'),document.getElementById('myResult'))">RegEx</button>

<div id="myTable">
  <table width="1000px">
    <thead>
      <tr><th colspan="3">Table Header</th></tr>
    </thead>
    <tbody>
      <tr><td>alpha</td><td>beta</td><td>gamma</td></tr>
    </tbody>
  </table>
</div>
<textarea id="myResult">
</textarea>

Yes, I do understand RegEx and HTML are streams that should not be crossed, because HTML is complex, etc. I am attempting to make the subset of HTML printable.

I do not see how it matches in multiple ways.

Below is the result for $3.

1000px
        <thead>
          <tr><th colspan="3">Table Header</th></tr>
        </thead>
        <tbody>
          <tr><td>alpha</td><td>beta</td><td>gamma</td></tr>
        </tbody>
      </table>

It matches the 1000px, but then there's the extraneous stuff after the table tag, which is odd, because I thought I was forcing a match in the table tag. Thoughts?

2 Answers
Evening l夕情丶
Reply #2 · 2019-08-08 20:56

By default, the dot doesn't match line break characters in JavaScript. And since you set the /m modifier, ^ and $ also match at the start and end of each line, not just at the start and end of the whole string.

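Therefore, the match is confined to a single line: the final (.*) in your regex captures only the (empty) remainder of the line containing the table tag, not the rest of the document. Replacing the match with $3 (which contains 1000px) rewrites just that one line and leaves every other line of the string intact.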

See it on regex101.com.
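
To make the difference concrete, here is a minimal sketch that runs the replacement both ways on a string shaped like the innerHTML in the question. The sample string and the variable names (`html`, `perLine`, `wholeString`) are assumptions for illustration, and the patterns are written with literal characters instead of the `\u003C`-style escapes, which should be equivalent. The second pattern is one possible fix, not the only one.

```javascript
// Hypothetical input mirroring the innerHTML shown in the question.
const html = [
  '',
  '  <table width="1000px">',
  '    <thead>',
  '      <tr><th colspan="3">Table Header</th></tr>',
  '    </thead>',
  '  </table>',
  ''
].join('\n');

// Same flags as the question (i, m, g): "." stops at each line break and
// "$" matches at the end of every line, so only the <table ...> line matches
// and only that line is replaced.
const perLine = html.replace(
  /^(.*)(<table.*?width=["'])(\d+px)(["'].*?>)(.*)$/img,
  '$3'
);

// Variant: [\s\S] matches any character, including line breaks, and without
// the m flag ^ and $ anchor to the whole string, so the entire input matches
// and is replaced by group 3.
const wholeString = html.replace(
  /^([\s\S]*?)(<table[^>]*?width=["'])(\d+px)(["'][^>]*>)([\s\S]*)$/i,
  '$3'
);

console.log(JSON.stringify(perLine));     // still contains the <thead> lines
console.log(JSON.stringify(wholeString));
```

In engines that support ES2018, the `s` (dotAll) flag is another way to let the dot match line breaks; `[\s\S]` is used here only because it also works in older engines.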

甜甜的少女心
Reply #3 · 2019-08-08 21:00

Let's debug this by logging the entire result of the regex:

  function adjustTable(elem0,elem1) {
    // debugging: log the full match result to the console
    console.log ( (/^(.*)(\u003Ctable.*?\s*?\w*?width\u003D[\u0022\u0027])(\d+px)([\u0022\u0027].*?\u003E)(.*)$/img).exec(elem0.innerHTML) );
  }

The output is:

[
0: "  <table width="1000px">"
1: "  "
2: "<table width=""
3: "1000px"
4: "">"
5: ""
index: 1
input: "↵  <table width="1000px">↵    <thead>↵      <tr><th colspan="3">Table Header</th></tr>↵    </thead>↵    <tbody>↵      <tr><td>alpha</td><td>beta</td><td>gamma</td></tr>↵    </tbody>↵  </table>↵"
]

So if you want to get the result "1000px", then use this code:

(/^(.*)(\u003Ctable.*?\s*?\w*?width\u003D[\u0022\u0027])(\d+px)([\u0022\u0027].*?\u003E)(.*)$/img).exec(elem0.innerHTML)[3]
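
One caveat with indexing the result directly: `exec` returns `null` when nothing matches, so `[3]` would throw a `TypeError` on input without a matching table tag. A minimal sketch of a guarded version follows; the helper name `extractTableWidth` and the simplified pattern are assumptions for illustration, not part of the original answer.

```javascript
// Hypothetical helper: returns the width value of the first <table> tag
// in an HTML string, or null when no such tag is found.
function extractTableWidth(html) {
  // [^>]*? stays inside the opening tag; \s requires whitespace before "width".
  const match = /<table[^>]*?\swidth=["'](\d+px)["']/i.exec(html);
  return match ? match[1] : null;
}

console.log(extractTableWidth('<div>\n  <table width="1000px">\n</div>')); // logs 1000px
console.log(extractTableWidth('<p>no table here</p>'));                    // logs null
```

Because this pattern does not anchor with ^ or $ and does not use the dot, it behaves the same with or without the m flag.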