Close all HTML unclosed IMG tags

Posted 2020-02-26 18:28

Question:

Is it possible to do a regex replace on all IMG tags that are unclosed? If so, how would I identify:

  <img src="..." alt="...">

...as a potential candidate to be replaced?

   = <img src="..." alt="..."/>

Update: We have hundreds of pages and thousands of image tags, all of which must be closed. I'm not stuck on RegEx -- any other method, aside from manually updating all IMG tags, would suffice.

Answer 1:

(<img[^>]+)(?<!/)>

will match an img tag that is not self-closed. It requires that the regex flavor you're using supports lookbehind (older Ruby and JavaScript versions don't; Ruby gained it in 1.9 and JavaScript in ES2018, and most other flavors have it). Backreference no. 1 will contain the tag up to, but not including, the final >, so if you search for this regex and replace with \1/> you should be good to go.
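As a rough sketch of that search-and-replace, here is how it might look in Python, whose re module supports lookbehind. The function name close_img_tags is made up for illustration and is not part of the answer.

```python
import re

# Pattern from the answer: group 1 holds the tag up to, but not
# including, the final ">". The lookbehind rejects tags ending in "/>".
UNCLOSED_IMG = re.compile(r'(<img[^>]+)(?<!/)>')


def close_img_tags(html):
    """Insert "/" before the ">" of every <img ...> tag that lacks it."""
    return UNCLOSED_IMG.sub(r'\1/>', html)


print(close_img_tags('<img src="a.gif" alt="x">'))
print(close_img_tags('<img src="a.gif" alt="x" />'))
```

The pattern is case-sensitive as written, so uppercase <IMG ...> tags would need the re.IGNORECASE flag.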

If you need to account for the possibility of > inside attributes, you could use

(<img("[^"]*"|[^>])+)(?<!/)>

This will match, e.g.,

<img src="image.gif" alt="hey, look--->">
<img src="image/image.gif">

and leave

<img src="image/image.gif" />

alone.
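To check the quote-aware pattern against those three samples, a small Python sketch could look like this (the name close_img_tags_quoted is hypothetical):

```python
import re

# Quote-aware pattern from the answer: a double-quoted attribute value
# is consumed as a unit, so a ">" inside it does not end the tag.
UNCLOSED_IMG_QUOTED = re.compile(r'(<img("[^"]*"|[^>])+)(?<!/)>')


def close_img_tags_quoted(html):
    """Insert "/" before the final ">" of <img ...> tags that lack it."""
    return UNCLOSED_IMG_QUOTED.sub(r'\1/>', html)


samples = [
    '<img src="image.gif" alt="hey, look--->">',
    '<img src="image/image.gif">',
    '<img src="image/image.gif" />',
]
for tag in samples:
    print(close_img_tags_quoted(tag))
```

One caveat: because [^>] can also match a lone quote character, backtracking lets the pattern stop at a > inside an attribute of a tag that is already self-closed, such as <img src="a" alt="x>y" />. Tags like that are better handled by a parser.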



Answer 2:

In HTML the end tag for an <img> "must be omitted", so the start tag closes the element and you can't have an unclosed img.

If you want to convert your HTML to XHTML then use a real parser. Regular Expressions aren't a very good tool for this job.
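For illustration only, here is one way the parser route might look using Python's standard html.parser module. This is a sketch, not a full HTML-to-XHTML converter: it only rewrites img start tags and leaves everything else byte-for-byte as it was. The names ImgFinder and close_img_tags_with_parser are invented for this example.

```python
from html.parser import HTMLParser


class ImgFinder(HTMLParser):
    """Record the position and raw text of each <img ...> not written as <img ... />."""

    def __init__(self):
        super().__init__(convert_charrefs=False)
        self.spans = []  # (line, column, raw start-tag text)

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            line, col = self.getpos()  # line is 1-based, column is 0-based
            self.spans.append((line, col, self.get_starttag_text()))

    def handle_startendtag(self, tag, attrs):
        # Called for tags already written as <... />; nothing to fix.
        pass


def close_img_tags_with_parser(html):
    finder = ImgFinder()
    finder.feed(html)
    finder.close()

    # Absolute offset at which each line starts.
    line_starts = [0]
    for i, ch in enumerate(html):
        if ch == "\n":
            line_starts.append(i + 1)

    out = html
    # Splice from the end so that earlier offsets stay valid.
    for line, col, raw in reversed(finder.spans):
        start = line_starts[line - 1] + col
        fixed = raw[:-1].rstrip() + " />"
        out = out[:start] + fixed + out[start + len(raw):]
    return out


print(close_img_tags_with_parser('<p><img src="a.png" alt="a > b"></p>'))
```

Because the parser does the tokenizing, a > inside a quoted attribute and an <img> that appears inside a comment are both handled without any special cases in the pattern.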



Answer 3:

What exactly do you mean by "unclosed"?

 <img src="a1.jpg    <-- no closing quote and no closing angle bracket
 <img src="a1.jpg"   <-- no closing angle bracket
 <img src="a1.jpg">  <-- the tag does not self-close, as it should in XHTML

You can try to find such suspects intelligently, but no approach is guaranteed to be foolproof.
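A heuristic along those lines might be sketched in Python as follows. It assumes the text of one tag has already been isolated (for example, one tag per line), which is itself a guess about the input. The function name classify_img_tag and the labels it returns are invented for this example.

```python
def classify_img_tag(tag_text):
    """Classify a fragment starting with "<img" into one of four rough cases."""
    quote = None  # the quote character we are currently inside, if any
    for i, ch in enumerate(tag_text):
        if quote:
            if ch == quote:
                quote = None
        elif ch in "\"'":
            quote = ch
        elif ch == ">":
            # First ">" outside quotes ends the tag.
            return "self-closed" if tag_text[i - 1] == "/" else "not self-closed"
    # Ran out of text without finding a ">" outside quotes.
    return "unterminated quote" if quote else "missing >"


for fragment in ('<img src="a1.jpg', '<img src="a1.jpg"',
                 '<img src="a1.jpg">', '<img src="a1.jpg" />'):
    print(fragment, "->", classify_img_tag(fragment))
```

This only flags suspects. Deciding how to repair the first two cases still needs a human or a tolerant parser.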



Answer 4:

I have never needed this myself, but a closed img tag is one that begins with <img, has some attributes inside, and ends with />.

Here is something I tried in Perl:

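#!/usr/bin/env perl
use strict;
use warnings;

# Sample tags: two unclosed, one already self-closed, one with ">" inside an attribute.
my @images = ('<img src="toto.jpg">',
              '<img src="truc/machin.jpg" title="pouet" >',
              '<img        src="pouet.jpg" alt="toto" />',
              '<img src="math/a-greater-than-b.png" alt="a > b">');

foreach (@images) {
    # Capture the attribute list of any <img ...> that ends in ">" rather than "/>".
    if (/<img\s+(([a-z]+=".*?")+\s*)>/) {
        print "Match : <img $1 />\n";
    }
}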

Produces:

Match : <img src="toto.jpg" />
Match : <img src="truc/machin.jpg" title="pouet"  />
Match : <img src="math/a-greater-than-b.png" alt="a > b" />


Tags: regex xhtml