Using regular expressions to find img tags without

Posted 2019-03-09 11:06

Question:

I am going through a large website (1600+ pages) to make it pass Priority 1 W3C WAI. As a result, things like image tags need to have alt attributes.

What would be the regular expression for finding img tags without alt attributes? If possible, with a wee explanation so I can use it to find other issues.

I am in an office with Visual Web Developer 2008. The Edit >> Find dialogue can use regular expressions.

Answer 1:

This is really tricky, because regular expressions are mostly about matching something that is there. With look-around trickery, you can do things like 'find A that is not preceded/followed by B', etc. But I think the most pragmatic solution for you wouldn't be that.

My proposal relies a little on your existing code not doing anything too crazy, and you might have to fine-tune it, but I think it's a good shot if you really want to use a regex search for your problem.

So what I suggest is to find all img tags that consist only of valid img attributes other than alt (each of which may or may not be present). Whether that is an approach you can work with is for you to decide.

Proposal:

/<img\s*((src|align|border|height|hspace|ismap|longdesc|usemap|vspace|width|class|dir|lang|style|title|id)="[^"]*"\s*)*\s*\/?>/

The current limitations are:

  1. It expects your attribute values to be delimited by double quotes,
  2. It doesn't take into account possible inline on*Event attributes,
  3. It doesn't find img elements with 'illegal' attributes.
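To try the whitelist idea outside the editor, here is a minimal Python sketch. It assumes double-quoted attribute values that may be empty or several characters long (`[^"]*`); the names `ALLOWED`, `IMG_WITHOUT_ALT` and `lacks_alt` are made up for this illustration. Python's `re` module stands in for the editor's search, so details may differ in Visual Web Developer.

```python
import re

# Attribute names taken from the proposal; "alt" is deliberately absent.
ALLOWED = (
    "src|align|border|height|hspace|ismap|longdesc|usemap|vspace|width|"
    "class|dir|lang|style|title|id"
)

# Assumption: every attribute value is wrapped in double quotes.
IMG_WITHOUT_ALT = re.compile(
    r'<img\s*((' + ALLOWED + r')="[^"]*"\s*)*\s*/?>'
)

def lacks_alt(tag):
    """Return True when the tag consists only of whitelisted attributes."""
    return IMG_WITHOUT_ALT.fullmatch(tag) is not None

print(lacks_alt('<img src="foo.jpg" width="10" />'))    # True
print(lacks_alt('<img src="foo.jpg" alt="A photo">'))   # False
print(lacks_alt('<img src="foo.jpg" onclick="go()">'))  # False (limitation 2)
```

The last call shows limitation 2 in practice: a tag with an event attribute is not reported even though it has no alt.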


Answer 2:

Building on the answers from Mr.Black and Roberts126:

/(<img(?!.*?alt=(['"]).*?\2)[^>]*)(>)/

This will match an img tag anywhere in the code which either has no alt attribute or has an alt attribute that is not followed by a quoted value (="..." or ='...'), i.e. an invalid alt attribute.

Breaking it down:

(          : open capturing group 1
<img       : match the opening of an img tag
(?!        : open negative look-ahead
.*?        : lazily match zero or more characters
alt=(['"]) : match an 'alt' attribute followed by ' or " (captured as group 2 for later)
.*?        : lazily match the value of the 'alt' attribute
\2)        : back-reference to the ' or " matched earlier, then close the look-ahead
[^>]*      : match the rest of the tag up to, but not including, the closing '>'
)          : close capturing group 1
(>)        : match the closing '>' of the img tag (group 3)
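The same breakdown can be kept next to the pattern itself. The sketch below uses Python's `re.VERBOSE` flag as one possible way to do that; the name `ALT_LESS_IMG` is invented here, and other regex flavours may need a different syntax for inline comments.

```python
import re

ALT_LESS_IMG = re.compile(
    r"""
    (                   # group 1: from '<img' up to the closing '>'
      <img
      (?!               # negative look-ahead: reject if a quoted alt follows
        .*?alt=(['"])   # group 2 remembers which quote opens the value
        .*?\2           # the value, up to the same quote character
      )
      [^>]*             # the rest of the tag
    )
    (>)                 # group 3: the closing '>'
    """,
    re.VERBOSE,
)

print(bool(ALT_LESS_IMG.search('<img src="a.png">')))           # True
print(bool(ALT_LESS_IMG.search('<img src="a.png" alt="">')))    # False
print(bool(ALT_LESS_IMG.search('<img src="a.png" alt=logo>')))  # True (unquoted alt)
```

Note that the `.*?` inside the look-ahead is not limited to the current tag, so an alt attribute in a later tag on the same line would also suppress the match.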

If your code editor allows search and replace by Regex you can use this in combination with the replace string:

$1 alt=""$3

This finds any alt-less img tags and appends an empty alt attribute to them. It is useful when using spacers or other layout images for HTML emails and the like.
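For a scripted version of that search and replace, a rough Python equivalent might look like this. The function name `add_empty_alt` is hypothetical, and `\g<1>` and `\g<3>` are Python's spelling of the editor's `$1` and `$3`.

```python
import re

# The pattern from this answer, unchanged.
pattern = re.compile(r"""(<img(?!.*?alt=(['"]).*?\2)[^>]*)(>)""")

def add_empty_alt(html):
    """Append alt="" to every img tag that the pattern reports as alt-less."""
    return pattern.sub(r'\g<1> alt=""\g<3>', html)

print(add_empty_alt('<p><img src="spacer.gif"></p>'))
# <p><img src="spacer.gif" alt=""></p>
print(add_empty_alt('<img src="logo.png" alt="Logo">'))
# <img src="logo.png" alt="Logo">
```

One thing to watch: for a self-closing tag such as `<img src="x.gif" />`, group 1 includes the slash, so the attribute lands after it (`<img src="x.gif" / alt="">`). XHTML-style markup probably needs an adjusted pattern.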



Answer 3:

Here is what I just tried in my own environment with a massive enterprise code base with some good success (found no false positives but definitely found valid cases):

<img(?![^>]*\balt=)[^>]*?>

What's going on in this search:

  1. Find the opening of the tag.
  2. Use a negative look-ahead to assert that what follows is not zero or more characters other than the closing bracket, followed by …
  3. a word that begins with "alt" ("\b" makes sure we don't get a mid-word match on something like a class value) and is followed by "=", then …
  4. match zero or more characters that are not the closing bracket, and
  5. find the closing bracket.

So this will match:

<img src="foo.jpg" class="baltic" />

But it won't match either of these:

<img src="foo.jpg" class="baltic" alt="" />
<img src="foo.jpg" alt="I have a value.">
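The three examples can be checked with a few lines of Python. This is only a sketch using Python's `re` module (the name `NO_ALT` is invented here); the search dialog of a given editor may behave slightly differently.

```python
import re

# The pattern from this answer, unchanged.
NO_ALT = re.compile(r"<img(?![^>]*\balt=)[^>]*?>")

samples = [
    '<img src="foo.jpg" class="baltic" />',
    '<img src="foo.jpg" class="baltic" alt="" />',
    '<img src="foo.jpg" alt="I have a value.">',
]

# Keep only the samples that the pattern reports as missing an alt attribute.
matches = [s for s in samples if NO_ALT.search(s)]
print(matches)
# ['<img src="foo.jpg" class="baltic" />']
```

Because the look-ahead uses `[^>]*` rather than `.*`, it stops at the end of the current tag and also works when a tag is split across several lines.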


Answer 4:

This works in Eclipse:

<img(?!.*alt).*?>

I'm updating for Section 508 too!
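A quick way to see what this shorter pattern does and does not catch, sketched in Python as a stand-in for the Eclipse search (so behaviour there may differ slightly). The helper name `is_flagged` is made up for the example.

```python
import re

# Pattern from the answer above, unchanged.
NO_ALT_ON_LINE = re.compile(r"<img(?!.*alt).*?>")

def is_flagged(line):
    """True when the pattern reports the line as containing an alt-less img."""
    return NO_ALT_ON_LINE.search(line) is not None

print(is_flagged('<img src="foo.jpg">'))                   # True
print(is_flagged('<img src="foo.jpg" alt="Foo">'))         # False
print(is_flagged('<img src="foo.jpg" class="baltic" />'))  # False
```

The last line is a missed case: the look-ahead rejects any occurrence of the letters "alt" later on the line, including the ones inside "baltic", so the pattern trades some accuracy for brevity.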



Answer 5:

This worked for me.

^<img(?!.*alt).*$

This matches any line that begins with <img and does not contain "alt" anywhere after it. It even works for attributes such as src="<?php echo $imagename; ?>".
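Since this pattern is anchored to whole lines, it can be tried on a multi-line string with Python's `re.MULTILINE` flag. The sample page below is invented for illustration.

```python
import re

# Pattern from the answer above; MULTILINE lets ^ and $ match at each line.
LINE_WITHOUT_ALT = re.compile(r"^<img(?!.*alt).*$", re.MULTILINE)

page = "\n".join([
    '<img src="<?php echo $imagename; ?>">',
    '<img src="logo.png" alt="Logo">',
    '  <img src="indented.png">',
])

hits = [m.group(0) for m in LINE_WITHOUT_ALT.finditer(page)]
print(hits)
# ['<img src="<?php echo $imagename; ?>">']
```

The PHP line is reported even though its src value contains a `>`, because `.*$` runs to the end of the line. The indented tag is not reported, since `^<img` requires the tag to start at the very beginning of a line.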



Answer 6:

Simple and effective:

<img((?!\salt=).)*?>

This regex works for finding <img> tags that are missing the alt attribute.
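Here is a small Python sketch of the same idea, written with a closing `>` at the end of the pattern so that each match covers the whole tag. The names `TEMPERED` and `find_alt_less` are assumptions made for this example.

```python
import re

# Each character is consumed only if " alt=" does not start at that position.
TEMPERED = re.compile(r"<img((?!\salt=).)*?>")

def find_alt_less(html):
    """Return the full text of every img tag matched by the pattern."""
    return [m.group(0) for m in TEMPERED.finditer(html)]

print(find_alt_less('<img src="a.png"><img src="b.png" alt="B">'))
# ['<img src="a.png">']
```

Unlike a single look-ahead with `.*`, the check is repeated before every character, so an alt attribute in a later tag does not hide an earlier tag that lacks one.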



Answer 7:

This is perfectly possible with the following regex:

<img([^a]|a[^l]|al[^t]|alt[^=])*?/>

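Looking for something that isn't there is rather tricky, but we can work around it by matching, one step at a time, any character that isn't 'a', or an 'a' that isn't followed by 'l', or 'al' that isn't followed by 't', or 'alt' that isn't followed by '='. As written, the pattern expects the tag to end with '/>'.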
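This look-ahead-free pattern can be useful in tools whose regex flavour lacks look-around. A hedged Python check of its behaviour follows (the names `NO_LOOKAROUND` and `matches_tag` are invented for the example):

```python
import re

# Pattern from the answer above, unchanged.
NO_LOOKAROUND = re.compile(r"<img([^a]|a[^l]|al[^t]|alt[^=])*?/>")

def matches_tag(tag):
    """True when the pattern finds an alt-less, self-closing img tag."""
    return NO_LOOKAROUND.search(tag) is not None

print(matches_tag('<img src="foo.jpg" />'))          # True
print(matches_tag('<img src="foo.jpg" alt="x" />'))  # False
print(matches_tag('<img src="foo.jpg">'))            # False (no "/>" to end on)
```

The third call suggests the main caveat: a tag that is not written in self-closing form is not reported on its own, so HTML-style `<img ...>` tags would need `/?>` or a similar change at the end of the pattern.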