Question:

There are many regexes out there to match a URL. However, I'm trying to match URLs that do not appear anywhere within an <a> hyperlink tag (href attribute, inner content, etc.). So NONE of the URLs in these should match:

<a href="http://www.example.com/">something</a>
<a href="http://www.example.com/">http://www.example2.com</a>
<a href="http://www.example.com/"><b>something</b>http://www.example.com/<span>test</span></a>

Any URL outside of <a></a> should be matched.

One approach I tried was to use a negative lookahead to check whether the first <a> tag after the URL was an opening <a> or a closing </a>. If it is a closing </a>, then the URL must be inside a hyperlink. I think this idea was okay, but my negative lookahead didn't work (or more accurately, I didn't write the regex correctly). Any tips are much appreciated.

Answer 1:

You can do it in two steps instead of trying to come up with a single regular expression:

1. Strip out (replace with nothing) each HTML anchor element: the opening tag, the content, and the closing tag.
2. Match URLs in what remains.

In Perl it could be:

my $curLine = $_; # Work on a copy so $_ stays intact if it is needed for something else.
# Remove every anchor element: "<a ...>", "</a>" and everything in between.
# The lazy .*? with /s also covers anchors that contain nested tags or span lines.
$curLine =~ s/<a\b[^>]*>.*?<\/a>//gis;
if ( $curLine =~ /https?:\/\//)
{
  print "Matched a URL outside an HTML anchor: $_\n";
}

Answer 2:

You can do that using a single regular expression that matches both anchor tags and hyperlinks:

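# Note that the URL part is a dummy; you'll need a more sophisticated URL regex.
# The first group must consume the whole anchor element, not just its opening tag,
# so that URLs in the link text are swallowed by it.
regex = r'(<a\b[^>]*>.*?</a>)|(http://[^\s<>"]+)'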

Then loop over the results and only process matches where the second sub-pattern was found.
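
A minimal sketch of that loop in Python, under some assumptions: the URL pattern is deliberately simplistic, the function name `urls_outside_anchors` is made up for illustration, and the first alternative consumes the whole anchor element (opening tag through `</a>`) so that URLs in the link text are swallowed along with it.

```python
import re

# Assumption: a simple URL pattern; swap in a stricter one as needed.
# Group 1 consumes a whole anchor element, group 2 captures a bare URL.
PATTERN = re.compile(
    r'(<a\b[^>]*>.*?</a>)|(https?://[^\s<>"]+)',
    re.IGNORECASE | re.DOTALL,
)

def urls_outside_anchors(html):
    """Return only the matches where the second sub-pattern was found."""
    return [m.group(2) for m in PATTERN.finditer(html) if m.group(2)]

sample = (
    '<a href="http://www.example.com/">http://www.example2.com</a> '
    'see http://a.net for details'
)
print(urls_outside_anchors(sample))  # ['http://a.net']
```

Because the regex engine scans left to right, an anchor is always matched (and skipped) before any URL inside it can be reached.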

Answer 3:

Peter has a great answer: first, remove anchors so that

Some text <a href="http://page.net">TeXt</a> and some more text with link http://a.net

is replaced by

Some text  and some more text with link http://a.net

Then run a regex that finds URLs:

http://a.net
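
The same two steps could be sketched in Python roughly as follows. Both patterns and the helper name `find_bare_urls` are assumptions made for illustration, not a robust HTML or URL grammar.

```python
import re

# Assumed patterns, for illustration only.
ANCHOR = re.compile(r'<a\b[^>]*>.*?</a>', re.IGNORECASE | re.DOTALL)
URL = re.compile(r'https?://[^\s<>"]+')

def find_bare_urls(text):
    stripped = ANCHOR.sub('', text)  # step 1: drop anchor elements entirely
    return URL.findall(stripped)     # step 2: match URLs in what is left

line = 'Some text <a href="http://page.net">TeXt</a> and some more text with link http://a.net'
print(find_bare_urls(line))  # ['http://a.net']
```

Replacing each anchor with a single space instead of nothing may be safer when a URL sits directly next to an anchor, since it keeps the surrounding text from being glued together.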

Answer 4:

Use the DOM to filter out the anchor elements, then do a simple URL regex on the rest.
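
As a rough sketch of this idea in Python: the standard-library `html.parser` is event-based rather than a true DOM, but it is enough here to collect text that sits outside every `<a>` element. The class name `OutsideAnchorText`, the function name, and the simplistic URL pattern are all assumptions made for illustration.

```python
import re
from html.parser import HTMLParser

URL = re.compile(r'https?://[^\s<>"]+')  # assumed, simplistic URL pattern

class OutsideAnchorText(HTMLParser):
    """Collect text nodes that are not nested inside any <a> element."""

    def __init__(self):
        super().__init__()
        self.depth = 0    # number of <a> elements currently open
        self.chunks = []  # text seen while no <a> is open

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == 'a' and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth == 0:
            self.chunks.append(data)

def urls_outside_anchors(html):
    parser = OutsideAnchorText()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    return URL.findall(' '.join(parser.chunks))

print(urls_outside_anchors(
    'see http://a.net <a href="http://b.net">http://c.net</a> end'
))  # ['http://a.net']
```

Since attribute values such as `href` never reach `handle_data`, only URLs in visible text outside anchors are considered.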
