Making a url regex global

Posted 2019-09-22 06:55

I've been looking for a regex to replace plain-text links in a string (the string may contain more than one URL) with:

 <a href="url">url</a>

I found this: http://mathiasbynens.be/demo/url-regex

I'd like to use diegoperini's regex (according to the tests, it's the best one):

_^(?:(?:https?|ftp)://)(?:\S+(?::\S*)?@)?(?:(?!10(?:\.\d{1,3}){3})(?!127(?:\.\d{1,3}){3})(?!169\.254(?:\.\d{1,3}){2})(?!192\.168(?:\.\d{1,3}){2})(?!172\.(?:1[6-9]|2\d|3[0-1])(?:\.\d{1,3}){2})(?:[1-9]\d?|1\d\d|2[01]\d|22[0-3])(?:\.(?:1?\d{1,2}|2[0-4]\d|25[0-5])){2}(?:\.(?:[1-9]\d?|1\d\d|2[0-4]\d|25[0-4]))|(?:(?:[a-z\x{00a1}-\x{ffff}0-9]+-?)*[a-z\x{00a1}-\x{ffff}0-9]+)(?:\.(?:[a-z\x{00a1}-\x{ffff}0-9]+-?)*[a-z\x{00a1}-\x{ffff}0-9]+)*(?:\.(?:[a-z\x{00a1}-\x{ffff}]{2,})))(?::\d{2,5})?(?:/[^\s]*)?$_iuS

But I want to make it global, so that it replaces all the URLs in a string. When I use this:

/_(?:(?:https?|ftp)://)(?:\S+(?::\S*)?@)?(?:(?!10(?:\.\d{1,3}){3})(?!127(?:\.\d{1,3}){3})(?!169\.254(?:\.\d{1,3}){2})(?!192\.168(?:\.\d{1,3}){2})(?!172\.(?:1[6-9]|2\d|3[0-1])(?:\.\d{1,3}){2})(?:[1-9]\d?|1\d\d|2[01]\d|22[0-3])(?:\.(?:1?\d{1,2}|2[0-4]\d|25[0-5])){2}(?:\.(?:[1-9]\d?|1\d\d|2[0-4]\d|25[0-4]))|(?:(?:[a-z\x{00a1}-\x{ffff}0-9]+-?)*[a-z\x{00a1}-\x{ffff}0-9]+)(?:\.(?:[a-z\x{00a1}-\x{ffff}0-9]+-?)*[a-z\x{00a1}-\x{ffff}0-9]+)*(?:\.(?:[a-z\x{00a1}-\x{ffff}]{2,})))(?::\d{2,5})?(?:/[^\s]*)?_iuS/g

it doesn't work. How do I make this expression global, and what do the underscore at the beginning and the "_iuS" at the end mean?

I want to use it with PHP, so I'm calling:

preg_replace($regex, '<a href="$0">$0</a>', $examplestring);

Answer 1:

The underscores are the regex delimiters, and i, u and S are pattern modifiers:

i (PCRE_CASELESS)

 If this modifier is set, letters in the pattern match both upper and lower case letters. 

u (PCRE_UTF8)

 This modifier turns on additional functionality of PCRE that is incompatible with Perl. Pattern and subject strings are treated as UTF-8.

The pattern needs it because the \x{00a1}-\x{ffff} ranges refer to Unicode code points above 255. Lowercase u is not the same as uppercase U (PCRE_UNGREEDY), which makes quantifiers lazy by default.

S

 When a pattern is going to be used several times, it is worth spending more time analyzing it in order to speed up the time taken for matching. If this modifier is set, then this extra analysis is performed. At present, studying a pattern is useful only for non-anchored patterns that do not have a single fixed starting character. 

For more information, see http://www.php.net/manual/en/reference.pcre.pattern.modifiers.php
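A small sketch of what i and u do in practice, using the host-label character class copied from the pattern above. The sample host "MÜNCHEN.de" is an arbitrary assumption, and the file is assumed to be saved as UTF-8. With u, PCRE reads pattern and subject as UTF-8, so Ü counts as one character inside \x{00a1}-\x{ffff}; with i, [a-z] also matches capital letters.

```php
<?php
// Host-label character class taken from the diegoperini pattern.
// Assumes this source file is saved as UTF-8.
$withI    = '_[a-z\x{00a1}-\x{ffff}0-9]+_iu'; // case-insensitive + UTF-8
$withoutI = '_[a-z\x{00a1}-\x{ffff}0-9]+_u';  // UTF-8 only

$host = 'MÜNCHEN.de'; // arbitrary sample input

preg_match($withI, $host, $m);
preg_match($withoutI, $host, $n);

echo $m[0], "\n"; // whole label: [a-z] matches the capitals because of "i"
echo $n[0], "\n"; // only the Ü, which falls inside \x{00a1}-\x{ffff}
```

Without i, the match starts at the first character the class accepts and stops at the next capital.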

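When you wrap it in /.../g, you add a second pair of delimiters plus a g modifier, which doesn't exist in PCRE; that's why it didn't work. (With / as the delimiter, the first / in :// would also end the pattern early.) You don't need a global flag in PHP anyway: preg_replace() replaces every match by default, because its $limit parameter defaults to -1 (no limit).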
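A minimal sketch of the "global by default" behaviour of preg_replace(). The short pattern '_https?://\S+_i' is a simplified stand-in for the full URL regex, chosen only to keep the example readable; the sample text is arbitrary.

```php
<?php
// PCRE has no "g" flag: preg_replace() replaces every match unless a
// $limit is passed. Simplified stand-in pattern, for illustration only.
$pattern = '_https?://\S+_i';
$text    = 'See http://example.com and HTTPS://example.org today';

$count  = 0;
$linked = preg_replace($pattern, '<a href="$0">$0</a>', $text, -1, $count);
echo $linked, "\n";
echo "replacements: $count\n";

// A limit of 1 gives the single-replacement behaviour that
// JavaScript's replace() has without /g.
$first = preg_replace($pattern, '<a href="$0">$0</a>', $text, 1);
echo $first, "\n";
```

The fifth argument, $count, reports how many replacements were made, which is handy for checking that every URL was linked.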



Answer 2:

I agree with @verdesmarald, and I use this pattern in the following function:

// linkify() is a hypothetical wrapper name; the original snippet ended with "return $string;" inside some function.
function linkify($string)
{
    return preg_replace_callback(
        '_(?:(?:https?|ftp)://)(?:\S+(?::\S*)?@)?(?:(?!10(?:\.\d{1,3}){3})(?!127(?:\.\d{1,3}){3})(?!169\.254(?:\.\d{1,3}){2})(?!192\.168(?:\.\d{1,3}){2})(?!172\.(?:1[6-9]|2\d|3[0-1])(?:\.\d{1,3}){2})(?:[1-9]\d?|1\d\d|2[01]\d|22[0-3])(?:\.(?:1?\d{1,2}|2[0-4]\d|25[0-5])){2}(?:\.(?:[1-9]\d?|1\d\d|2[0-4]\d|25[0-4]))|(?:(?:[a-z\x{00a1}-\x{ffff}0-9]+-?)*[a-z\x{00a1}-\x{ffff}0-9]+)(?:\.(?:[a-z\x{00a1}-\x{ffff}0-9]+-?)*[a-z\x{00a1}-\x{ffff}0-9]+)*(?:\.(?:[a-z\x{00a1}-\x{ffff}]{2,})))(?::\d{2,5})?(?:/[^\s]*)?_iuS',
        // create_function() was deprecated in PHP 7.2 and removed in PHP 8.0;
        // an anonymous function does the same job.
        function ($match) {
            // Display text: lowercase, without the scheme or a "www." prefix
            $m = trim(strtolower($match[0]));
            $m = str_replace(array('http://', 'https://', 'ftp://', 'www.'), '', $m);

            if (strlen($m) > 25) {
                $m = substr($m, 0, 25) . '...';
            }

            // Escape both parts so a quote or < inside a URL cannot break the HTML
            $flags = ENT_QUOTES | ENT_SUBSTITUTE;
            return '<a href="' . htmlspecialchars($match[0], $flags) . '">'
                . htmlspecialchars($m, $flags) . '</a>';
        },
        $string
    );
}

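This seems to do the trick and solved a problem I had. As @verdesmarald said, removing the ^ and $ characters is what lets the pattern work in my preg_replace_callback() call. (With the anchors, a match has to span the entire string, so a URL surrounded by other text never matches.)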
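A short sketch of the anchor effect, using a simplified URL pattern (an assumption, not the full regex) on an arbitrary sentence. Without the m modifier, ^ and $ refer to the start and end of the whole subject, so the anchored version finds nothing.

```php
<?php
// With ^ and $ (and no "m" flag) a match must span the whole subject,
// so URLs embedded in a sentence are never found.
// Simplified URL pattern, for illustration only.
$text = 'Visit http://example.com or ftp://files.example.org now';

$anchored   = preg_match_all('_^(?:https?|ftp)://\S+$_i', $text, $a);
$unanchored = preg_match_all('_(?:https?|ftp)://\S+_i', $text, $u);

echo "anchored: $anchored\n";
echo "unanchored: $unanchored\n";
print_r($u[0]);
```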

My only concern is how efficient the pattern is. Could it become a bottleneck if used in a busy, high-traffic web application?
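One way to answer the efficiency question for a specific workload is to measure it. Below is a rough timing sketch; the sample text and the iteration count are arbitrary assumptions, and absolute numbers will depend on the machine and on whether PCRE JIT is enabled (the pcre.jit ini setting). PHP's PCRE extension keeps compiled patterns in a per-process cache, so repeated calls with the same pattern string should not recompile it.

```php
<?php
// Full diegoperini pattern without ^ and $, as used in the answer above.
$pattern = '_(?:(?:https?|ftp)://)(?:\S+(?::\S*)?@)?(?:(?!10(?:\.\d{1,3}){3})(?!127(?:\.\d{1,3}){3})(?!169\.254(?:\.\d{1,3}){2})(?!192\.168(?:\.\d{1,3}){2})(?!172\.(?:1[6-9]|2\d|3[0-1])(?:\.\d{1,3}){2})(?:[1-9]\d?|1\d\d|2[01]\d|22[0-3])(?:\.(?:1?\d{1,2}|2[0-4]\d|25[0-5])){2}(?:\.(?:[1-9]\d?|1\d\d|2[0-4]\d|25[0-4]))|(?:(?:[a-z\x{00a1}-\x{ffff}0-9]+-?)*[a-z\x{00a1}-\x{ffff}0-9]+)(?:\.(?:[a-z\x{00a1}-\x{ffff}0-9]+-?)*[a-z\x{00a1}-\x{ffff}0-9]+)*(?:\.(?:[a-z\x{00a1}-\x{ffff}]{2,})))(?::\d{2,5})?(?:/[^\s]*)?_iuS';

// Arbitrary sample: 10 copies of a sentence containing two URLs.
$text = str_repeat('Visit http://example.com/page and https://www.example.org today. ', 10);

$iterations = 1000;
$found = 0;
$start = hrtime(true); // nanoseconds, PHP 7.3+
for ($i = 0; $i < $iterations; $i++) {
    $found = preg_match_all($pattern, $text);
}
$elapsedMs = (hrtime(true) - $start) / 1e6;

if ($found === false) {
    // preg_* functions return false on errors such as hitting pcre.backtrack_limit
    exit('preg_match_all failed, error code ' . preg_last_error() . "\n");
}
printf("%d URLs per call, %.4f ms per call\n", $found, $elapsedMs / $iterations);
```

Running it against text that resembles real user input gives a more useful answer than the synthetic sample.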

UPDATE

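The regex pattern above breaks if the path part of the URL ends with a trailing dot, as in "http://www.mydomain.com/page." (note the final dot). To fix this, I modified the last part of the pattern by adding ^. so that it reads [^\s^.], which, as I read it, doesn't match trailing whitespace or dots. Note that inside a character class ^ only means "not" in the first position, so [^\s^.] excludes whitespace, a literal ^ and every dot in the path, not just a trailing one; a path such as /page.html is cut off at /page.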
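A sketch comparing the path tail from the UPDATE with a lookbehind alternative, \S*(?<!\.), which is a swapped-in technique and not the poster's fix. The simplified host part and the sample text are assumptions made to keep the example short.

```php
<?php
// Compares the path tail from the UPDATE, [^\s^.]*, with a lookbehind
// alternative, \S*(?<!\.), that only rejects a final dot.
// The host part is simplified for illustration.
$host = '(?:https?|ftp)://[a-z0-9.-]+\.[a-z]{2,}';

$update     = '_' . $host . '(?:/[^\s^.]*)?_i';
$lookbehind = '_' . $host . '(?:/\S*(?<!\.))?_i';

$text = 'See http://www.mydomain.com/page. and http://www.mydomain.com/page.html';

preg_match_all($update, $text, $a);
preg_match_all($lookbehind, $text, $b);

print_r($a[0]); // both stop at /page: every dot in the path is excluded
print_r($b[0]); // /page and /page.html: only a trailing dot is dropped
```

The lookbehind lets \S* backtrack by one character whenever the match would otherwise end on a dot, so dots inside the path are kept.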

In my tests so far, it seems to work fine.



Source: Making a url regex global
Tags: php regex url