PHP / RegEx - Convert URLs to links by detecting .

Posted 2019-01-20 02:01

I know there have been many questions asking for help converting URLs to clickable links in strings, but I haven't found quite what I'm looking for.

I want to be able to match any of the following examples and turn them into clickable links:

http://www.domain.com
https://www.domain.net
http://subdomain.domain.org
www.domain.com/folder
subdomain.domain.net
subdomain.domain.edu/folder/subfolder
domain.net
domain.com/folder

I do not want to match random.stuff.separated.with.periods.

EDIT: Please keep in mind that these URLs need to be found within larger strings of 'normal' text. For example, I want to match 'domain.net' in "Hello! Come check out domain.net!".

I think this could be accomplished with a regex that can determine whether the matching url contains .com, .net, .org, or .edu followed by either a forward slash or whitespace. Other than a user typo, I can't imagine any other case in which a valid URL would have one of those followed by anything else.

I realize there are many valid domain extensions out there, but I don't need to support them all. I can just choose which to support with something like (com|net|org|edu) in the regex. Unfortunately, I'm not skilled enough with regex yet to know how to properly implement this.

I'm hoping someone can help me find a regular expression (for use with PHP's preg_replace) that can match URLs based on just about any text connected by one or more dots and either ending with one of the specified extensions followed by whitespace OR containing one of the specified extensions followed by a slash and possibly folders.
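
A minimal sketch of the approach described above, assuming the extension list (com|net|org|edu) from the question. The function name linkify_simple is hypothetical, and the pattern is only one possible reading of the requirements: a word boundary after the extension stands in for "followed by whitespace or a slash", so that 'domain.net' in "Hello! Come check out domain.net!" still matches.

```php
<?php
// Hypothetical helper; the TLD list is the one proposed in the question.
function linkify_simple(string $text): string
{
    // Optional scheme, one or more dot-terminated labels, a listed TLD,
    // a word boundary (so ".com" inside ".computers" is rejected),
    // then an optional path running up to the next whitespace.
    $pattern = '~\b(?:https?://)?(?:[a-z0-9-]+\.)+(?:com|net|org|edu)\b(?:/\S*)?~i';

    return preg_replace_callback($pattern, function (array $m): string {
        $url = $m[0];
        // Prepend http:// only when the match has no scheme of its own.
        $href = preg_match('~^https?://~i', $url) ? $url : 'http://' . $url;
        // Escape the pieces so a quote inside a path cannot break the attribute.
        return '<a href="' . htmlspecialchars($href) . '">' . htmlspecialchars($url) . '</a>';
    }, $text);
}

echo linkify_simple('Hello! Come check out domain.net!'), "\n";
// Hello! Come check out <a href="http://domain.net">domain.net</a>!
```

Because the path part runs to the next whitespace, punctuation that directly follows a path (for example a sentence-ending period after domain.com/folder) ends up inside the link; trimming it is left out of this sketch.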

I did several searches and so far have not found what I'm looking for. If there already exists a SO post that answers this, I apologize.

Thanks in advance.

--- EDIT 3 ---

After days of trial and error and some help from SO, here's what works:

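// Every alternative in the lookbehind carries its own "\." so that a word such as "telecom" does not count as ending in .com.
preg_replace_callback('#(\s|^)((https?://)?(\w|-)+(\.(\w+|-)*)+(?<=\.net|\.org|\.edu|\.com|\.cc|\.br|\.jp|\.dk|\.gs|\.de)(\:[0-9]+)?(?:/[^\s]*)?)(?=\s|\b)#is',
                function ($m) {
                    // Prepend http:// only when the match has no scheme of its own.
                    $href = preg_match('#^https?://#i', $m[2]) ? $m[2] : 'http://' . $m[2];
                    return $m[1] . '<a href="' . $href . '">' . $m[2] . '</a>';
                },
                $event_desc);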

This is a modified version of anubhava's code below and so far seems to do exactly what I want. Thanks!

3 Answers
Juvenile、少年°
#2 · 2019-01-20 02:15

Thanks a ton. I modified his final solution to allow any two- or three-letter domain suffix (.ca, .co.uk), not just the specified ones.

$html = preg_replace_callback('#(\s|^)((https?://)?(\w|-)+(\.[a-z]{2,3})+(\:[0-9]+)?(?:/[^\s]*)?)(?=\s|\b)#is',
    function ($m) {
        // Add http:// when the match has no scheme; target="_blank" opens the link in a new tab.
        $href = preg_match('#^https?://#i', $m[2]) ? $m[2] : 'http://' . $m[2];
        return $m[1] . '<a href="' . $href . '" target="_blank">' . $m[2] . '</a>';
    }, $url);
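
To make the trade-off of accepting every short suffix concrete, here is a small sketch that runs the pattern from this answer against a few inputs. The helper name first_url_match and the sample strings are assumptions chosen for illustration.

```php
<?php
// The pattern from the answer above, unchanged.
const ANY_SUFFIX_PATTERN = '#(\s|^)((https?://)?(\w|-)+(\.[a-z]{2,3})+(\:[0-9]+)?(?:/[^\s]*)?)(?=\s|\b)#is';

// Hypothetical helper: returns the URL-like text of the first match, or null.
function first_url_match(string $text): ?string
{
    return preg_match(ANY_SUFFIX_PATTERN, $text, $m) ? $m[2] : null;
}

var_dump(first_url_match('example.co.uk'));        // string(13) "example.co.uk"
var_dump(first_url_match('foo.bar'));              // string(7) "foo.bar"
var_dump(first_url_match('site.museum'));          // NULL
var_dump(first_url_match('welcome.to.computers')); // string(10) "welcome.to"
```

So the looser suffix rule picks up .co.uk, but it also links ordinary dotted text such as foo.bar and skips suffixes longer than three letters. Whether that is acceptable depends on the text being processed.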
女痞
#3 · 2019-01-20 02:17
'/(http(s)?:\/\/)?[\w\/\.]+(\.((com)|(edu)|(net)|(org)))[\w\/]*/'

That works for your examples. You might want to add support for extra characters such as "-", "&", "?", ":", etc. in the last character class:

'/(http(s)?:\/\/)?[\w\/\.]+(\.((com)|(edu)|(net)|(org)))[\w\/?=&:;.-]*/'

This supports query parameters and port numbers, as long as the domain ends in one of the listed extensions.

e.g.: www.foo.com:8888/test?param1=val1&param2=val2
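
The sketch below runs a variant of the second pattern against a URL with a port and a query string. It lists the trailing character class explicitly with the hyphen last so that it is literal, which is an assumption about the intent. The host is www.foo.com because the pattern only lists com, edu, net and org. The sample strings are made up for illustration.

```php
<?php
// Variant of the second pattern above; the explicit trailing class is an assumption.
$pattern = '/(https?:\/\/)?[\w\/.]+\.(com|edu|net|org)[\w\/?=&:;.-]*/';

$text = 'Go to www.foo.com:8888/test?param1=val1&param2=val2 today';
preg_match_all($pattern, $text, $matches);
// $matches[0] holds a single entry:
// www.foo.com:8888/test?param1=val1&param2=val2
print_r($matches[0]);

// Nothing stops the trailing class from continuing past the extension,
// so dotted text that merely contains ".com" is matched as well.
var_dump(preg_match($pattern, 'welcome.to.computers', $m)); // int(1)
echo $m[0], "\n"; // welcome.to.computers
```

If matches like the last one are a problem, a boundary check right after the extension (as in the lookahead used by the other answers) is one way to rule them out.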

冷血范
#4 · 2019-01-20 02:24

You can use this regex:

#(\s|^)((?:https?://)?\w+(?:\.\w+)+(?<=\.(net|org|edu|com))(?:/[^\s]*|))(?=\s|\b)#is

Code:

$arr = array(
'http://www.domain.com/?foo=bar',
'http://www.that"sallfolks.com',
'This is really cool site: https://www.domain.net/ isn\'t it?',
'http://subdomain.domain.org',
'www.domain.com/folder',
'Hello! You can visit vertigofx.com/mysite/rocks for some awesome pictures, or just go to vertigofx.com by itself',
'subdomain.domain.net',
'subdomain.domain.edu/folder/subfolder',
'Hello! Check out my site at domain.net!',
'welcome.to.computers',
'Hello.Come visit oursite.com!',
'foo.bar',
'domain.com/folder',

);
foreach ($arr as $url) {
   $link = preg_replace_callback('#(\s|^)((?:https?://)?\w+(?:\.\w+)+(?<=\.(net|org|edu|com))(?:/[^\s]*|))(?=\s|\b)#is',
           function ($m) {
               // Prepend http:// only when the match has no scheme of its own.
               $href = preg_match('#^https?://#i', $m[2]) ? $m[2] : 'http://' . $m[2];
               return $m[1] . '<a href="' . $href . '">' . $m[2] . '</a>';
           },
           $url);
   echo $link . "\n";
}

OUTPUT:

<a href="http://www.domain.com/?foo=bar">http://www.domain.com/?foo=bar</a>
http://www.that"sallfolks.com
This is really cool site: <a href="https://www.domain.net/">https://www.domain.net/</a> isn't it?
<a href="http://subdomain.domain.org">http://subdomain.domain.org</a>
<a href="http://www.domain.com/folder">www.domain.com/folder</a>
Hello! You can visit <a href="http://vertigofx.com/mysite/rocks">vertigofx.com/mysite/rocks</a> for some awesome pictures, or just go to <a href="http://vertigofx.com">vertigofx.com</a> by itself
<a href="http://subdomain.domain.net">subdomain.domain.net</a>
<a href="http://subdomain.domain.edu/folder/subfolder">subdomain.domain.edu/folder/subfolder</a>
Hello! Check out my site at <a href="http://domain.net">domain.net</a>!
welcome.to.computers
Hello.Come visit <a href="http://oursite.com">oursite.com</a>!
foo.bar
<a href="http://domain.com/folder">domain.com/folder</a>

PS: This regex only supports the http and https schemes. To support ftp as well, for example, you need to modify the regex a little.
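
One way such a modification could look, as a sketch: widen the optional scheme group to (?:https?|ftp) and make the callback's scheme check agree with it. The function name linkify_with_ftp is hypothetical; the rest of the pattern is the one from this answer.

```php
<?php
// Hypothetical helper: the answer's pattern with ftp added to the optional scheme.
function linkify_with_ftp(string $text): string
{
    $pattern = '#(\s|^)((?:(?:https?|ftp)://)?\w+(?:\.\w+)+(?<=\.(net|org|edu|com))(?:/[^\s]*|))(?=\s|\b)#is';

    return preg_replace_callback($pattern, function (array $m): string {
        // Keep an existing http, https or ftp scheme; otherwise default to http://.
        $href = preg_match('#^(?:https?|ftp)://#i', $m[2]) ? $m[2] : 'http://' . $m[2];
        return $m[1] . '<a href="' . $href . '">' . $m[2] . '</a>';
    }, $text);
}

echo linkify_with_ftp('ftp://files.domain.org/pub'), "\n";
// <a href="ftp://files.domain.org/pub">ftp://files.domain.org/pub</a>
```

The scheme list appears in two places (the main pattern and the callback), so both need to change together when another scheme is added.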
