Convert URLs from text to links even if no protocol

Posted 2020-03-02 18:03

Question:

Let's say that $content is the content of a textarea:

/*Convert the http/https to link */
     $content = preg_replace('!((https://|http://)+[a-z0-9_./?=&-]+)!i', '<a target="_blank" href="$1">$1</a> ', nl2br($_POST['helpcontent'])." ");
/*Convert the www. to link prepending http://*/
     $content = preg_replace('!((www\.)+[a-z0-9_./?=&-]+)!i', '<a target="_blank" href="http://$1">$1</a> ', $content." ");

This worked fine for links, but I realised that it breaks the markup when the text contains an image...

I am trying like this now:

$content = preg_replace('!\s((https?://|http://)+[a-z0-9_./?=&-]+)!i', ' <a href="$1">$1</a> ', nl2br($_POST['content'])." ");
$content = preg_replace('!((www\.)+[a-z0-9_./?=&-]+)!i', '<a target="_blank" href="http://$1">$1</a> ', $content." ");

With this, images are left alone, but the problem is that URLs in http:// or https:// format are no longer converted:

google.com -> Not converted (as expected)

www.google.com -> Converted (as expected)

http://google.com -> Not converted (unexpected)

https://google.com -> Not converted (unexpected)

What am I missing?

-EDIT-

Current almost working solution:

$content = preg_replace('!(\s|^)((https?://)+[a-z0-9_./?=&-]+)!i', ' <a href="$2" target="_blank">$2</a> ', nl2br($_POST['content'])." ");
$content = preg_replace('!(\s|^)((www\.)+[a-z0-9_./?=&-]+)!i', '<a target="_blank" href="http://$2"  target="_blank">$2</a> ', $content." ");

The thing here is that if this is the input:

www.funcook.com http://www.funcook.com https://www.funcook.com funcook.com http://funcook.com https://funcook.com

All the URLs I want (all except name.domain) are converted as expected, but this is the output:

www.funcook.com http://www.funcook.com https://www.funcook.com ; funcook.com http://funcook.com https://funcook.com

Note that a ; is inserted. Any idea why?

Answer 1:

Try this:

preg_replace('!(\s|^)((https?://|www\.)+[a-z0-9_./?=&-]+)!i', ' <a href="$2">$2</a> ',$text);

It will pick up links beginning with http:// or with www.
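One caveat with the replacement string above: a match that starts with www. ends up in href without a scheme, so browsers treat it as a relative path. A minimal sketch of one way around that, assuming the input is plain text, uses preg_replace_callback() to prepend http:// only when needed. The function name linkify is hypothetical, not something from the question.

```php
<?php
// Hypothetical helper; the pattern is the one suggested in the answer above.
function linkify(string $text): string
{
    $pattern = '!(\s|^)((https?://|www\.)+[a-z0-9_./?=&-]+)!i';

    return preg_replace_callback($pattern, function (array $m): string {
        $url = $m[2];
        // A bare "www." match needs a scheme, otherwise the href is relative.
        $href = stripos($url, 'http') === 0 ? $url : 'http://' . $url;

        // $m[1] is the leading whitespace (empty at the start of the string); keep it unchanged.
        return $m[1]
            . '<a href="' . htmlspecialchars($href) . '" target="_blank">'
            . htmlspecialchars($url) . '</a>';
    }, $text);
}

echo linkify('www.funcook.com http://funcook.com funcook.com'), "\n";
```

Because the captured whitespace is written back as-is, no extra spaces are introduced around the links, and funcook.com (no protocol, no www.) is left untouched.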


Answer 2:

You can't do this with 100% accuracy, because there may be links such as stackoverflow.com which do not have www.

If you're only targeting those links:

!(www\.\S+)!i

Should work well enough for you.
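As a rough illustration of what that pattern does (the sample sentence is made up), note that \S+ takes everything up to the next whitespace character, so punctuation directly after a URL would be pulled into the link as well:

```php
<?php
$text = 'Visit www.google.com or stackoverflow.com today';

// \S+ consumes every character up to the next whitespace.
$html = preg_replace('!(www\.\S+)!i', '<a href="http://$1">$1</a>', $text);

echo $html, "\n";
// Visit <a href="http://www.google.com">www.google.com</a> or stackoverflow.com today
```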


EDIT: As for your newest question about why http links don't get converted while https links do: your first pattern only looks for https:// or for http:// followed by a literal dot, which never occurs in a real URL. Simplify it by replacing:

(https://|http://\.)

With

(https?://)

Which will make the s optional.
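A quick check of that behaviour, using the URLs from the question as sample input:

```php
<?php
// The "?" makes the preceding "s" optional, so one alternative covers both schemes.
$pattern = '!^https?://!i';

foreach (['http://google.com', 'https://google.com', 'www.google.com'] as $candidate) {
    // preg_match() returns 1 on a match and 0 when there is none.
    printf("%-20s %s\n", $candidate, preg_match($pattern, $candidate) ? 'match' : 'no match');
}
```

The first two candidates match; www.google.com does not, because it has no scheme.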



Answer 3:

Another way to add hyperlinks is to take the text you want to parse for links and explode it into an array. Then loop through it with foreach (a very fast construct, see http://www.phpbench.com/) and turn anything that starts with http://, https://, or www., or ends with .com/.org/etc., into a link.

I'm thinking maybe something like this:

$userTextArray = explode(" ", $userText);
foreach ($userTextArray as &$word) {
    // if statements to test whether $word starts with www., ends with .com, or whatever else
    // change $word so that it is a link
}
unset($word); // break the reference left over from the loop

Your changes will be reflected in the array because of the "&" before $word in the foreach statement. Now just implode the array back into a string and you're good to go.
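Filled in, the idea could look roughly like the sketch below. The function name linkifyWords and the short TLD list are assumptions made for illustration; a real TLD list would be much longer.

```php
<?php
// Hypothetical helper; (com|org|net) is only an illustrative TLD list.
function linkifyWords(string $userText): string
{
    $userTextArray = explode(' ', $userText);

    foreach ($userTextArray as &$word) {
        $hasScheme     = preg_match('!^https?://!i', $word) === 1;
        $startsWithWww = stripos($word, 'www.') === 0;
        $endsWithTld   = preg_match('!\.(com|org|net)$!i', $word) === 1;

        if ($hasScheme || $startsWithWww || $endsWithTld) {
            // Only add a scheme when the word does not already carry one.
            $href = $hasScheme ? $word : 'http://' . $word;
            $word = '<a href="' . $href . '">' . $word . '</a>';
        }
    }
    unset($word); // break the reference so later code cannot alter the last element

    return implode(' ', $userTextArray);
}

echo linkifyWords('see funcook.com and www.funcook.com'), "\n";
```

Splitting on a single space means words separated by newlines or followed by punctuation are not recognised, so this is only a starting point.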

This made sense in my head... But I'm not 100% sure that this is what you're looking for



Answer 4:

I had a similar problem. Here is a function which helped me. Maybe it will fit your needs too:

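function clHost($Address) {
   // parse_url() only fills 'host' when a scheme is present; otherwise everything lands in 'path'
   $parseUrl = parse_url(trim($Address));
   $host = $parseUrl['host'] ?? '';
   $path = $parseUrl['path'] ?? '';
   // Drop surrounding slashes, then remove "www."
   return str_replace('www.', '', trim($host . $path, '/'));
}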

This function returns the domain without the protocol and "www.", so you can add them back yourself later.

For example:

$url = "http://www.". clHost($link);

I did it this way because I couldn't find a good regexp.
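The reason the function has to look at both host and path is how parse_url() treats input without a scheme. A small check, using made-up funcook.com paths:

```php
<?php
$withScheme    = parse_url('http://www.funcook.com/recipes');
$withoutScheme = parse_url('www.funcook.com/recipes');

// With a scheme, host and path are separated.
echo $withScheme['host'], ' | ', $withScheme['path'], "\n";

// Without a scheme there is no 'host' key; the whole string is reported as the path.
var_dump(isset($withoutScheme['host']));
echo $withoutScheme['path'], "\n";
```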



Answer 5:

\s((https?://|www.)+[a-z0-9_./?=&-]+)

The problem is that the leading \s forces the match to start with a whitespace character, so if the URL is not preceded by one, the match fails. The regexp is fine without the \s, but to avoid replacing the images you need to add something that stops it from matching them.

If the images are pure HTML, use this: (?<!src=")((https?://|www\.)+[a-z0-9_./?=&-]+)

The negative lookbehind (?<!src=") rejects any match that starts immediately after src=", so URLs inside image tags are skipped.

If you use another markup, tell me and I'll try to find another way to avoid the images.
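A minimal sketch of the lookbehind in action, with a made-up image tag as input. It deliberately leaves out the www. alternative: a pattern that may also start at www. would still match the www. part inside src="http://www...", because the lookbehind only guards the position right after src=".

```php
<?php
$html = '<img src="http://funcook.com/logo.png"> http://funcook.com';

// (?<!src=") is a negative lookbehind: a match may not start right after src=".
$pattern = '!(?<!src=")(https?://[a-z0-9_./?=&-]+)!i';

echo preg_replace($pattern, '<a href="$1">$1</a>', $html), "\n";
// <img src="http://funcook.com/logo.png"> <a href="http://funcook.com">http://funcook.com</a>
```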