Multiple preg_replace RegEx for different URLs

Posted 2019-09-02 15:49

Question:

I have a string like this:

Blablabla http://www.soundcloud.com/artist/track
www.facebook.com/page is my page
Try www.youtube.com/watch?v=1234567 for my video
Check http://www.somesite.com/bla.

I would like to process user-generated posts automatically: replace video and SoundCloud URLs with the matching WordPress shortcodes (so they render as videos or SoundCloud widgets) and turn all other URLs and email addresses into regular links. The result should look something like this (simplified):

Blablabla [soundcloud]www.soundcloud.com/artist/track[/soundcloud]
[facebook]www.facebook.com/page[/facebook] is my page
Try [youtube]www.youtube.com/watch?v=1234567[/youtube] for my video
Check [url]www.somesite.com/bla[/url].

So I think I need to run several preg_replace actions on the string.

After replacing the SoundCloud, Facebook and YouTube URLs with the WordPress shortcodes, I need to run a preg_replace on the remaining URLs, such as http://www.somesite.com/bla. But since the Facebook/SoundCloud/YouTube URLs are still present in the string (now inside the shortcodes), they are matched and wrapped a second time, producing:

[youtube][url]www.youtube.com/watch?v=1234567[/url][/youtube]

I do not want this behaviour. It should be like this:

[youtube]www.youtube.com/watch?v=1234567[/youtube]

This is my basic RegEx:

((https?://)(www.)|(https?://)|(www.))[^ <]+

I need to replace URLs beginning with http, https and www.

Does anyone have a solution?

greetz,

Mat

Answer 1:

I'd recommend you look into the preg_replace_callback function instead.

Rather than trying to match a different subset of URLs for each site, just match them all! Then, in code, inspect one specific capturing group to find the base domain of the URL.

So, in PHP code, if the domain is facebook.com, wrap the URL in the facebook shortcode, and so on.

Here's your regex, slightly modified to capture the domain. Remember to escape your literal periods. The domain group captures up to the first <, /, ? or whitespace character, and the rest of the URL then runs until the first < or whitespace. You might have to modify this if you find URLs it doesn't handle.

((https?://)(www\.)|(https?://)|(www\.))([^</\?\s]+)[^<\s]*
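To check which group ends up holding the domain, here is a small sketch that runs the pattern with preg_match on two URLs from the question. The constant URL_PATTERN and the helper url_domain are names I made up for illustration, and I've switched to # delimiters only so the slashes don't need escaping.

```php
<?php
// Pattern from above, written with '#' delimiters so the slashes need no escaping.
const URL_PATTERN = '#((https?://)(www\.)|(https?://)|(www\.))([^</?\s]+)[^<\s]*#';

// Hypothetical helper: returns the domain (group 6) of the first URL found, or null.
function url_domain(string $text): ?string
{
    return preg_match(URL_PATTERN, $text, $m) ? $m[6] : null;
}

// Sample URLs taken from the question.
foreach (['http://www.soundcloud.com/artist/track', 'www.youtube.com/watch?v=1234567'] as $url) {
    echo url_domain($url), PHP_EOL;
}
```

This should print soundcloud.com and youtube.com: whichever prefix alternative matched, the domain always lands in group 6.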

And now some PHP code. Recall that $matches[0] holds the full match and $matches[6] holds the sixth capturing group - in this case ([^</\?\s]+), the domain part.

// Match every URL once, then pick the shortcode from the captured domain.
$post = preg_replace_callback(
    '/((https?:\/\/)(www\.)|(https?:\/\/)|(www\.))([^<\/\?\s]+)[^<\s]*/',
    function ($matches) {
        // $matches[6] is the domain without the protocol or "www." prefix.
        switch ($matches[6]) {
            case 'facebook.com':
                return "[facebook]" . $matches[0] . "[/facebook]";

            case 'youtube.com':
                return "[youtube]" . $matches[0] . "[/youtube]";

            case 'soundcloud.com':
                return "[soundcloud]" . $matches[0] . "[/soundcloud]";

            default:
                // Any other URL becomes a regular link.
                return "[url]" . $matches[0] . "[/url]";
        }
    },
    $post
);
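Two things you may run into with real posts: a URL at the end of a sentence picks up the trailing period (as in "Check http://www.somesite.com/bla."), and the domain comparison is case-sensitive. Below is one possible variant of the same single-pass idea that handles both. It is only a sketch: the function name wrap_urls_in_shortcodes is made up, and treating trailing . , ; : ! ? as sentence punctuation rather than part of the URL is my assumption.

```php
<?php
// Hypothetical helper name; the shortcode names come from the question.
function wrap_urls_in_shortcodes(string $post): string
{
    // Same pattern as above, with '#' delimiters and the case-insensitive flag.
    $pattern = '#((https?://)(www\.)|(https?://)|(www\.))([^</?\s]+)[^<\s]*#i';

    return preg_replace_callback($pattern, function (array $m): string {
        // Assumption: punctuation at the very end belongs to the sentence, not the URL.
        $url = rtrim($m[0], '.,;:!?');
        $trailing = (string) substr($m[0], strlen($url));

        // Map known domains to shortcode names; anything else becomes [url].
        $tags = [
            'facebook.com'   => 'facebook',
            'youtube.com'    => 'youtube',
            'soundcloud.com' => 'soundcloud',
        ];
        $domain = strtolower(rtrim($m[6], '.,;:!?'));
        $tag = $tags[$domain] ?? 'url';

        // Put the trimmed punctuation back after the closing shortcode.
        return '[' . $tag . ']' . $url . '[/' . $tag . ']' . $trailing;
    }, $post);
}

echo wrap_urls_in_shortcodes('Check http://www.somesite.com/bla.'), PHP_EOL;
```

Because every URL is matched exactly once, nothing gets wrapped a second time. Note that the protocol stays inside the shortcode (the last line of your sample becomes "Check [url]http://www.somesite.com/bla[/url]."), so strip it in the callback if your shortcodes expect the bare www form.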