PHP validation/regex for URL

I've been looking for a simple regex for URLs. Does anybody have one handy that works well? I didn't find one in the Zend Framework validation classes, and I have seen several implementations.


As per the PHP manual, parse_url() should not be used to validate a URL.

Unfortunately, it seems that filter_var('', FILTER_VALIDATE_URL) does not perform any better.

Both parse_url() and filter_var() will pass malformed URLs such as http://...
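
To check this claim against your own PHP version, here is a minimal sketch. The helper name probe_builtin_url_checks is hypothetical, and the result for malformed input may differ between PHP releases, so no output is asserted for it here.

```php
<?php
// Hypothetical helper: reports what each built-in says about a candidate URL.
function probe_builtin_url_checks(string $url): array
{
    return [
        // parse_url() returns false only for seriously malformed input.
        'parse_url'  => parse_url($url) !== false,
        // filter_var() returns the URL on success and false on failure.
        'filter_var' => filter_var($url, FILTER_VALIDATE_URL) !== false,
    ];
}

// Print the verdicts for a malformed and a well-formed candidate.
foreach (['http://...', 'http://example.com/'] as $candidate) {
    echo $candidate, ' => ', json_encode(probe_builtin_url_checks($candidate)), PHP_EOL;
}
```

If either entry is true for the malformed candidate, that built-in cannot be relied on alone for validation on that PHP version.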

Therefore, in this case, a regex is the better method.

And there is your answer =) Try to break it, you can't!!!

function link_validate_url($text) {
  $LINK_DOMAINS = 'aero|arpa|asia|biz|com|cat|coop|edu|gov|info|int|jobs|mil|museum|name|nato|net|org|pro|travel|mobi|local';
  $LINK_ICHARS_DOMAIN = (string) html_entity_decode(implode("", array( // @TODO completing letters ...
    "æ", // æ
    "Æ", // Æ
    "À", // À
    "à", // à
    "Á", // Á
    "á", // á
    "Â", // Â
    "â", // â
    "å", // å
    "Å", // Å
    "ä", // ä
    "Ä", // Ä
    "Ç", // Ç
    "ç", // ç
    "Ð", // Ð
    "ð", // ð
    "È", // È
    "è", // è
    "É", // É
    "é", // é
    "Ê", // Ê
    "ê", // ê
    "Ë", // Ë
    "ë", // ë
    "Î", // Î
    "î", // î
    "Ï", // Ï
    "ï", // ï
    "ø", // ø
    "Ø", // Ø
    "ö", // ö
    "Ö", // Ö
    "Ô", // Ô
    "ô", // ô
    "Õ", // Õ
    "õ", // õ
    "Œ", // Œ
    "œ", // œ
    "ü", // ü
    "Ü", // Ü
    "Ù", // Ù
    "ù", // ù
    "Û", // Û
    "û", // û
    "Ÿ", // Ÿ
    "ÿ", // ÿ 
    "Ñ", // Ñ
    "ñ", // ñ
    "þ", // þ
    "Þ", // Þ
    "ý", // ý
    "Ý", // Ý
    "¿", // ¿
  )), ENT_QUOTES, 'UTF-8');

  $LINK_ICHARS = $LINK_ICHARS_DOMAIN . (string) html_entity_decode(implode("", array(
    "ß", // ß
  )), ENT_QUOTES, 'UTF-8');
  $allowed_protocols = array('http', 'https', 'ftp', 'news', 'nntp', 'telnet', 'mailto', 'irc', 'ssh', 'sftp', 'webcal');

  // Starting a parenthesis group with (?: means that it is grouped, but is not captured
  $protocol = '((?:'. implode("|", $allowed_protocols) .'):\/\/)';
  $authentication = "(?:(?:(?:[\w\.\-\+!$&'\(\)*\+,;=" . $LINK_ICHARS . "]|%[0-9a-f]{2})+(?::(?:[\w". $LINK_ICHARS ."\.\-\+%!$&'\(\)*\+,;=]|%[0-9a-f]{2})*)?)?@)";
  $domain = '(?:(?:[a-z0-9' . $LINK_ICHARS_DOMAIN . ']([a-z0-9'. $LINK_ICHARS_DOMAIN . '\-_\[\]])*)(\.(([a-z0-9' . $LINK_ICHARS_DOMAIN . '\-_\[\]])+\.)*('. $LINK_DOMAINS .'|[a-z]{2}))?)';
  $ipv4 = '(?:[0-9]{1,3}(\.[0-9]{1,3}){3})';
  $ipv6 = '(?:[0-9a-fA-F]{1,4}(\:[0-9a-fA-F]{1,4}){7})';
  $port = '(?::([0-9]{1,5}))';

  // Pattern specific to external links.
  $external_pattern = '/^'. $protocol .'?'. $authentication .'?('. $domain .'|'. $ipv4 .'|'. $ipv6 .'|localhost)'. $port .'?';

  // Pattern specific to internal links.
  $internal_pattern = "/^(?:[a-z0-9". $LINK_ICHARS ."_\-+\[\]]+)";
  $internal_pattern_file = "/^(?:[a-z0-9". $LINK_ICHARS ."_\-+\[\]\.]+)$/i";

  $directories = "(?:\/[a-z0-9". $LINK_ICHARS ."_\-\.~+%=&,$'#!():;*@\[\]]*)*";
  // Yes, four backslashes == a single backslash.
  $query = "(?:\/?\?([?a-z0-9". $LINK_ICHARS ."+_|\-\.~\/\\\\%=&,$'():;*@\[\]{} ]*))";
  $anchor = "(?:#[a-z0-9". $LINK_ICHARS ."_\-\.~+%=&,$'():;*@\[\]\/\?]*)";

  // The rest of the path for a standard URL.
  $end = $directories .'?'. $query .'?'. $anchor .'?'.'$/i';

  $message_id = '[^@].*@'. $domain;
  $newsgroup_name = '(?:[0-9a-z+-]*\.)*[0-9a-z+-]*';
  $news_pattern = '/^news:('. $newsgroup_name .'|'. $message_id .')$/i';

  $user = '[a-zA-Z0-9'. $LINK_ICHARS .'_\-\.\+\^!#\$%&*+\/\=\?\`\|\{\}~\'\[\]]+';
  $email_pattern = '/^mailto:'. $user .'@'.'(?:'. $domain .'|'. $ipv4 .'|'. $ipv6 .'|localhost)'. $query .'?$/';

  // Drupal's <front> token is not a URL.
  if (strpos($text, '<front>') === 0) {
    return false;
  }
  // A match against any of the patterns below means the link is well-formed.
  if (in_array('mailto', $allowed_protocols) && preg_match($email_pattern, $text)) {
    return true;
  }
  if (in_array('news', $allowed_protocols) && preg_match($news_pattern, $text)) {
    return true;
  }
  if (preg_match($internal_pattern . $end, $text)) {
    return true;
  }
  if (preg_match($external_pattern . $end, $text)) {
    return true;
  }
  if (preg_match($internal_pattern_file, $text)) {
    return true;
  }

  // Nothing matched, so the text is not a valid link.
  return false;
}

For anyone developing with WordPress, just use

esc_url_raw($url) === $url

to validate a URL (here's WordPress' documentation on esc_url_raw). Since esc_url_raw() sanitizes its input, the comparison passes only when nothing had to be changed. It handles URLs much better than filter_var($url, FILTER_VALIDATE_URL) because it is Unicode-safe and XSS-safe. (Here is a good article mentioning all the problems with filter_var.)

    function validateURL($URL) {
      // Accepts scheme://host or www.host with a whitelisted TLD,
      // an optional port, and an optional trailing slash.
      $pattern_1 = "/^(http|https|ftp):\/\/([A-Z0-9][A-Z0-9_-]*)(\.[A-Z0-9][A-Z0-9_-]*)*\.(com|org|net|dk|at|us|tv|info|uk|biz|se)(:(\d+))?\/?$/i";
      $pattern_2 = "/^(www)(\.[A-Z0-9][A-Z0-9_-]*)+\.(com|org|net|dk|at|us|tv|info|uk|biz|se)(:(\d+))?\/?$/i";
      if (preg_match($pattern_1, $URL) || preg_match($pattern_2, $URL)) {
        return true;
      } else {
        return false;
      }
    }

I've found this to be the most useful for matching a URL:

^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$
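
A minimal sketch of using the pattern above from PHP. The function name looks_like_url and the "~" delimiters are assumptions made for illustration; the pattern itself is unchanged.

```php
<?php
// Hypothetical wrapper around the pattern above; "~" delimiters are an
// assumption so the escaped slashes can stay exactly as written.
function looks_like_url(string $candidate): bool
{
    $pattern = '~^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$~';

    // preg_match() returns 1 on a match, 0 on no match, and false on error.
    return preg_match($pattern, $candidate) === 1;
}

var_dump(looks_like_url('http://example.com/path/page.html')); // bool(true)
var_dump(looks_like_url('http://example.com/?q=1'));           // bool(false)
```

The second call reports no match because the pattern has no provision for query strings, which is worth keeping in mind before relying on it.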
Inspired by this .NET Stack Overflow question and by an article referenced from that question, here is a URI validator (URI means it validates both URLs and URNs).

if( ! preg_match( "/^([a-z][a-z0-9+.-]*):(?:\\/\\/((?:(?=((?:[a-z0-9-._~!$&'()*+,;=:]|%[0-9A-F]{2})*))(\\3)@)?(?=(\\[[0-9A-F:.]{2,}\\]|(?:[a-z0-9-._~!$&'()*+,;=]|%[0-9A-F]{2})*))\\5(?::(?=(\\d*))\\6)?)(\\/(?=((?:[a-z0-9-._~!$&'()*+,;=:@\\/]|%[0-9A-F]{2})*))\\8)?|(\\/?(?!\\/)(?=((?:[a-z0-9-._~!$&'()*+,;=:@\\/]|%[0-9A-F]{2})*))\\10)?)(?:\\?(?=((?:[a-z0-9-._~!$&'()*+,;=:@\\/?]|%[0-9A-F]{2})*))\\11)?(?:#(?=((?:[a-z0-9-._~!$&'()*+,;=:@\\/?]|%[0-9A-F]{2})*))\\12)?$/i", $uri ) )
    throw new \RuntimeException( "URI has not a valid format." );

I have successfully unit-tested this check inside a value object I wrote named Uri, which is covered by UriTest.

UriTest.php (Contains valid and invalid cases for both URLs and URNs)


<?php

declare( strict_types = 1 );

namespace XaviMontero\ThrasherPortage\Tests\Tour;

use XaviMontero\ThrasherPortage\Tour\Uri;

class UriTest extends \PHPUnit_Framework_TestCase
{
    private $sut;

    public function testCreationIsOfProperClassWhenUriIsValid()
    {
        $sut = new Uri( '' );
        $this->assertInstanceOf( 'XaviMontero\\ThrasherPortage\\Tour\\Uri', $sut );
    }

    /**
     * @dataProvider urlIsValidProvider
     * @dataProvider urnIsValidProvider
     */
    public function testGetUriAsStringWhenUriIsValid( string $uri )
    {
        $sut = new Uri( $uri );
        $actual = $sut->getUriAsString();

        $this->assertInternalType( 'string', $actual );
        $this->assertEquals( $uri, $actual );
    }

    public function urlIsValidProvider()
    {
        return
            [
                [ 'http://example-server' ],
                [ '' ],
                [ '' ],
                [ '' ],
                [ 'random-protocol://' ],
                [ '' ],
                [ '' ],
                [ '' ],
                [ '' ],
                [ 'file://../../../relative/path/needs/protocol/resource.txt' ],
                [ '' ],
                [ '' ],
            ];
    }

    public function urnIsValidProvider()
    {
        return
            [
                [ 'urn:isbn:0-486-27557-4' ],
                [ 'urn:example:mammal:monotreme:echidna' ],
                [ 'urn:mpeg:mpeg7:schema:2001' ],
                [ 'urn:uuid:6e8bc430-9c3a-11d9-9669-0800200c9a66' ],
                [ 'rare-urn:uuid:6e8bc430-9c3a-11d9-9669-0800200c9a66' ],
                [ 'urn:FOO:a123,456' ]
            ];
    }

    /**
     * @dataProvider urlIsNotValidProvider
     * @dataProvider urnIsNotValidProvider
     */
    public function testCreationThrowsExceptionWhenUriIsNotValid( string $uri )
    {
        $this->expectException( 'RuntimeException' );
        $this->sut = new Uri( $uri );
    }

    public function urlIsNotValidProvider()
    {
        return
            [
                [ 'only-text' ],
                [ 'http//' ],
                [ '' ],
                [ '\\bad-separator' ],
                [ '|bad-separator' ],
                [ 'ht tp://' ],
                [ 'http://exampl' ],
                [ ' th/' ],
                [ '../../../relative/path/needs/protocol/resource.txt' ],
                [ '' ],
                [ '' ],
            ];
    }

    public function urnIsNotValidProvider()
    {
        return
            [
                [ 'urn:mpeg:mpeg7:sch ema:2001' ],
                [ 'urn|mpeg:mpeg7:schema:2001' ],
                [ 'urn?mpeg:mpeg7:schema:2001' ],
                [ 'urn%mpeg:mpeg7:schema:2001' ],
                [ 'urn#mpeg:mpeg7:schema:2001' ],
            ];
    }
}


Uri.php (Value Object)


<?php

declare( strict_types = 1 );

namespace XaviMontero\ThrasherPortage\Tour;

class Uri
{
    /** @var string */
    private $uri;

    public function __construct( string $uri )
    {
        $this->assertUriIsCorrect( $uri );
        $this->uri = $uri;
    }

    public function getUriAsString()
    {
        return $this->uri;
    }

    private function assertUriIsCorrect( string $uri )
    {
        if( ! preg_match( "/^([a-z][a-z0-9+.-]*):(?:\\/\\/((?:(?=((?:[a-z0-9-._~!$&'()*+,;=:]|%[0-9A-F]{2})*))(\\3)@)?(?=(\\[[0-9A-F:.]{2,}\\]|(?:[a-z0-9-._~!$&'()*+,;=]|%[0-9A-F]{2})*))\\5(?::(?=(\\d*))\\6)?)(\\/(?=((?:[a-z0-9-._~!$&'()*+,;=:@\\/]|%[0-9A-F]{2})*))\\8)?|(\\/?(?!\\/)(?=((?:[a-z0-9-._~!$&'()*+,;=:@\\/]|%[0-9A-F]{2})*))\\10)?)(?:\\?(?=((?:[a-z0-9-._~!$&'()*+,;=:@\\/?]|%[0-9A-F]{2})*))\\11)?(?:#(?=((?:[a-z0-9-._~!$&'()*+,;=:@\\/?]|%[0-9A-F]{2})*))\\12)?$/i", $uri ) )
            throw new \RuntimeException( "URI has not a valid format." );
    }
}

Running UnitTests

There are 65 assertions in 46 tests. Caution: there are two data providers for valid expressions and two more for invalid ones, one for URLs and the other for URNs in each case. If you are using PHPUnit v5.6* or earlier, you need to join each pair of data providers into a single one.
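
Joining two providers could look like the sketch below. The class name MergedProviderSketch and the method uriIsValidProvider are hypothetical, the class deliberately does not extend the PHPUnit test case so it runs on its own, and the provider bodies are shortened to two entries each.

```php
<?php
// Hypothetical stand-in for the test class; only the providers are shown.
class MergedProviderSketch
{
    public function urlIsValidProvider(): array
    {
        return [ [ 'http://example-server' ], [ 'random-protocol://' ] ];
    }

    public function urnIsValidProvider(): array
    {
        return [ [ 'urn:isbn:0-486-27557-4' ], [ 'urn:FOO:a123,456' ] ];
    }

    // Single provider that a test can name in one @dataProvider annotation.
    public function uriIsValidProvider(): array
    {
        return array_merge( $this->urlIsValidProvider(), $this->urnIsValidProvider() );
    }
}

// Show how many data sets the merged provider yields.
echo count( ( new MergedProviderSketch() )->uriIsValidProvider() ), PHP_EOL; // 4
```

The test method would then carry a single @dataProvider uriIsValidProvider annotation, and the same approach applies to the two invalid providers.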

xavi@bromo:~/custom_www/hello-trip/mutant-migrant$ vendor/bin/phpunit
PHPUnit 5.7.3 by Sebastian Bergmann and contributors.

..............................................                    46 / 46 (100%)

Time: 82 ms, Memory: 4.00MB

OK (46 tests, 65 assertions)

Code coverage

This sample URI checker has 100% code coverage.

