PHP validation/regex for URL

2018-12-31 04:30发布

I've been looking for a simple regex for URL's, does anybody have one handy that works well? I didn't find one with the zend framework validation classes and have seen several implementations.

Thanks

20条回答
不再属于我。
2楼-- · 2018-12-31 04:54

As per the PHP manual - parse_url should not be used to validate a URL.

Unfortunately, it seems that filter_var('example.com', FILTER_VALIDATE_URL) does not perform any better.

Both parse_url() and filter_var() will pass malformed URLs such as http://...

Therefore in this case - regex is the better method.

查看更多
几人难应
3楼-- · 2018-12-31 04:57

And there is your answer =) Try to break it, you can't!!!

function link_validate_url($text) {
$LINK_DOMAINS = 'aero|arpa|asia|biz|com|cat|coop|edu|gov|info|int|jobs|mil|museum|name|nato|net|org|pro|travel|mobi|local';
  $LINK_ICHARS_DOMAIN = (string) html_entity_decode(implode("", array( // @TODO completing letters ...
    "æ", // æ
    "Æ", // Æ
    "À", // À
    "à", // à
    "Á", // Á
    "á", // á
    "Â", // Â
    "â", // â
    "å", // å
    "Å", // Å
    "ä", // ä
    "Ä", // Ä
    "Ç", // Ç
    "ç", // ç
    "Ð", // Ð
    "ð", // ð
    "È", // È
    "è", // è
    "É", // É
    "é", // é
    "Ê", // Ê
    "ê", // ê
    "Ë", // Ë
    "ë", // ë
    "Î", // Î
    "î", // î
    "Ï", // Ï
    "ï", // ï
    "ø", // ø
    "Ø", // Ø
    "ö", // ö
    "Ö", // Ö
    "Ô", // Ô
    "ô", // ô
    "Õ", // Õ
    "õ", // õ
    "Œ", // Œ
    "œ", // œ
    "ü", // ü
    "Ü", // Ü
    "Ù", // Ù
    "ù", // ù
    "Û", // Û
    "û", // û
    "Ÿ", // Ÿ
    "ÿ", // ÿ 
    "Ñ", // Ñ
    "ñ", // ñ
    "þ", // þ
    "Þ", // Þ
    "ý", // ý
    "Ý", // Ý
    "¿", // ¿
  )), ENT_QUOTES, 'UTF-8');

  $LINK_ICHARS = $LINK_ICHARS_DOMAIN . (string) html_entity_decode(implode("", array(
    "ß", // ß
  )), ENT_QUOTES, 'UTF-8');
  $allowed_protocols = array('http', 'https', 'ftp', 'news', 'nntp', 'telnet', 'mailto', 'irc', 'ssh', 'sftp', 'webcal');

  // Starting a parenthesis group with (?: means that it is grouped, but is not captured
  $protocol = '((?:'. implode("|", $allowed_protocols) .'):\/\/)';
  $authentication = "(?:(?:(?:[\w\.\-\+!$&'\(\)*\+,;=" . $LINK_ICHARS . "]|%[0-9a-f]{2})+(?::(?:[\w". $LINK_ICHARS ."\.\-\+%!$&'\(\)*\+,;=]|%[0-9a-f]{2})*)?)?@)";
  $domain = '(?:(?:[a-z0-9' . $LINK_ICHARS_DOMAIN . ']([a-z0-9'. $LINK_ICHARS_DOMAIN . '\-_\[\]])*)(\.(([a-z0-9' . $LINK_ICHARS_DOMAIN . '\-_\[\]])+\.)*('. $LINK_DOMAINS .'|[a-z]{2}))?)';
  $ipv4 = '(?:[0-9]{1,3}(\.[0-9]{1,3}){3})';
  $ipv6 = '(?:[0-9a-fA-F]{1,4}(\:[0-9a-fA-F]{1,4}){7})';
  $port = '(?::([0-9]{1,5}))';

  // Pattern specific to external links.
  $external_pattern = '/^'. $protocol .'?'. $authentication .'?('. $domain .'|'. $ipv4 .'|'. $ipv6 .' |localhost)'. $port .'?';

  // Pattern specific to internal links.
  $internal_pattern = "/^(?:[a-z0-9". $LINK_ICHARS ."_\-+\[\]]+)";
  $internal_pattern_file = "/^(?:[a-z0-9". $LINK_ICHARS ."_\-+\[\]\.]+)$/i";

  $directories = "(?:\/[a-z0-9". $LINK_ICHARS ."_\-\.~+%=&,$'#!():;*@\[\]]*)*";
  // Yes, four backslashes == a single backslash.
  $query = "(?:\/?\?([?a-z0-9". $LINK_ICHARS ."+_|\-\.~\/\\\\%=&,$'():;*@\[\]{} ]*))";
  $anchor = "(?:#[a-z0-9". $LINK_ICHARS ."_\-\.~+%=&,$'():;*@\[\]\/\?]*)";

  // The rest of the path for a standard URL.
  $end = $directories .'?'. $query .'?'. $anchor .'?'.'$/i';

  $message_id = '[^@].*@'. $domain;
  $newsgroup_name = '(?:[0-9a-z+-]*\.)*[0-9a-z+-]*';
  $news_pattern = '/^news:('. $newsgroup_name .'|'. $message_id .')$/i';

  $user = '[a-zA-Z0-9'. $LINK_ICHARS .'_\-\.\+\^!#\$%&*+\/\=\?\`\|\{\}~\'\[\]]+';
  $email_pattern = '/^mailto:'. $user .'@'.'(?:'. $domain .'|'. $ipv4 .'|'. $ipv6 .'|localhost)'. $query .'?$/';

  if (strpos($text, '<front>') === 0) {
    return false;
  }
  if (in_array('mailto', $allowed_protocols) && preg_match($email_pattern, $text)) {
    return false;
  }
  if (in_array('news', $allowed_protocols) && preg_match($news_pattern, $text)) {
    return false;
  }
  if (preg_match($internal_pattern . $end, $text)) {
    return false;
  }
  if (preg_match($external_pattern . $end, $text)) {
    return false;
  }
  if (preg_match($internal_pattern_file, $text)) {
    return false;
  }

  return true;
}
查看更多
泪湿衣
4楼-- · 2018-12-31 04:57

For anyone developing with WordPress, just use

esc_url_raw($url) === $url

to validate a URL (here's WordPress' documentation on esc_url_raw). It handles URLs much better than filter_var($url, FILTER_VALIDATE_URL) because it is unicode and XSS-safe. (Here is a good article mentioning all the problems with filter_var).

查看更多
泪湿衣
5楼-- · 2018-12-31 04:59
    function validateURL($URL) {
      $pattern_1 = "/^(http|https|ftp):\/\/(([A-Z0-9][A-Z0-9_-]*)(\.[A-Z0-9][A-Z0-9_-]*)+.(com|org|net|dk|at|us|tv|info|uk|co.uk|biz|se)$)(:(\d+))?\/?/i";
      $pattern_2 = "/^(www)((\.[A-Z0-9][A-Z0-9_-]*)+.(com|org|net|dk|at|us|tv|info|uk|co.uk|biz|se)$)(:(\d+))?\/?/i";       
      if(preg_match($pattern_1, $URL) || preg_match($pattern_2, $URL)){
        return true;
      } else{
        return false;
      }
    }
查看更多
无与为乐者.
6楼-- · 2018-12-31 05:00

I've found this to be the most useful for matching a URL..

^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$
查看更多
刘海飞了
7楼-- · 2018-12-31 05:02

Inspired in this .NET StackOverflow question and in this referenced article from that question there is this URI validator (URI means it validates both URL and URN).

if( ! preg_match( "/^([a-z][a-z0-9+.-]*):(?:\\/\\/((?:(?=((?:[a-z0-9-._~!$&'()*+,;=:]|%[0-9A-F]{2})*))(\\3)@)?(?=(\\[[0-9A-F:.]{2,}\\]|(?:[a-z0-9-._~!$&'()*+,;=]|%[0-9A-F]{2})*))\\5(?::(?=(\\d*))\\6)?)(\\/(?=((?:[a-z0-9-._~!$&'()*+,;=:@\\/]|%[0-9A-F]{2})*))\\8)?|(\\/?(?!\\/)(?=((?:[a-z0-9-._~!$&'()*+,;=:@\\/]|%[0-9A-F]{2})*))\\10)?)(?:\\?(?=((?:[a-z0-9-._~!$&'()*+,;=:@\\/?]|%[0-9A-F]{2})*))\\11)?(?:#(?=((?:[a-z0-9-._~!$&'()*+,;=:@\\/?]|%[0-9A-F]{2})*))\\12)?$/i", $uri ) )
{
    throw new \RuntimeException( "URI has not a valid format." );
}

I have successfully unit-tested this function inside a ValueObject I made named Uri and tested by UriTest.

UriTest.php (Contains valid and invalid cases for both URLs and URNs)

<?php

declare( strict_types = 1 );

namespace XaviMontero\ThrasherPortage\Tests\Tour;

use XaviMontero\ThrasherPortage\Tour\Uri;

class UriTest extends \PHPUnit_Framework_TestCase
{
    private $sut;

    public function testCreationIsOfProperClassWhenUriIsValid()
    {
        $sut = new Uri( 'http://example.com' );
        $this->assertInstanceOf( 'XaviMontero\\ThrasherPortage\\Tour\\Uri', $sut );
    }

    /**
     * @dataProvider urlIsValidProvider
     * @dataProvider urnIsValidProvider
     */
    public function testGetUriAsStringWhenUriIsValid( string $uri )
    {
        $sut = new Uri( $uri );
        $actual = $sut->getUriAsString();

        $this->assertInternalType( 'string', $actual );
        $this->assertEquals( $uri, $actual );
    }

    public function urlIsValidProvider()
    {
        return
            [
                [ 'http://example-server' ],
                [ 'http://example.com' ],
                [ 'http://example.com/' ],
                [ 'http://subdomain.example.com/path/?parameter1=value1&parameter2=value2' ],
                [ 'random-protocol://example.com' ],
                [ 'http://example.com:80' ],
                [ 'http://example.com?no-path-separator' ],
                [ 'http://example.com/pa%20th/' ],
                [ 'ftp://example.org/resource.txt' ],
                [ 'file://../../../relative/path/needs/protocol/resource.txt' ],
                [ 'http://example.com/#one-fragment' ],
                [ 'http://example.edu:8080#one-fragment' ],
            ];
    }

    public function urnIsValidProvider()
    {
        return
            [
                [ 'urn:isbn:0-486-27557-4' ],
                [ 'urn:example:mammal:monotreme:echidna' ],
                [ 'urn:mpeg:mpeg7:schema:2001' ],
                [ 'urn:uuid:6e8bc430-9c3a-11d9-9669-0800200c9a66' ],
                [ 'rare-urn:uuid:6e8bc430-9c3a-11d9-9669-0800200c9a66' ],
                [ 'urn:FOO:a123,456' ]
            ];
    }

    /**
     * @dataProvider urlIsNotValidProvider
     * @dataProvider urnIsNotValidProvider
     */
    public function testCreationThrowsExceptionWhenUriIsNotValid( string $uri )
    {
        $this->expectException( 'RuntimeException' );
        $this->sut = new Uri( $uri );
    }

    public function urlIsNotValidProvider()
    {
        return
            [
                [ 'only-text' ],
                [ 'http//missing.colon.example.com/path/?parameter1=value1&parameter2=value2' ],
                [ 'missing.protocol.example.com/path/' ],
                [ 'http://example.com\\bad-separator' ],
                [ 'http://example.com|bad-separator' ],
                [ 'ht tp://example.com' ],
                [ 'http://exampl e.com' ],
                [ 'http://example.com/pa th/' ],
                [ '../../../relative/path/needs/protocol/resource.txt' ],
                [ 'http://example.com/#two-fragments#not-allowed' ],
                [ 'http://example.edu:portMustBeANumber#one-fragment' ],
            ];
    }

    public function urnIsNotValidProvider()
    {
        return
            [
                [ 'urn:mpeg:mpeg7:sch ema:2001' ],
                [ 'urn|mpeg:mpeg7:schema:2001' ],
                [ 'urn?mpeg:mpeg7:schema:2001' ],
                [ 'urn%mpeg:mpeg7:schema:2001' ],
                [ 'urn#mpeg:mpeg7:schema:2001' ],
            ];
    }
}

Uri.php (Value Object)

<?php

declare( strict_types = 1 );

namespace XaviMontero\ThrasherPortage\Tour;

class Uri
{
    /** @var string */
    private $uri;

    public function __construct( string $uri )
    {
        $this->assertUriIsCorrect( $uri );
        $this->uri = $uri;
    }

    public function getUriAsString()
    {
        return $this->uri;
    }

    private function assertUriIsCorrect( string $uri )
    {
        // https://stackoverflow.com/questions/30847/regex-to-validate-uris
        // http://snipplr.com/view/6889/regular-expressions-for-uri-validationparsing/

        if( ! preg_match( "/^([a-z][a-z0-9+.-]*):(?:\\/\\/((?:(?=((?:[a-z0-9-._~!$&'()*+,;=:]|%[0-9A-F]{2})*))(\\3)@)?(?=(\\[[0-9A-F:.]{2,}\\]|(?:[a-z0-9-._~!$&'()*+,;=]|%[0-9A-F]{2})*))\\5(?::(?=(\\d*))\\6)?)(\\/(?=((?:[a-z0-9-._~!$&'()*+,;=:@\\/]|%[0-9A-F]{2})*))\\8)?|(\\/?(?!\\/)(?=((?:[a-z0-9-._~!$&'()*+,;=:@\\/]|%[0-9A-F]{2})*))\\10)?)(?:\\?(?=((?:[a-z0-9-._~!$&'()*+,;=:@\\/?]|%[0-9A-F]{2})*))\\11)?(?:#(?=((?:[a-z0-9-._~!$&'()*+,;=:@\\/?]|%[0-9A-F]{2})*))\\12)?$/i", $uri ) )
        {
            throw new \RuntimeException( "URI has not a valid format." );
        }
    }
}

Running UnitTests

There are 65 assertions in 46 tests. Caution: there are 2 data-providers for valid and 2 more for invalid expressions. One is for URLs and the other for URNs. If you are using a version of PhpUnit of v5.6* or earlier then you need to join the two data providers into a single one.

xavi@bromo:~/custom_www/hello-trip/mutant-migrant$ vendor/bin/phpunit
PHPUnit 5.7.3 by Sebastian Bergmann and contributors.

..............................................                    46 / 46 (100%)

Time: 82 ms, Memory: 4.00MB

OK (46 tests, 65 assertions)

Code coverage

There's is 100% of code-coverage in this sample URI checker.

查看更多
登录 后发表回答