I need to find whether a given url is valid or not, the scenario is it should be allowed if it contains urls haviing
1.Generic top-level domains 2.Country code top-level domains refer below url http://en.wikipedia.org/wiki/List_of_Internet_top-level_domains
I need to do this in PHP, this is currently what I am doing
$regexUrl = "((https?|ftp)\:\/\/)?"; // SCHEME $regexUrl .= "([a-zA-Z0-9+!*(),;?&=\$_.-]+(\:[a-zA-Z0-9+!*(),;?&=\$_.-]+)?@)?"; // User and Pass $regexUrl .= "([a-zA-Z0-9-]+)\.([a-zA-Z]{2,3})"; // Host or IP $regexUrl .= "(\:[0-9]{2,5})?"; // Port $regexUrl .= "(\/([a-zA-Z0-9+\$_-]\.?)+)*\/?"; // Path $regexUrl .= "(\?[a-zA-Z+&\$_.-][a-zA-Z0-9;:@&%=+\/\$_.-]*)?"; // GET Query $regexUrl .= "(#[a-zA-Z_.-][a-zA-Z0-9+\$_.-]*)?"; // Anchor //if(preg_match_all("#\bhttps?://[^\s()]+(?:\([\w\d]+\)|([^[:punct:]\s]|/))#", $message, $matches1, PREG_PATTERN_ORDER)) //$pattern = '/((https?|ftp)\:(\/\/)|(file\:\/{2,3}))?(((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?))|(((([a-zA-Z0-9]+)(\.)?)+)(\.)(com|org|net|gov|mil|biz|info|mobi|name|aero|jobs|museum|[a-z]{2}))([\/][\/a-zA-Z0-9\.]*)*([\/]?(([\?][a-zA-Z0-9]+[\=][a-zA-Z0-9\%\(\)]*)([\&][a-zA-Z0-9]+[\=][a-zA-Z0-9\%\(\)]*)*))?/'; if(preg_match_all("/$regexUrl/", $urlMessage, $matches1, PREG_PATTERN_ORDER)) { try { foreach($matches1[0] as $urlToTrim1) { $url= $urlToTrim1; echo $url; } } catch(Exception $e) { $url="-1"; } }
To figure out if it's generally a valid URL:
http://www.php.net/manual/en/function.filter-var.php
If you want to confirm that the TLD is in your approved list (I don't know if
filter_var
goes so far as to check whether a TLD actually exists):Or simply try to look up the DNS record of the domain using
gethostbyname
. If one exists, it's a valid domain.** Unless you're being DNS spoofed, if that case is important to you...