I am attempting to create a PHP function that will check whether the passed URL is a short URL. Something like this:
/**
 * Check if a URL is a short URL
 *
 * @param string $url
 * @return bool
 */
function _is_short_url($url){
    // Code goes here
}
I know that a simpler and surefire way would be to check for a 301 redirect, but this function aims to save an external request just for the check. Nor should the function check against a list of URL shorteners, as that would be a less scalable approach.
So here are a few possible checks I was thinking of:
- Overall URL length - may be a max of 30 characters
- URL length after last '/' - May be a max of 10 characters
- Number of '/' after protocol (http://) - Max 2
- Max length of host
Any thoughts on a possible approach or a more exhaustive checklist for this?
EDIT: This function is just an attempt to save an external request, so it's OK for it to return true for a URL that is not short, as long as it never returns false for a real short one. After passing through this function, I would expand all short URLs anyway by checking their 301 redirects. This is just to eliminate the obvious ones.
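The two-stage idea in the EDIT (a cheap local filter first, network expansion only for what it keeps) could be sketched like this. It is a minimal sketch, assuming two of the heuristic thresholds from the list above (30 characters overall, 10 for the host); the function names `looks_like_short_url` and `short_url_candidates` are hypothetical, and the actual 301 expansion step is left out.

```php
<?php
// Hypothetical cheap heuristic; the thresholds are the guesses from the question.
function looks_like_short_url($url)
{
    if (strlen($url) > 30) {
        return false; // long URLs are assumed not to be short URLs
    }
    $host = parse_url($url, PHP_URL_HOST);
    // Reject URLs without a parsable host, and URLs with long hosts
    return is_string($host) && strlen($host) <= 10;
}

// Stage 1: keep only the URLs worth a network request.
// False positives are acceptable here because stage 2 (expanding the
// 301 redirect) double-checks every candidate.
function short_url_candidates(array $urls)
{
    return array_values(array_filter($urls, 'looks_like_short_url'));
}

$urls = array(
    'http://bit.ly/1GoNYa',
    'http://www.example.com/some/long/article-title.html',
    'not a url',
);
print_r(short_url_candidates($urls));
```

Only the candidates returned by the filter would then be passed on to the redirect check.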
I would not recommend using a regex, as it would be too complex and difficult to understand. Here is PHP code that checks all your constraints:
function _is_short_url($url){
    // 1. Overall URL length - may be a max of 30 characters
    if (strlen($url) > 30) return false;
    $parts = parse_url($url);
    // Malformed URLs, or URLs without a host, are not short URLs
    if ($parts === false || empty($parts["host"])) return false;
    // No query string & no fragment (absent keys are not set at all)
    if (isset($parts["query"]) || isset($parts["fragment"])) return false;
    $path = isset($parts["path"]) ? $parts["path"] : "";
    // 3. Number of '/' after protocol (http://) - max 2
    if (substr_count($path, "/") > 2) return false;
    // 2. URL length after last '/' - may be a max of 10 characters
    $pathParts = explode("/", $path);
    $lastPath = array_pop($pathParts);
    if (strlen($lastPath) > 10) return false;
    // 4. Max length of host
    if (strlen($parts["host"]) > 10) return false;
    return true;
}
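For reference, these checks all operate on the array that `parse_url()` returns. A quick way to see which keys are present for a typical short URL (the sample URL is only an illustration):

```php
<?php
// Inspect how parse_url() splits a typical short URL into components.
$parts = parse_url('http://bit.ly/1GoNYa');
var_dump($parts);

// Components that are absent (here "query" and "fragment") are simply not set,
// so they have to be tested with isset() rather than read directly.
var_dump(isset($parts['query']));        // bool(false)
var_dump($parts['host']);                // string(6) "bit.ly"
// The leading "/" of the path produces an empty first element
var_dump(explode('/', $parts['path']));
```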
Here is a small function that checks all your requirements. I managed it without a complex regex, using only preg_split(). You should be able to adapt it easily.
<?php
var_dump(_isShortUrl('http://bit.ly/foo'));
function _isShortUrl($url)
{
    // Check for max URL length (30)
    if (strlen($url) > 30) {
        return false;
    }
    // Check that there are at most two slashes after the protocol,
    // i.e. at most 5 split values ("http:", "", host, part, part)
    $parts = preg_split('/\//', $url);
    if (count($parts) > 5) {
        return false;
    }
    // Check for max host length (10); a URL without "//" has no host part
    if (!isset($parts[2])) {
        return false;
    }
    $host = $parts[2];
    if (strlen($host) > 10) {
        return false;
    }
    // Check for max length of last URL part (after last slash)
    $lastPart = array_pop($parts);
    if (strlen($lastPart) > 10) {
        return false;
    }
    return true;
}
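One way to adapt it is to pass the limits in as parameters. This is a sketch only: the name `_isShortUrlWithLimits` and its parameter names are hypothetical, the defaults are just the numbers from the question, and it uses `explode()` instead of `preg_split()` since the delimiter is a plain character.

```php
<?php
// Hypothetical variant of _isShortUrl() with the limits as parameters.
function _isShortUrlWithLimits($url, $maxLength = 30, $maxSlashes = 2,
                               $maxHostLength = 10, $maxLastPartLength = 10)
{
    if (strlen($url) > $maxLength) {
        return false;
    }
    // "http://bit.ly/a/b" splits into "http:", "", "bit.ly", "a", "b":
    // the two protocol slashes plus at most $maxSlashes path slashes
    // give at most $maxSlashes + 3 pieces.
    $parts = explode('/', $url);
    if (count($parts) > $maxSlashes + 3 || !isset($parts[2])) {
        return false;
    }
    // $parts[2] is the host
    if (strlen($parts[2]) > $maxHostLength) {
        return false;
    }
    // Length of the part after the last slash
    return strlen(end($parts)) <= $maxLastPartLength;
}

var_dump(_isShortUrlWithLimits('http://bit.ly/foo'));
var_dump(_isShortUrlWithLimits('http://bit.ly/foo', 15));
```

The second call rejects the same URL because its 17 characters exceed the tighter overall limit of 15.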
If I were you, I would test whether the URL returns a 301 redirect, and then check whether that redirect points to another website:
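function _is_short_url($url) {
    // Send a HEAD request so the full page is not fetched
    // (note: this changes the default stream context for the whole script)
    $options = array('http' => array('method' => 'HEAD'));
    stream_context_set_default($options);
    $headers = get_headers($url, 1);
    if ($headers === false || !isset($headers[0])) {
        return false;
    }
    // Header names can arrive in any case, e.g. "location"
    $headers = array_change_key_case($headers, CASE_LOWER);
    if (strpos($headers[0], '301') !== false && isset($headers['location'])) {
        $location = $headers['location'];
        // After several redirects, get_headers() returns an array; use the first hop
        if (is_array($location)) {
            $location = $location[0];
        }
        $fromHost = parse_url($url, PHP_URL_HOST);
        $toHost = parse_url($location, PHP_URL_HOST);
        // A relative Location has no host, so it stays on the same site
        if (is_string($toHost) && strcasecmp((string) $fromHost, $toHost) !== 0) {
            return true;
        }
    }
    return false;
}
echo (int)_is_short_url('http://bit.ly/1GoNYa');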
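The function above mixes the network request with the decision itself. A sketch of the decision pulled out into a pure function, so it can be tested without making any requests; the name `redirects_to_other_host` and its parameters are hypothetical, and it expects the status line and Location value that `get_headers($url, 1)` returned.

```php
<?php
// Hypothetical pure helper: decide, from an already-fetched status line and
// Location value, whether $url redirects (301) to a different host.
function redirects_to_other_host($url, $statusLine, $location)
{
    // Only a 301 status line counts, as in the answer above
    if (!preg_match('#^HTTP/\S+\s+301\b#', $statusLine)) {
        return false;
    }
    // get_headers($url, 1) yields an array when there were several redirects
    if (is_array($location)) {
        $location = reset($location);
    }
    $fromHost = parse_url($url, PHP_URL_HOST);
    $toHost = parse_url((string) $location, PHP_URL_HOST);
    // A relative or missing Location (no host) stays on the same site
    if (!is_string($fromHost) || !is_string($toHost)) {
        return false;
    }
    return strcasecmp($fromHost, $toHost) !== 0;
}

var_dump(redirects_to_other_host(
    'http://bit.ly/1GoNYa',
    'HTTP/1.1 301 Moved Permanently',
    'http://example.com/page'
));
```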
Why not check whether the host matches a known URL shortener? You could get a list of the most common URL shorteners, for example here.
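A minimal sketch of that lookup, assuming you have already loaded such a list; the four hosts below are only illustrative and `is_known_shortener` is a hypothetical name. Comparing the exact host (case-insensitively, ignoring a leading `www.`) avoids false positives such as `notbit.ly` that a substring match would produce.

```php
<?php
// Hypothetical lookup against a list of known URL shortener hosts.
// The list is illustrative; in practice it would be loaded from elsewhere.
function is_known_shortener($url, array $shortenerHosts)
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host)) {
        return false; // malformed URL or no host
    }
    $host = strtolower($host);
    // Treat "www.bit.ly" the same as "bit.ly"
    if (strpos($host, 'www.') === 0) {
        $host = substr($host, 4);
    }
    // Exact, case-insensitive match against the list
    return in_array($host, array_map('strtolower', $shortenerHosts), true);
}

$hosts = array('bit.ly', 't.co', 'goo.gl', 'tinyurl.com');
var_dump(is_known_shortener('http://bit.ly/1GoNYa', $hosts));
var_dump(is_known_shortener('http://notbit.ly/foo', $hosts));
```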