How would one get the root DNS entry from $_SERVER['HTTP_HOST']?
Input:
example.co.uk
www.example.com
blog.example.com
forum.example.co.uk
Output:
example.co.uk
example.com
example.com
example.co.uk
EDIT: Lookup list is very long
For this project: http://drupal.org/project/parallel
Usage:
echo parallel_get_domain("www.robknight.org.uk") . "<br>";
echo parallel_get_domain("www.google.com") . "<br>";
echo parallel_get_domain("www.yahoo.com") . "<br>";
Functions:
/**
* Given host name returns top domain.
*
* @param $host
* String containing the host name: www.example.com
*
* @return string
* top domain: example.com
*/
function parallel_get_domain($host) {
  if (strtoupper(substr(PHP_OS, 0, 3)) == 'WIN' && strnatcmp(phpversion(), '5.3.0') < 0) {
    // This works 1/2 the time... CNAME doesn't work with nslookup
    for ($end_pieces = substr_count($host, '.'); $end_pieces > 0; $end_pieces--) {
      // end() takes its argument by reference, so store the pieces first.
      $pieces = explode('.', $host, $end_pieces);
      $test_domain = end($pieces);
      if (checkdnsrr($test_domain)) {
        $domain = $test_domain;
        break;
      }
    }
    return isset($domain) ? $domain : FALSE;
  }
  else {
    // This always works
    $sections = explode('.', $host);
    array_unshift($sections, '');
    $parts = array();
    foreach ($sections as $key => $value) {
      $parts[$key] = $value;
      // Drop the labels consumed so far; what remains is the next candidate.
      $test_domain = implode('.', parallel_array_xor($parts, $sections));
      if (checkdnsrr($test_domain, 'NS') && !checkdnsrr($test_domain, 'CNAME')) {
        $domain = $test_domain;
        break;
      }
    }
    return isset($domain) ? $domain : FALSE;
  }
}
/**
* Opposite of array_intersect().
*
* @param $array_a
* First array
* @param $array_b
* Second array
*
* @return array
*/
function parallel_array_xor($array_a, $array_b) {
  $union_array = array_merge($array_a, $array_b);
  $intersect_array = array_intersect($array_a, $array_b);
  return array_diff($union_array, $intersect_array);
}
/**
* Win compatible version of checkdnsrr.
*
* checkdnsrr() support for Windows by HM2K <php [spat] hm2k.org>
* http://us2.php.net/manual/en/function.checkdnsrr.php#88301
*
* @param $host
* String containing host name
* @param $type
* String containing the DNS record type
*
* @return bool
*/
function parallel_win_checkdnsrr($host, $type = 'MX') {
  if (strtoupper(substr(PHP_OS, 0, 3)) != 'WIN') { return FALSE; }
  if (empty($host)) { return FALSE; }
  $types = array('A', 'MX', 'NS', 'SOA', 'PTR', 'CNAME', 'AAAA', 'A6', 'SRV', 'NAPTR', 'TXT', 'ANY');
  if (!in_array($type, $types)) {
    user_error("checkdnsrr() Type '$type' not supported", E_USER_WARNING);
    return FALSE;
  }
  @exec('nslookup -type=' . $type . ' ' . escapeshellcmd($host), $output);
  foreach ($output as $line) {
    // Quote the host so its dots match literally inside the pattern.
    if (preg_match('/^' . preg_quote($host, '/') . '/', $line)) { return TRUE; }
  }
  // No line of nslookup output matched, so report the record as not found.
  return FALSE;
}
// Define checkdnsrr() if it doesn't exist
if (!function_exists('checkdnsrr')) {
  function checkdnsrr($host, $type = 'MX') {
    return parallel_win_checkdnsrr($host, $type);
  }
}
Output - Windows:
org.uk
google.com
yahoo.com
Output - Linux:
robknight.org.uk
google.com
yahoo.com
I think that's a bit ill-defined.
You could try doing DNS lookups for each parent record until you find one that doesn't return an A record.
/[^.]+\.(escaped|list|of|domains)$/
I think that should work.
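A minimal sketch of how that pattern could be built in PHP, assuming you maintain the suffix list yourself. The three-entry `$suffixes` array and the function name `registrable_by_regex` are hypothetical and only for illustration; a real list is far longer.

```php
<?php
// Hypothetical, deliberately tiny sample of suffixes.
$suffixes = array('co.uk', 'org.uk', 'com');

// Escape each suffix so its dots match literally, then join as alternatives.
$escaped = array_map(function ($s) { return preg_quote($s, '/'); }, $suffixes);
$pattern = '/[^.]+\.(?:' . implode('|', $escaped) . ')$/i';

// Returns the label just left of a known suffix plus the suffix, or FALSE.
function registrable_by_regex($host, $pattern) {
  return preg_match($pattern, $host, $m) ? $m[0] : FALSE;
}

echo registrable_by_regex('forum.example.co.uk', $pattern), "\n";
echo registrable_by_regex('www.example.com', $pattern), "\n";
```

Because the match is anchored only at the end, the engine settles on the leftmost position where "one label, a dot, a known suffix" reaches the end of the string, so a longer suffix such as co.uk is preferred over a shorter one that overlaps it.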
As you've discovered, some countries use a TLD only (for example, .tv and .us), while others subdivide their country TLD (for example, .uk).
Ideally, you'll need a lookup list (it won't be long) of approved TLDs, and, if subdivided, the TLD with each subdivision (e.g., ".co.uk" instead of ".uk"). That will tell you which "dots" (from the right) to keep. Then move one dot to the left of that (if found) and chop everything before it.
Without a lookup list, you can exploit the fact that the subdivisions (.co, etc.) are only for countries (which have 2-letter TLDs) and are AFAIK never more than 3 characters themselves and are always letters, so you can probably recognize them with a regex pattern.
Edit: Never mind, the actual list of public suffixes is much more complex. You're going to need to use a lookup table to figure out what the suffix is, go back to the previous dot, and trim left. A regex is a poor solution here. Instead, store the list of suffixes in a dictionary (an associative array in PHP), then test against your domain name, lopping off one dotted portion at a time from the left until you hit a match, then add back the part you just trimmed off.
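The lookup-table approach could be sketched roughly like this. The function name `domain_from_suffix_list` and the three-entry table are hypothetical; a real table would be loaded from a full public suffix list, and this sketch ignores the wildcard and exception rules such lists contain.

```php
<?php
/**
 * Sketch: find the registrable domain using a suffix lookup table.
 * $suffixes maps suffix => TRUE so membership is a single isset() check.
 */
function domain_from_suffix_list($host, array $suffixes) {
  $labels = explode('.', strtolower(rtrim($host, '.')));
  // Lop one label at a time off the left until the remainder is a known suffix.
  for ($i = 0; $i < count($labels); $i++) {
    $candidate = implode('.', array_slice($labels, $i));
    if (isset($suffixes[$candidate])) {
      // Add back the label trimmed just before the match; if nothing was
      // trimmed, the host is itself a bare suffix.
      return $i > 0 ? $labels[$i - 1] . '.' . $candidate : FALSE;
    }
  }
  return FALSE;
}

// Hypothetical sample table, just large enough for the question's inputs.
$suffixes = array('co.uk' => TRUE, 'org.uk' => TRUE, 'com' => TRUE);
foreach (array('example.co.uk', 'www.example.com', 'blog.example.com', 'forum.example.co.uk') as $host) {
  echo domain_from_suffix_list($host, $suffixes), "\n";
}
```

With this sample table, the four inputs from the question print example.co.uk, example.com, example.com and example.co.uk. Trimming from the left means the longest matching suffix is found first.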
Note: as pointed out in the comments, this method doesn't actually work in all cases. The reason is that some top-level domains do resolve to IP addresses, even if most do not. Therefore it's not possible to detect whether a given name is a top-level or pseudo-top-level domain name merely by checking whether it has an IP address. Unfortunately, this probably means that the only solution is a lookup list, given how inconsistently top-level domains are treated in practice.
I repeat: do not rely on the code below to work for you. I leave it here for educational purposes only.
There is a way to do this without a lookup list. The list may be unreliable or incomplete, whereas this method is guaranteed to work:
<?php
function get_domain($url) {
  $dots = substr_count($url, '.');
  $domain = '';
  for ($end_pieces = $dots; $end_pieces > 0; $end_pieces--) {
    // end() takes its argument by reference, so store the pieces first.
    $pieces = explode('.', $url, $end_pieces);
    $test_domain = end($pieces);
    if (dns_check_record($test_domain, 'A')) {
      $domain = $test_domain;
      break;
    }
  }
  return $domain;
}
$my_domain = get_domain('www.robknight.org.uk');
echo $my_domain;
?>
In this case, it will output 'robknight.org.uk'. It would work equally well for .com, .edu, .com.au, .ly or whatever other top-level domain you're operating on.
It works by starting from the right and doing a DNS check on the first thing that looks like it might be a viable domain name. In the example above, it starts with 'org.uk', but discovers that this is not an actual domain name, but is a ccTLD. It then moves on to check 'robknight.org.uk', which is valid, and returns that. If the domain name had been, say, 'www.php.net', it would have started by checking 'php.net', which is a valid domain name, and would have returned that immediately without looping. I should also point out that if no valid domain name is found, an empty string ('') will be returned.
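To make the right-to-left walk concrete without touching DNS, here is a small sketch that only lists the names that would be tested, in order. The helper name `candidate_domains` is hypothetical and is not part of the code above.

```php
<?php
// List the names the right-to-left walk would test, shortest first.
function candidate_domains($host) {
  $labels = explode('.', $host);
  $candidates = array();
  // Start with the last two labels, then widen by one label at a time.
  for ($take = 2; $take <= count($labels); $take++) {
    $candidates[] = implode('.', array_slice($labels, -$take));
  }
  return $candidates;
}

print_r(candidate_domains('www.robknight.org.uk'));
```

For 'www.robknight.org.uk' the candidates are 'org.uk', 'robknight.org.uk' and 'www.robknight.org.uk'; the DNS check decides which of them is returned first.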
This code may be unsuitable for processing a large number of domain names in a short space of time due to the time taken for DNS lookups, but it's perfectly fine for single lookups or code that isn't time-critical.
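If the same hosts come up repeatedly, one way to soften the cost is to remember earlier answers. This is only a sketch under assumptions: `cached_domain` is a hypothetical wrapper, `$resolver` is any callable that maps a host to a domain (for example 'get_domain' from above), and the cache lives only for the current request.

```php
<?php
// Sketch: memoize results per host so repeated lookups skip the resolver.
function cached_domain($host, $resolver) {
  static $cache = array();
  // Host names are case-insensitive, so normalize the cache key.
  $key = strtolower($host);
  if (!array_key_exists($key, $cache)) {
    $cache[$key] = call_user_func($resolver, $key);
  }
  return $cache[$key];
}
```

Calling cached_domain('www.robknight.org.uk', 'get_domain') twice would run the DNS checks only the first time. It does nothing for a long list of distinct hosts, where each one still needs its own lookups.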