Is there a way to get the top-level domain name from a URL?
For example, "https://images.google.com/blah" => "google"
I found this:
var domain = new URL(pageUrl).hostname;
but it gives me "images.google.com" instead of just google.
Unit tests I have are:
https://images.google.com => google
https://www.google.com/blah => google
https://www.google.co.uk/blah => google
https://www.images.google.com/blah => google
You could do this:
location.hostname.split('.').pop()
EDIT
I saw the change to your question. You would need a list of all TLDs to match against and strip from the hostname; then you could use split('.').pop()
// small example list; dots are escaped and the match is anchored to the end
var re = /\.(co\.uk|me|com|us)$/
var secondLevelDomain = 'https://www.google.co.uk'.replace(re, '').split('.').pop()
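As a rough sketch of the same idea, the pattern can be built from the list so that the dots in entries such as co.uk are escaped automatically. The names suffixes, escapeRegExp, suffixRe and secondLevelDomain are made up for this example, and the list is nowhere near complete.

```javascript
// Small example list only; a real list of suffixes is much longer.
const suffixes = ['co.uk', 'me', 'com', 'us'];

// Escape regex metacharacters so that "co.uk" matches a literal dot.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Matches a dot followed by one of the listed suffixes at the end of the hostname.
const suffixRe = new RegExp('\\.(?:' + suffixes.map(escapeRegExp).join('|') + ')$');

// Hypothetical helper: returns the label just before the suffix,
// or null when no listed suffix matches.
function secondLevelDomain(pageUrl) {
  const hostname = new URL(pageUrl).hostname;
  if (!suffixRe.test(hostname)) {
    return null;
  }
  return hostname.replace(suffixRe, '').split('.').pop();
}

console.log(secondLevelDomain('https://www.google.co.uk/blah')); // google
console.log(secondLevelDomain('https://images.google.com'));     // google
```

Returning null for an unlisted suffix is a design choice of this sketch; it makes gaps in the list visible instead of silently returning the wrong label.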
This is the simplest solution besides maintaining white & black top level domain lists.
Match on the top-level domain alone if it has three or more characters: 'xxxx.yyy'
Match on the top-level domain and the label before it if both have two characters or fewer: 'xxxxx.yy.zz'
Remove the match.
Return everything between the last remaining period and the end of the string.
I broke it into two separate OR|regex rules:
(\.[^\.]*)(\.*$)
- from the last period to the end of the string, if the top-level domain has >= 3 characters.
(\.[^\.]{0,2})(\.[^\.]{0,2})(\.*$)
- the top-level domain and the label before it both have <= 2 characters.
var regex_var = new RegExp(/(\.[^\.]{0,2})(\.[^\.]{0,2})(\.*$)|(\.[^\.]*)(\.*$)/);
var unit_test = 'xxx.yy.zz.'.replace(regex_var, '').split('.').pop();
document.write("Returned user entered domain: " + unit_test + "\n");
var result = location.hostname.replace(regex_var, '').split('.').pop();
document.write("Current Domain: " + result);
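A quick way to try the two rules outside a browser is to wrap them in a function and feed it hostnames. This is only a sketch; domainLabel and regexVar are names invented here. The last call shows a limitation of the length-based rule: a three-letter label in front of a two-letter country code is not treated as part of the suffix.

```javascript
// Same pattern as above, written as a plain regex literal.
const regexVar = /(\.[^\.]{0,2})(\.[^\.]{0,2})(\.*$)|(\.[^\.]*)(\.*$)/;

// Hypothetical helper: strip the matched suffix, then take the last remaining label.
function domainLabel(hostname) {
  return hostname.replace(regexVar, '').split('.').pop();
}

console.log(domainLabel('images.google.com')); // google
console.log(domainLabel('www.google.co.uk'));  // google
console.log(domainLabel('google.com.au'));     // com
```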
How about this?
location.hostname.split('.').reverse()[1]
What you want to extract from the URL is not the top-level domain (TLD). The TLD is the rightmost part, e.g. .com.
Having said that, I don't think there's an easy way to do this, because some URLs end in two "common" parts such as ".co.uk", and I suppose you don't want to extract the ".co" in those cases. You could use a list of known two-part "TLDs" to check against, so that you know which part to extract.
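One possible shape for that check, as a sketch only: keep a set of known two-part suffixes and look at the last two labels of the hostname. The set below is a tiny hypothetical sample, and TWO_PART_SUFFIXES and registrableLabel are names made up for illustration.

```javascript
// Hypothetical, incomplete sample of two-part suffixes.
const TWO_PART_SUFFIXES = new Set(['co.uk', 'com.au', 'co.jp']);

// Returns the label directly in front of the suffix, or null if there is none.
function registrableLabel(hostname) {
  const labels = hostname.split('.');
  const lastTwo = labels.slice(-2).join('.');
  // The suffix is two labels long if it is in the set, otherwise one.
  const suffixLength = TWO_PART_SUFFIXES.has(lastTwo) ? 2 : 1;
  const index = labels.length - suffixLength - 1;
  return index >= 0 ? labels[index] : null;
}

console.log(registrableLabel('www.google.co.uk'));  // google
console.log(registrableLabel('images.google.com')); // google
console.log(registrableLabel('co.uk'));             // null
```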
function getDomainName( hostname ) {
var TLDs = new RegExp(/\.(com|net|org|biz|ltd|plc|edu|mil|asn|adm|adv|arq|art|bio|cng|cnt|ecn|eng|esp|etc|eti|fot|fst|g12|ind|inf|jor|lel|med|nom|ntr|odo|ppg|pro|psc|psi|rec|slg|tmp|tur|vet|zlg|asso|presse|k12|gov|muni|ernet|res|store|firm|arts|info|mobi|maori|iwi|travel|asia|web|tel)(\.[a-z]{2,3})?$|(\.[^\.]{2,3})(\.[^\.]{2,3})$|(\.[^\.]{2})$/);
return hostname.replace(TLDs, '').split('.').pop();
}
/*** TEST ***/
var domains = [
'domain.com',
'subdomain.domain.com',
'www.subdomain.domain.com',
'www.subdomain.domain.info',
'www.subdomain.domain.info.xx',
'mail.subdomain.domain.co.uk',
'mail.subdomain.domain.xxx.yy',
'mail.subdomain.domain.xx.yyy',
'mail.subdomain.domain.xx',
'domain.xx'
];
var result = [];
for (var i = 0; i < domains.length; i++) {
result.push( getDomainName( domains[i] ) );
}
alert ( result.join(' | ') );
// result: domain | domain | domain | domain | domain | domain | domain | domain | domain | domain
Here's my naive take on solving the issue.
url.split('.').reverse()[1].split('//').reverse()[0]
It supports subdomains, but not public-suffix second-level domains such as .co.uk.
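Because the one-liner splits the whole URL on dots, a dot in the path also shifts the result. A hedged variant parses the URL first and splits only the hostname; naiveLabel and hostnameLabel are names made up for this comparison, and neither handles suffixes such as .co.uk.

```javascript
// The one-liner from above, wrapped in a function.
function naiveLabel(url) {
  return url.split('.').reverse()[1].split('//').reverse()[0];
}

// Variant that ignores the path by looking only at the parsed hostname.
function hostnameLabel(url) {
  const labels = new URL(url).hostname.split('.');
  return labels.length >= 2 ? labels[labels.length - 2] : null;
}

console.log(naiveLabel('https://images.google.com/blah'));   // google
console.log(naiveLabel('https://google.com/index.html'));    // com/index
console.log(hostnameLabel('https://google.com/index.html')); // google
console.log(hostnameLabel('https://www.google.co.uk/blah')); // co
```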