Get second level domain name from URL

Posted 2019-03-27 06:03

Question:

Is there a way to get the top-level domain name from the URL?

For example, "https://images.google.com/blah" => "google"

I found this:

var domain = new URL(pageUrl).hostname; 

but it gives me "images.google.com" instead of just "google".

Unit tests I have are:

https://images.google.com   => google
https://www.google.com/blah => google
https://www.google.co.uk/blah => google
https://www.images.google.com/blah => google
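To see why `hostname` alone isn't enough, here is a small Node.js sketch (assuming Node 10 or later, where `URL` is a global) that prints the dot-separated labels of each unit-test URL. The wanted label is not at a fixed position: it is second from the right for the `.com` hosts but third from the right for `.co.uk`.

```javascript
// Unit-test URLs from the question.
const urls = [
  'https://images.google.com',
  'https://www.google.com/blah',
  'https://www.google.co.uk/blah',
  'https://www.images.google.com/blah',
];

for (const url of urls) {
  // hostname drops the scheme, port and path, leaving only the host name.
  const hostname = new URL(url).hostname;
  // Each dot-separated piece is a DNS label; the rightmost one is the TLD.
  console.log(hostname, '->', hostname.split('.'));
}
```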

Answer 1:

You could do this:

location.hostname.split('.').pop()
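`location` only exists in a browser page. A minimal Node.js equivalent, assuming the input is a full URL string, parses it with `URL` first (the function name is hypothetical). Note that `pop()` returns the rightmost label, i.e. the actual top-level domain, which matches the question's original wording rather than the `google` it expects.

```javascript
// Returns the last dot-separated label of the URL's host, i.e. its TLD.
function topLevelDomain(url) {
  return new URL(url).hostname.split('.').pop();
}

console.log(topLevelDomain('https://www.google.com/blah'));   // com
console.log(topLevelDomain('https://www.google.co.uk/blah')); // uk
```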

EDIT

I saw the change to your question. You would need a list of all TLDs to match against and remove from the hostname; then you could use split('.').pop()

// small example list; the dots are escaped and $ anchors the match to the end
var re = /\.(co\.uk|me|com|us)$/;
var secondLevelDomain = 'https://www.google.co.uk'.replace(re, '').split('.').pop();
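Building the pattern from an array keeps the suffix list in one place and makes the escaping explicit. A sketch, assuming a deliberately tiny hypothetical `SUFFIXES` list (a real one would need to be far longer): each entry is regex-escaped so the dot in `co.uk` is literal, and the pattern is anchored with `$` so only a trailing suffix is stripped. Because the regex engine reports the leftmost match, `co.uk` wins over the shorter `uk` even though both are listed.

```javascript
// Hypothetical, deliberately small suffix list for illustration only.
const SUFFIXES = ['uk', 'co.uk', 'com', 'me', 'us'];

// Escape regex metacharacters so the '.' in 'co.uk' matches a literal dot.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Matches one known suffix (with its leading dot) at the very end of the host.
const suffixRe = new RegExp('\\.(' + SUFFIXES.map(escapeRegExp).join('|') + ')$');

function secondLevelDomain(url) {
  const hostname = new URL(url).hostname;
  // With the suffix removed, the last remaining label is the one we want.
  return hostname.replace(suffixRe, '').split('.').pop();
}

console.log(secondLevelDomain('https://www.google.co.uk/blah')); // google
console.log(secondLevelDomain('https://images.google.com'));     // google
```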


Answer 2:

This is the simplest solution short of maintaining whitelists and blacklists of top-level domains.

  1. Match on the top-level domain alone if it has three or more characters ('xxxx.yyy').

  2. Match on the top-level domain and the label before it if both have two or fewer characters ('xxxxx.yy.zz').

  3. Remove the match.

  4. Return everything between the last period and the end of the string.


I broke it into two separate regex rules joined with OR (|):

  1. (\.[^\.]*)(\.*$) - from the last period to the end of the string, if the top-level domain is >= 3 characters.
  2. (\.[^\.]{0,2})(\.[^\.]{0,2})(\.*$) - the top-level and second-level labels are both <= 2 characters.

// Alternative 1: last two labels both <= 2 characters; alternative 2: just the last label.
var regex_var = /(\.[^\.]{0,2})(\.[^\.]{0,2})(\.*$)|(\.[^\.]*)(\.*$)/;
var unit_test = 'xxx.yy.zz.'.replace(regex_var, '').split('.').pop();
document.write("Returned user entered domain: " + unit_test + "<br>");

// Browser only: location.hostname is the host of the current page.
var result = location.hostname.replace(regex_var, '').split('.').pop();
document.write("Current Domain: " + result);
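The snippet above relies on `document` and `location`, so it only runs in a browser page. Below is the same regular expression wrapped in a function (the name `guessDomain` is hypothetical) so it can be run in Node.js against the question's unit tests. Because it guesses from label length rather than consulting a list, it can misfire: in `google.com.au` the `com` label has three characters, so only `.au` is stripped and the result is `com`.

```javascript
// Alternative 1: the last two labels are both at most 2 characters (e.g. ".co.uk").
// Alternative 2: otherwise, just the last label (e.g. ".com").
// (\.*$) also tolerates a trailing dot.
const regex_var = /(\.[^\.]{0,2})(\.[^\.]{0,2})(\.*$)|(\.[^\.]*)(\.*$)/;

function guessDomain(url) {
  const hostname = new URL(url).hostname;
  return hostname.replace(regex_var, '').split('.').pop();
}

console.log(guessDomain('https://www.google.co.uk/blah'));      // google
console.log(guessDomain('https://www.images.google.com/blah')); // google
console.log(guessDomain('https://google.com.au'));              // com (heuristic misfire)
```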



Answer 3:

How about this?

location.hostname.split('.').reverse()[1]
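`reverse()[1]` is simply the second label from the right. A Node.js sketch of the same idea (the function name is hypothetical) shows where it works and where it doesn't: it passes the `.com` tests but returns `co` for `www.google.co.uk`, because there the second label from the right is part of the two-part suffix.

```javascript
// Second label from the right of the URL's host, same as split('.').reverse()[1].
function secondFromRight(url) {
  const labels = new URL(url).hostname.split('.');
  return labels[labels.length - 2];
}

console.log(secondFromRight('https://www.images.google.com/blah')); // google
console.log(secondFromRight('https://www.google.co.uk/blah'));      // co
```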



Answer 4:

What you want to extract from the URL is not the top-level domain (TLD). The TLD is the rightmost part, e.g. .com.

Having said that, I don't think there's an easy way to do this, because some URLs end in two "common" parts, like ".co.uk", and I assume you don't want to extract the ".co" in those cases. You could use a list of existing two-part "TLDs" to check against, so that you know which part to extract.
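One way to apply such a list: walk the host's labels from the left and take the longest trailing sequence that appears in a set of known suffixes; the label immediately before it is the one wanted. The `KNOWN_SUFFIXES` set below is a hypothetical, hand-picked sample; maintaining the full set is what the Public Suffix List exists for.

```javascript
// Hypothetical sample of public suffixes; a real list has thousands of entries.
const KNOWN_SUFFIXES = new Set(['com', 'uk', 'co.uk', 'au', 'com.au']);

function registrableLabel(url) {
  const labels = new URL(url).hostname.split('.');
  // Try the longest candidate suffix first (starting at label 1 so that at
  // least one label remains in front of the suffix).
  for (let i = 1; i < labels.length; i++) {
    if (KNOWN_SUFFIXES.has(labels.slice(i).join('.'))) {
      return labels[i - 1];
    }
  }
  return null; // No known suffix matched.
}

console.log(registrableLabel('https://www.google.co.uk/blah')); // google
console.log(registrableLabel('https://google.com.au'));         // google
```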



Answer 5:

function getDomainName( hostname ) {
    var TLDs = new RegExp(/\.(com|net|org|biz|ltd|plc|edu|mil|asn|adm|adv|arq|art|bio|cng|cnt|ecn|eng|esp|etc|eti|fot|fst|g12|ind|inf|jor|lel|med|nom|ntr|odo|ppg|pro|psc|psi|rec|slg|tmp|tur|vet|zlg|asso|presse|k12|gov|muni|ernet|res|store|firm|arts|info|mobi|maori|iwi|travel|asia|web|tel)(\.[a-z]{2,3})?$|(\.[^\.]{2,3})(\.[^\.]{2,3})$|(\.[^\.]{2})$/);
    return hostname.replace(TLDs, '').split('.').pop();
}

/*** TEST ***/

var domains = [
    'domain.com',
    'subdomain.domain.com',
    'www.subdomain.domain.com',
    'www.subdomain.domain.info',
    'www.subdomain.domain.info.xx',
    'mail.subdomain.domain.co.uk',
    'mail.subdomain.domain.xxx.yy',
    'mail.subdomain.domain.xx.yyy',
    'mail.subdomain.domain.xx',
    'domain.xx'
];

var result = [];
for (var i = 0; i < domains.length; i++) {
    result.push( getDomainName( domains[i] ) );
}

alert ( result.join(' | ') );

// result: domain | domain | domain | domain | domain | domain | domain | domain | domain | domain


Answer 6:

Here's my naive take on solving the issue.

url.split('.').reverse()[1].split('//').reverse()[0]

It supports subdomains, but won't handle multi-part public suffixes such as co.uk.
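Because this one-liner splits the whole URL string rather than just the host, a dot in the path shifts the labels. A small Node.js comparison (the URL is an illustrative example) shows that parsing with `URL` first confines the split to the hostname.

```javascript
const url = 'https://www.google.com/page.html';

// Splitting the raw string also splits "page.html", so index 1 lands in the path.
const naive = url.split('.').reverse()[1].split('//').reverse()[0];

// Splitting only the parsed hostname avoids that.
const parsed = new URL(url).hostname.split('.').reverse()[1];

console.log(naive);  // com/page
console.log(parsed); // google
```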