Is there a way to get the top-level domain name from a URL?
For example, "https://images.google.com/blah" => "google"
I found this:
var domain = new URL(pageUrl).hostname;
but it gives me "images.google.com" instead of just google.
Unit tests I have are:
https://images.google.com => google
https://www.google.com/blah => google
https://www.google.co.uk/blah => google
https://www.images.google.com/blah => google
How about this?
location.hostname.split('.').reverse()[1]
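To see how that one-liner behaves on the test cases from the question, here is a small sketch. The wrapper name `secondFromRight` is made up for illustration, and it parses a URL string instead of reading `location` so it can run outside a browser.

```javascript
// Hypothetical wrapper around the one-liner above, taking a URL string.
function secondFromRight(pageUrl) {
  // Split the hostname into labels, reverse them, take the second from the right.
  return new URL(pageUrl).hostname.split('.').reverse()[1];
}

console.log(secondFromRight('https://images.google.com'));     // "google"
console.log(secondFromRight('https://www.google.co.uk/blah')); // "co"
```

So it works when the suffix is a single label like .com, but not for two-part suffixes like .co.uk.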
You could do this:
EDIT
Saw the change to your question. You would need a list of all TLDs to match against and remove from the hostname; then you could use
split('.').pop()
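A rough sketch of that idea, assuming a tiny hard-coded suffix list. The `SUFFIXES` array and the function name `labelBySuffixList` are placeholders for illustration only; a real list would be much longer (the Public Suffix List is the usual source).

```javascript
// Illustrative and deliberately incomplete list of suffixes (an assumption).
const SUFFIXES = ['co.uk', 'com', 'org', 'net', 'uk'];

function labelBySuffixList(pageUrl) {
  const hostname = new URL(pageUrl).hostname;
  // Check longer suffixes first so "co.uk" wins over "uk".
  const sorted = [...SUFFIXES].sort((a, b) => b.length - a.length);
  const suffix = sorted.find((s) => hostname.endsWith('.' + s));
  if (!suffix) return undefined; // unknown suffix: give up rather than guess
  // Remove ".suffix" from the hostname, then take the last remaining label.
  const rest = hostname.slice(0, -(suffix.length + 1));
  return rest.split('.').pop();
}

console.log(labelBySuffixList('https://www.google.co.uk/blah')); // "google"
```

Anything whose suffix is not in the list returns `undefined`, so the result is only as good as the list.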
Here's my naive take on solving the issue.
Supports subdomains, but won't support public suffix SLDs.
What you want to extract from the URL is not the top-level domain (TLD). The TLD is the rightmost part, e.g. .com.
Having said that, I don't think there's an easy way to do this, because there are URLs that have two "common" parts like ".co.uk", and I suppose you don't want to extract the ".co" in those cases. You could maybe use a list of existing two-part "TLDs" to check against so that you know which part to extract.
This is the simplest solution besides maintaining white & black top level domain lists.
Match on top level domain if it has three or more characters 'xxxx.yyy'
Match on top level domain and sub-domain, if both are two characters or fewer 'xxxxx.yy.zz'
Remove Match.
Return everything between the last period, and the end of the string.
I broke it into two separate OR|regex rules:
(\.[^\.]*)(\.*$)
- last period to end of string if top domain is >= 3.
(\.[^\.]{0,2})(\.[^\.]{0,2})(\.*$)
- Top and Sub-Domain are <= 2.
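Put together, the steps above might look something like the sketch below. The function name `labelByRegex` is made up, and trying the two-short-labels rule before the single-label rule inside one alternation is my own assumption about how the two rules are meant to combine.

```javascript
function labelByRegex(pageUrl) {
  const hostname = new URL(pageUrl).hostname;
  // First alternative: two trailing labels of at most two characters, e.g. ".co.uk".
  // Second alternative: the single trailing label, e.g. ".com".
  const suffix = /(\.[^\.]{0,2})(\.[^\.]{0,2})(\.*$)|(\.[^\.]*)(\.*$)/;
  // Remove the match, then return everything after the last remaining period.
  return hostname.replace(suffix, '').split('.').pop();
}

console.log(labelByRegex('https://images.google.com'));     // "google"
console.log(labelByRegex('https://www.google.co.uk/blah')); // "google"
```

This is a length heuristic, not a real suffix lookup, so a suffix like ".com.au" (three characters plus two) will not be stripped as a pair.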