Getting the subdomain from a URL sounds easy at first.
http://www.domain.example
Scan for the first period, then return whatever sits between the "http://" and it ...
Then you remember
http://super.duper.domain.example
Oh. So then you think, okay, find the last period, go back a word and get everything before!
Then you remember
http://super.duper.domain.co.uk
And you're back to square one. Anyone have any great ideas besides storing a list of all TLDs?
Publicsuffix.org seems the way to go. There are plenty of implementations out there that can parse the contents of the Public Suffix List data file easily.
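If you'd rather see the mechanics than pull in a dependency, here is a minimal PHP sketch of the rule matching, assuming the list has been downloaded as public_suffix_list.dat (the filename is mine, and so are the simplifications, no IDN normalization in particular):

    <?php
    // Minimal sketch: resolve the registrable domain against the Public
    // Suffix List. Ignores IDN normalization and the ICANN/private split.

    function loadRules(string $path): array
    {
        $rules = [];
        foreach (file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES) as $line) {
            $line = trim($line);
            if ($line === '' || str_starts_with($line, '//')) {
                continue; // skip blanks and comments
            }
            $rules[$line] = true;
        }
        return $rules;
    }

    function registrableDomain(string $host, array $rules): ?string
    {
        $labels = explode('.', strtolower(rtrim($host, '.')));
        $n = count($labels);
        $suffixLen = 1; // implicit default rule "*": the bare TLD

        // Walk from the longest candidate suffix to the shortest, so the
        // first rule that matches is the prevailing (longest) one.
        for ($i = 0; $i < $n; $i++) {
            $candidate = implode('.', array_slice($labels, $i));
            if (isset($rules['!' . $candidate])) {
                $suffixLen = $n - $i - 1; // exception rule: one label shorter
                break;
            }
            $wildcard = '*.' . implode('.', array_slice($labels, $i + 1));
            if (isset($rules[$candidate]) || ($i + 1 < $n && isset($rules[$wildcard]))) {
                $suffixLen = $n - $i;
                break;
            }
        }

        if ($n <= $suffixLen) {
            return null; // the host itself is a public suffix
        }
        // Registrable domain = the public suffix plus one more label.
        return implode('.', array_slice($labels, $n - $suffixLen - 1));
    }

    $rules = loadRules('public_suffix_list.dat');
    echo registrableDomain('super.duper.domain.co.uk', $rules); // domain.co.uk

For anything serious, use a maintained parser instead; the real list has a few more wrinkles than this sketch handles.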
As Adam and John already said, publicsuffix.org is the correct way to go. But if for any reason you cannot use that approach, here's a heuristic based on an assumption that works for 99% of all domains:
There is one property that distinguishes (not all, but nearly all) "real" domains from subdomains and TLDs, and that is the DNS MX record. You could create an algorithm that searches for this: remove the parts of the hostname one by one and query the DNS until you find an MX record. Example:

    super.duper.domain.co.uk  (no MX record, strip the first label)
    duper.domain.co.uk        (no MX record, strip the first label)
    domain.co.uk              (MX record found: treat this as the "real" domain)
Here is that loop as a sketch in PHP, reconstructed from the description above (checkdnsrr() does the MX lookup):
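    <?php
    // Heuristic sketch of the loop described above: strip the leftmost
    // label and query DNS until some suffix answers with an MX record.

    function findDomainByMx(string $host): ?string
    {
        $parts = explode('.', $host);
        while (count($parts) > 1) {              // stop before the bare TLD
            $candidate = implode('.', $parts);
            if (checkdnsrr($candidate, 'MX')) {
                return $candidate;               // first suffix with an MX record
            }
            array_shift($parts);                 // drop the leftmost label, retry
        }
        return null;                             // nothing answered
    }

    echo findDomainByMx('super.duper.domain.co.uk');
    // "domain.co.uk", assuming that domain publishes an MX record

Bear in mind this does live DNS queries, so it is slow, and it fails for the minority of real domains that don't publish MX records, which is why it is only a heuristic.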
Use URIBuilder, then get the URIBuilder.host attribute and split it on "." into an array. You now have an array with the domain split out.
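For comparison, here is the same host-splitting step in PHP, with parse_url() standing in for URIBuilder.host; note that the split alone does not tell you where the registrable domain starts:

    <?php
    // Extract the hostname, then split it into its labels.
    $host   = parse_url('http://super.duper.domain.co.uk', PHP_URL_HOST);
    $labels = explode('.', $host);
    print_r($labels); // super, duper, domain, co, uk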
Just wrote a program for this in Clojure, based on the info from publicsuffix.org:
https://github.com/isaksky/url_dom
You can use the lib tld.js: a JavaScript API to work with complex domain names, subdomains and URIs.
If you need the root domain in the browser, you can use the lib AngusFu/browser-root-domain. It relies on the cookie trick (browsers refuse to set a cookie for a bare public suffix), which is tricky to get right by hand.