Given a URL, how do I extract the registered domain using the Public Suffix List (list of effective TLDs, e.g. this list)?
For instance, given that a.bg is a valid public suffix:
http://www.test.start.a.bg/hello.html -> start.a.bg
http://test.start.a.bg/ -> start.a.bg
http://test.start.abc.bg/ -> abc.bg (.bg is the public suffix)
This cannot be done using simple string manipulation because the public suffix can consist of multiple levels depending on the TLD.
P.S. It doesn't matter how I read the list (database or flat file), but the list should be accessible locally so I'm not always dependent on external services.
This question is a bit old, but there's a new solution: https://github.com/jeremykendall/php-domain-parser
This library does exactly what you want. Here's the setup:
This will print "scottwills.co.uk".

You can use parse_url() to extract the hostname, then use the library provided by regdom to determine the registered domain name (dn + eTLD). For example:
That will print out metu.edu.tr.
Other examples I've tried:
UPDATE: These libraries have been moved to: https://github.com/leth/registered-domains-php
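For anyone curious what the lookup itself involves, here is a minimal sketch in Python of the matching steps: take the hostname from the URL, find the longest matching suffix rule, and keep one more label to the left. This is not the code of regdom or php-domain-parser. SAMPLE_RULES is a small hand-picked subset of rules for illustration only, and the function names are made up for this example. A real implementation would load the complete Public Suffix List from a local file and would also need to handle IP addresses and internationalized names, which this sketch ignores.

```python
from urllib.parse import urlsplit

# Hypothetical, hand-picked sample of rules for illustration only.
# A real lookup would load the complete Public Suffix List from a local copy.
SAMPLE_RULES = {"com", "bg", "a.bg", "uk", "co.uk", "tr", "edu.tr", "*.ck", "!www.ck"}


def public_suffix_length(labels, rules):
    """Return how many trailing labels of the host form the public suffix."""
    best = 1  # implicit default rule "*": the last label alone is a suffix
    for i in range(len(labels)):
        candidate = labels[i:]
        name = ".".join(candidate)
        if "!" + name in rules:
            # An exception rule wins; the suffix is the rule minus its leftmost label.
            return len(candidate) - 1
        wildcard = ".".join(["*"] + candidate[1:])
        if name in rules or wildcard in rules:
            best = max(best, len(candidate))
    return best


def registered_domain(url, rules=SAMPLE_RULES):
    """Return the public suffix plus one label, or None if there is none."""
    host = urlsplit(url).hostname  # already lower-cased by urlsplit
    if not host:
        return None
    labels = host.rstrip(".").split(".")
    n = public_suffix_length(labels, rules)
    if len(labels) <= n:
        return None  # the host is itself a public suffix
    return ".".join(labels[-(n + 1):])


print(registered_domain("http://www.test.start.a.bg/hello.html"))  # start.a.bg
print(registered_domain("http://test.start.abc.bg/"))              # abc.bg
```

The sample rules cover the cases from the question: a.bg matches as a two-label suffix, while abc.bg falls back to the plain bg rule. The wildcard and exception entries show why a plain "longest suffix in a set" check is not quite enough on its own.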
I recommend TLDExtract; it has a regularly updatable database that is generated from the PSL.