How can I parse a domain from a URL in PHP? It seems that I need a country domain database.
Examples:
http://mail.google.com/hfjdhfjd/jhfjd.html -> google.com
http://www.google.bg/jhdjhf/djfhj.html -> google.bg
http://www.google.co.uk/djhdjhf.php -> google.co.uk
http://www.tsk.tr/jhjgc.aspx -> tsk.tr
http://subsub.sub.nic.tr/ -> nic.tr
http://subsub.sub.google.com.tr -> google.com.tr
http://subsub.sub.itoy.info.tr -> itoy.info.tr
Can it be done with a whois request?
Edit: There are only a few domain names directly under .tr (www.nic.tr, www.tsk.tr); the others are, as you know, of the form www.something.com.tr or www.something.org.tr. Also, there is no www.something.com.bg or www.something.org.bg; Bulgarian domains are www.something.bg, like the Germans' .de. But there are also www.something.a.bg and www.something.b.bg, so a.bg, b.bg, c.bg and so on act as suffixes (a.bg is like co.uk).
There must be a list of these top-level domain names somewhere on the net.
Check how the URL http://www.agrotehnika97.a.bg/ is coloured in Internet Explorer.
Check also:
www.google.co.uk
www.google.com.tr
www.nic.tr
www.tsk.tr
I reckon you'll need a list of all suffixes used after a domain name. http://publicsuffix.org/list/ provides an up-to-date (or so they claim) list of all suffixes currently in use; the list itself is linked from that page.
The idea would be to parse that list into a structure, with the levels split by the dot, starting from the last level.
So, for instance, for the suffixes com.la, com.tr and com.lc you'd end up with top-level entries la, tr and lc, each containing com, etc...
Then you'd get the host from the URL (by using parse_url) and explode it by dots. You then match the values against your structure, starting with the last one.
So for google.com.tr you'd start by matching tr, then com, and then you won't find a match once you get to google, which is what you want: the suffix is com.tr and the domain is google.com.tr.
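The steps above can be sketched roughly as follows. The `$rules` array is a tiny hand-picked sample standing in for the real list, and the function names `buildSuffixTree` and `registrableDomain` are made up for illustration. The sketch also ignores the wildcard (`*.`) and exception (`!`) rules that the real list contains.

```php
<?php
// Hypothetical sample of suffix rules; a real program would load the full list.
$rules = ['com', 'tr', 'com.tr', 'info.tr', 'bg', 'a.bg', 'uk', 'co.uk'];

// Build a nested array keyed by label, last label first.
// The '#' key marks a node where a complete rule ends.
function buildSuffixTree(array $rules): array
{
    $tree = [];
    foreach ($rules as $rule) {
        $node = &$tree;
        foreach (array_reverse(explode('.', $rule)) as $label) {
            if (!isset($node[$label])) {
                $node[$label] = [];
            }
            $node = &$node[$label];
        }
        $node['#'] = true;
        unset($node); // drop the reference before the next rule
    }
    return $tree;
}

// Return the suffix plus one more label, or null if that is not possible.
function registrableDomain(string $url, array $tree): ?string
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return null;
    }
    $labels = explode('.', strtolower($host));
    $node = $tree;
    $suffixLength = 1; // fallback: treat the last label as the suffix
    for ($i = count($labels) - 1, $depth = 1; $i >= 0; $i--, $depth++) {
        if (!isset($node[$labels[$i]])) {
            break; // no deeper match, e.g. we reached "google"
        }
        $node = $node[$labels[$i]];
        if (isset($node['#'])) {
            $suffixLength = $depth; // longest complete rule seen so far
        }
    }
    if (count($labels) <= $suffixLength) {
        return null; // the host is itself a suffix
    }
    return implode('.', array_slice($labels, -($suffixLength + 1)));
}

$tree = buildSuffixTree($rules);
echo registrableDomain('http://subsub.sub.google.com.tr', $tree), PHP_EOL; // google.com.tr
echo registrableDomain('http://www.tsk.tr/jhjgc.aspx', $tree), PHP_EOL;    // tsk.tr
```

Note that `parse_url` only returns a host when the URL has a scheme, so a bare `www.google.co.uk` would need `http://` prepended first.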
You can use parse_url() to split the URL up and get what you want.
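A minimal sketch of the parse_url() approach; the helper name `hostOf` is made up for illustration. It returns the whole host, subdomains included, so it does not by itself reduce mail.google.com to google.com.

```php
<?php
// Return the host part of a URL, or null if parse_url() cannot find one.
function hostOf(string $url): ?string
{
    $host = parse_url($url, PHP_URL_HOST);
    return is_string($host) ? $host : null;
}

echo hostOf('http://mail.google.com/hfjdhfjd/jhfjd.html'), PHP_EOL; // mail.google.com
```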
Regex and parse_url() aren't a solution for you.
You need a package that uses the Public Suffix List; only this way can you correctly extract domains with second- and third-level TLDs (co.uk, a.bg, b.bg, etc.). I recommend using TLD Extract.
The domain is stored in $_SERVER['HTTP_HOST'].
EDIT: I believe this returns the whole domain. To just get the top-level domain, you could do this:
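One way this might look, as a sketch only: take the last dot-separated label of the host. The helper name `lastLabel` and the fallback host used outside a web request are assumptions for illustration. This yields uk for www.google.co.uk, not co.uk, so it does not handle multi-level suffixes.

```php
<?php
// Return the last dot-separated label of a host, ignoring an optional ":port".
function lastLabel(string $host): string
{
    $host = preg_replace('/:\d+$/', '', $host);
    $labels = explode('.', $host);
    return end($labels);
}

// HTTP_HOST is only set during a web request; fall back to a sample value on the CLI.
$host = $_SERVER['HTTP_HOST'] ?? 'www.google.co.uk';
echo lastLabel($host), PHP_EOL;
```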