Get the subdomain from a URL

Posted 2019-01-01 07:09

Getting the subdomain from a URL sounds easy at first.

http://www.domain.example

Scan for the first period, then return whatever came after the "http://"...

Then you remember

http://super.duper.domain.example

Oh. So then you think, okay, find the last period, go back a word and get everything before!

Then you remember

http://super.duper.domain.co.uk

And you're back to square one. Anyone have any great ideas besides storing a list of all TLDs?
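To make the problem concrete, here is a minimal Python sketch of the two naive rules described above. The function names are hypothetical, and both parse the host with the standard library's urllib.parse.urlsplit instead of scanning the raw string for "http://".

```python
from urllib.parse import urlsplit


def naive_first_period(url: str) -> str:
    """First attempt: everything between the scheme and the host's first period."""
    host = urlsplit(url).hostname or ""
    return host.split(".", 1)[0]


def naive_last_two(url: str) -> str:
    """Second attempt: drop the last two labels (assumes a one-label TLD)."""
    labels = (urlsplit(url).hostname or "").split(".")
    return ".".join(labels[:-2])
```

The first rule returns "super" for http://super.duper.domain.example instead of "super.duper", and the second returns "super.duper.domain" for http://super.duper.domain.co.uk, which is exactly the failure the question runs into.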

15 answers
与君花间醉酒
Answer #2 · 2019-01-01 07:33

I just wrote an Objective-C library for this: https://github.com/kejinlu/KKDomain

听够珍惜
Answer #3 · 2019-01-01 07:39

From a quick look at the publicsuffix.org list, it appears that you could get a reasonable approximation by removing the final three segments ("segment" here meaning a section between two dots) from domains whose final segment is two characters long, on the assumption that it's a country code and will be further subdivided. If the final segment is "us" and the second-to-last segment is also two characters, remove the last four segments instead. In all other cases, remove the final two segments. For example:

www.domain.example: "example" is not two characters, so remove "domain.example", leaving "www".

super.duper.domain.example: "example" is not two characters, so remove "domain.example", leaving "super.duper".

super.duper.domain.co.uk: "uk" is two characters (but not "us"), so remove "domain.co.uk", leaving "super.duper".

foo.pvt.k12.wy.us: the final segment is "us", and "wy" is also two characters, so remove "pvt.k12.wy.us", leaving "foo".

Note that, although this works for all the examples I've seen in the responses so far, it remains only a reasonable approximation. It is not completely correct, although I suspect it's about as close as you're likely to get without building or obtaining an actual list to use for reference.
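The heuristic above can be sketched in Python roughly as follows. The function name approximate_subdomain is hypothetical, and the sketch inherits the answer's caveat: it applies the segment-counting rule only and does not consult the publicsuffix.org list.

```python
from urllib.parse import urlsplit


def approximate_subdomain(url: str) -> str:
    """Guess the subdomain by counting segments (an approximation, not a PSL lookup)."""
    host = (urlsplit(url).hostname or "").rstrip(".")
    labels = host.split(".")
    last = labels[-1]
    if last == "us" and len(labels) >= 2 and len(labels[-2]) == 2:
        drop = 4  # e.g. pvt.k12.wy.us
    elif len(last) == 2:
        drop = 3  # assume a subdivided country code, e.g. domain.co.uk
    else:
        drop = 2  # e.g. domain.example
    # If the host has no more segments than we drop, there is no subdomain.
    return ".".join(labels[:-drop]) if len(labels) > drop else ""
```

On the four hosts above it returns "www", "super.duper", "super.duper" and "foo". It still goes wrong where the two-character assumption fails: for http://www.domain.de it drops three segments and returns "" instead of "www".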

不流泪的眼
Answer #4 · 2019-01-01 07:40

I'd keep a list of common suffixes (.co.uk, .com, et cetera) and strip the matching one out along with the "http://". Then you only have "sub.domain" to work with instead of "http://sub.domain.suffix". At least, that's what I'd probably do.

The biggest problem is maintaining the list of possible suffixes; there are a lot of them.
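A minimal sketch of the suffix-list approach, assuming a tiny hand-written set (KNOWN_SUFFIXES and subdomain_from_suffix_list are hypothetical names). It goes one step past "sub.domain" by also dropping the registered name, and it tries longer suffixes first so "co.uk" wins over "uk". A real list such as the one at publicsuffix.org has thousands of entries plus wildcard and exception rules, which this sketch ignores.

```python
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical, deliberately tiny suffix list for illustration only.
KNOWN_SUFFIXES = frozenset({"com", "example", "uk", "co.uk", "org.uk"})


def subdomain_from_suffix_list(url: str, suffixes=KNOWN_SUFFIXES) -> Optional[str]:
    """Return the subdomain of url's host, or None if no known suffix matches."""
    labels = (urlsplit(url).hostname or "").rstrip(".").split(".")
    # Walk from the longest candidate suffix to the shortest, so the
    # longest known suffix is the one that matches.
    for i in range(1, len(labels)):
        if ".".join(labels[i:]) in suffixes:
            # labels[i - 1] is the registered name ("domain");
            # whatever precedes it is the subdomain (possibly empty).
            return ".".join(labels[: i - 1])
    return None
```

Returning None for an unknown suffix keeps "this host has no subdomain" (an empty string) distinct from "the list doesn't cover this host".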
