可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效，请关闭广告屏蔽插件后再试):

问题:

For example, the address is:

Address = http://lol1.domain.com:8888/some/page

I want to save the subdomain into a variable so i could do like so;

print SubAddr
>> lol1

回答1:

urlparse.urlparse will split the URL into protocol, location, port, etc. You can then split the location by . to get the subdomain.

url = urlparse.urlparse(address)
subdomain = url.hostname.split('.')[0]

回答2:

Package tldextract makes this task very easy, and then you can use urlparse as suggested if you need any further information:

>> import tldextract
>> tldextract.extract("http://lol1.domain.com:8888/some/page"
ExtractResult(subdomain='lol1', domain='domain', suffix='com')
>> tldextract.extract("http://sub.lol1.domain.com:8888/some/page"
ExtractResult(subdomain='sub.lol1', domain='domain', suffix='com')
>> urlparse.urlparse("http://sub.lol1.domain.com:8888/some/page")
ParseResult(scheme='http', netloc='sub.lol1.domain.com:8888', path='/some/page', params='', query='', fragment='')

Note that tldextract properly handles sub-domains.

回答3:

Modified version of the fantastic answer here: How to extract top-level domain name (TLD) from URL

You will need the list of effective tlds from here

from __future__ import with_statement
from urlparse import urlparse

# load tlds, ignore comments and empty lines:
with open("effective_tld_names.dat.txt") as tldFile:
    tlds = [line.strip() for line in tldFile if line[0] not in "/\n"]

class DomainParts(object):
    def __init__(self, domain_parts, tld):
        self.domain = None
        self.subdomains = None
        self.tld = tld
        if domain_parts:
            self.domain = domain_parts[-1]
            if len(domain_parts) > 1:
                self.subdomains = domain_parts[:-1]

def get_domain_parts(url, tlds):
    urlElements = urlparse(url).hostname.split('.')
    # urlElements = ["abcde","co","uk"]
    for i in range(-len(urlElements),0):
        lastIElements = urlElements[i:]
        #    i=-3: ["abcde","co","uk"]
        #    i=-2: ["co","uk"]
        #    i=-1: ["uk"] etc

        candidate = ".".join(lastIElements) # abcde.co.uk, co.uk, uk
        wildcardCandidate = ".".join(["*"]+lastIElements[1:]) # *.co.uk, *.uk, *
        exceptionCandidate = "!"+candidate

        # match tlds: 
        if (exceptionCandidate in tlds):
            return ".".join(urlElements[i:]) 
        if (candidate in tlds or wildcardCandidate in tlds):
            return DomainParts(urlElements[:i], '.'.join(urlElements[i:]))
            # returns ["abcde"]

    raise ValueError("Domain not in global list of TLDs")

domain_parts = get_domain_parts("http://sub2.sub1.example.co.uk:80",tlds)
print "Domain:", domain_parts.domain
print "Subdomains:", domain_parts.subdomains or "None"
print "TLD:", domain_parts.tld

Gives you:

Domain: example
Subdomains: ['sub2', 'sub1']
TLD: co.uk

回答4:

What you are looking for is in: http://docs.python.org/library/urlparse.html

for example: ".".join(urlparse('http://www.my.cwi.nl:80/%7Eguido/Python.html').netloc.split(".")[:-2])

Will do the job for you (will return "www.my")

回答5:

A very basic approach, without any sanity checking could look like:

address = 'http://lol1.domain.com:8888/some/page'

host = address.partition('://')[2]
sub_addr = host.partition('.')[0]

print sub_addr

This of course assumes that when you say 'subdomain' you mean the first part of a host name, so in the following case, 'www' would be the subdomain:

http://www.google.com/

Is that what you mean?

回答6:

For extracting the hostname, I'd use urlparse from urllib2:

>>> from urllib2 import urlparse
>>> a = "http://lol1.domain.com:8888/some/page"
>>> urlparse.urlparse(a).hostname
'lol1.domain.com'

As to how to extract the subdomain, you need to cover for the case that there FQDN could be longer. How you do this would depend on your purposes. I might suggest stripping off the two right most components.

E.g.

>>> urlparse.urlparse(a).hostname.rpartition('.')[0].rpartition('.')[0]
'lol1'

回答7:

We can use https://github.com/john-kurkowski/tldextract for this problem...

It's easy.

>>> ext = tldextract.extract('http://forums.bbc.co.uk')
>>> (ext.subdomain, ext.domain, ext.suffix)
('forums', 'bbc', 'co.uk')

Get subdomain from URL using Python

问题:

回答1:

回答2:

回答3:

回答4:

回答5:

回答6:

回答7:

收藏的人(0)

Get subdomain from URL using Python

问题:

回答1:

回答2:

回答3:

回答4:

回答5:

回答6:

回答7:

收藏的人(0)

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮