Validate a hostname string

2019-01-13 06:27发布

Following up to Regular expression to match hostname or IP Address? and using Restrictions on valid host names as a reference, what is the most readable, concise way to match/validate a hostname/fqdn (fully qualified domain name) in Python? I've answered with my attempt below, improvements welcome.

9条回答
Explosion°爆炸
2楼-- · 2019-01-13 06:42

Process each DNS label individually by excluding invalid characters and ensuring nonzero length.


def isValidHostname(hostname):
    disallowed = re.compile("[^a-zA-Z\d\-]")
    return all(map(lambda x: len(x) and not disallowed.search(x), hostname.split(".")))
查看更多
干净又极端
3楼-- · 2019-01-13 06:48

Here's a bit stricter version of Tim Pietzcker's answer with the following improvements:

  • Limit the length of the hostname to 253 characters (after stripping the optional trailing dot).
  • Limit the character set to ASCII (i.e. use [0-9] instead of \d).
  • Check that the TLD is not all-numeric.
import re

def is_valid_hostname(hostname):
    if hostname[-1] == ".":
        # strip exactly one dot from the right, if present
        hostname = hostname[:-1]
    if len(hostname) > 253:
        return False

    labels = hostname.split(".")

    # the TLD must be not all-numeric
    if re.match(r"[0-9]+$", labels[-1]):
        return False

    allowed = re.compile(r"(?!-)[a-z0-9-]{1,63}(?<!-)$", re.IGNORECASE)
    return all(allowed.match(label) for label in labels)
查看更多
爷、活的狠高调
4楼-- · 2019-01-13 06:49

Complimentary to the @TimPietzcker answer. Underscore is valid hostname, doubel dash is common for IDN punycode. Port number should be stripped. This is the cleanup of the code.

import re
def is_valid_hostname(hostname):
    if len(hostname) > 255:
        return False
    hostname = hostname.rstrip(".")
    allowed = re.compile("(?!-)[A-Z\d\-\_]{1,63}(?<!-)$", re.IGNORECASE)
    return all(allowed.match(x) for x in hostname.split("."))

# convert your unicode hostname to punycode (python 3 ) 
# Remove the port number from hostname
normalise_host = hostname.encode("idna").decode().split(":")[0]
is_valid_hostanme(normalise_host )
查看更多
劳资没心,怎么记你
5楼-- · 2019-01-13 06:49

This pure regex should meet all the parameters: ^(?=.{1,253}\.?$)(?!-)[A-Za-z0-9\-]{1,63}(\.[A-Za-z0-9\-]{1,63})*\.?(?<!-)$

查看更多
我只想做你的唯一
6楼-- · 2019-01-13 06:55
def is_valid_host(host):
    '''IDN compatible domain validator'''
    host = host.encode('idna').lower()
    if not hasattr(is_valid_host, '_re'):
        import re
        is_valid_host._re = re.compile(r'^([0-9a-z][-\w]*[0-9a-z]\.)+[a-z0-9\-]{2,15}$')
    return bool(is_valid_host._re.match(host))
查看更多
倾城 Initia
7楼-- · 2019-01-13 06:56

If you're looking to validate the name of an existing host, the best way is to try to resolve it. You'll never write a regular expression to provide that level of validation.

查看更多
登录 后发表回答