Validate a hostname string

2019-01-13 06:27发布

Following up to Regular expression to match hostname or IP Address? and using Restrictions on valid host names as a reference, what is the most readable, concise way to match/validate a hostname/fqdn (fully qualified domain name) in Python? I've answered with my attempt below, improvements welcome.

9条回答
甜甜的少女心
2楼-- · 2019-01-13 06:57

I like the thoroughness of Tim Pietzcker's answer, but I prefer to offload some of the logic from regular expressions for readability. Honestly, I had to look up the meaning of those (? "extension notation" parts. Additionally, I feel the "double-negative" approach is more obvious in that it limits the responsibility of the regular expression to just finding any invalid character. I do like that re.IGNORECASE allows the regex to be shortened.

So here's another shot; it's longer but it reads kind of like prose. I suppose "readable" is somewhat at odds with "concise". I believe all of the validation constraints mentioned in the thread so far are covered:


def isValidHostname(hostname):
    if len(hostname) > 255:
        return False
    if hostname.endswith("."): # A single trailing dot is legal
        hostname = hostname[:-1] # strip exactly one dot from the right, if present
    disallowed = re.compile("[^A-Z\d-]", re.IGNORECASE)
    return all( # Split by labels and verify individually
        (label and len(label) <= 63 # length is within proper range
         and not label.startswith("-") and not label.endswith("-") # no bordering hyphens
         and not disallowed.search(label)) # contains only legal characters
        for label in hostname.split("."))
查看更多
小情绪 Triste *
3楼-- · 2019-01-13 07:01

Per The Old New Thing, the maximum length of a DNS name is 253 characters. (One is allowed up to 255 octets, but 2 of those are consumed by the encoding.)

import re

def validate_fqdn(dn):
    if dn.endswith('.'):
        dn = dn[:-1]
    if len(dn) < 1 or len(dn) > 253:
        return False
    ldh_re = re.compile('^[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?$',
                        re.IGNORECASE)
    return all(ldh_re.match(x) for x in dn.split('.'))

One could argue for accepting empty domain names, or not, depending on one's purpose.

查看更多
\"骚年 ilove
4楼-- · 2019-01-13 07:04
import re
def is_valid_hostname(hostname):
    if len(hostname) > 255:
        return False
    if hostname[-1] == ".":
        hostname = hostname[:-1] # strip exactly one dot from the right, if present
    allowed = re.compile("(?!-)[A-Z\d-]{1,63}(?<!-)$", re.IGNORECASE)
    return all(allowed.match(x) for x in hostname.split("."))

ensures that each segment

  • contains at least one character and a maximum of 63 characters
  • consists only of allowed characters
  • doesn't begin or end with a hyphen.

It also avoids double negatives (not disallowed), and if hostname ends in a ., that's OK, too. It will (and should) fail if hostname ends in more than one dot.

查看更多
登录 后发表回答