regular expression for IPv6 addresses

Posted 2019-05-10 00:06

Question:

I have a regular expression for IPv6 addresses as given below

IPV4ADDRESS      [ \t]*(([[:digit:]]{1,3}"."){3}([[:digit:]]{1,3}))[ \t]*
x4               ([[:xdigit:]]{1,4})
xseq             ({x4}(:{x4}){0,7})
xpart            ({xseq}|({xseq}::({xseq}?))|::{xseq})
IPV6ADDRESS      [ \t]*({xpart}(":"{IPV4ADDRESS})?)[ \t]*

It correctly matches all formats of IPv6 addresses, including

1) non-compressed IPv6 addresses
2) compressed IPv6 addresses
3) IPv6 addresses in legacy formats (with an embedded IPv4 address)

Ideal examples of IPv6 addresses in legacy formats would be

2001:1234::3210:5.6.7.8

     OR
2001:1234:1234:5432:4578:5678:5.6.7.8

As you can see in the second example, there are 10 groups separated by either ":" or ".".

A normal IPv6 address has only 8 groups. The difference arises because the last 4 groups, separated by ".", are packed into the least significant 32 bits of the IPv6 address. Hence we need 10 groups (6 x 16 bits + 4 x 8 bits) to fill 128 bits.

However, if I use the following address format:

   2001:1234:4563:3210:5.6.7.8

Here each group separated by ":" represents 16 bits, and each of the last four groups separated by "." represents 8 bits. The total number of bits is 64 + 32 = 96, so 32 bits are missing.
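The bit arithmetic above can be checked with a small Python sketch. The helper name `address_bits` is hypothetical, and it assumes the address contains no "::" (so nothing is implied, and every group is spelled out).

```python
def address_bits(addr: str) -> int:
    """Count the bits written out explicitly in an address without "::".

    Each ":"-separated hex group contributes 16 bits; a dotted-quad
    (IPv4) tail contributes 32 bits.
    """
    bits = 0
    for group in addr.split(":"):
        bits += 32 if "." in group else 16
    return bits


print(address_bits("2001:1234:1234:5432:4578:5678:5.6.7.8"))  # 128
print(address_bits("2001:1234:4563:3210:5.6.7.8"))            # 96
```

Only the first address reaches the required 128 bits; the second falls 32 bits short.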

The regular expression accepts it as a valid IPv6 address. I am unable to figure out how to fix the regular expression so that it rejects such values. Any help is highly appreciated.
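For illustration, the behaviour can be reproduced with a rough Python port of the flex definitions. This is only a sketch: Python's `re` module is not flex, the names `ORIGINAL_RE` and `original_accepts` are made up here, and `fullmatch` stands in for matching a whole token. It suggests that the short address is accepted because `xseq` allows any number of groups from 1 to 8 and nothing ties that count to the presence of the IPv4 tail.

```python
import re

# Rough Python equivalents of the flex definitions (an assumption, not flex itself).
X4 = r"[0-9A-Fa-f]{1,4}"
XSEQ = rf"(?:{X4}(?::{X4}){{0,7}})"
XPART = rf"(?:{XSEQ}::(?:{XSEQ})?|{XSEQ}|::{XSEQ})"
IPV4 = r"[ \t]*(?:[0-9]{1,3}\.){3}[0-9]{1,3}[ \t]*"
ORIGINAL_RE = re.compile(rf"[ \t]*{XPART}(?::{IPV4})?[ \t]*")


def original_accepts(text: str) -> bool:
    """Return True if the ported original pattern matches the whole string."""
    return ORIGINAL_RE.fullmatch(text) is not None


print(original_accepts("2001:1234:1234:5432:4578:5678:5.6.7.8"))  # True
print(original_accepts("2001:1234:4563:3210:5.6.7.8"))            # True (only 96 bits)
```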

Answer 1:

Here's the grammar for IPv6 addresses as given in RFC 3986 and subsequently affirmed in RFC 5954:

 IPv6address   =                             6( h16 ":" ) ls32
                /                       "::" 5( h16 ":" ) ls32
                / [               h16 ] "::" 4( h16 ":" ) ls32
                / [ *1( h16 ":" ) h16 ] "::" 3( h16 ":" ) ls32
                / [ *2( h16 ":" ) h16 ] "::" 2( h16 ":" ) ls32
                / [ *3( h16 ":" ) h16 ] "::"    h16 ":"   ls32
                / [ *4( h16 ":" ) h16 ] "::"              ls32
                / [ *5( h16 ":" ) h16 ] "::"              h16
                / [ *6( h16 ":" ) h16 ] "::"

 h16           = 1*4HEXDIG
 ls32          = ( h16 ":" h16 ) / IPv4address
 IPv4address   = dec-octet "." dec-octet "." dec-octet "." dec-octet
 dec-octet     = DIGIT                 ; 0-9
                / %x31-39 DIGIT         ; 10-99
                / "1" 2DIGIT            ; 100-199
                / "2" %x30-34 DIGIT     ; 200-249
                / "25" %x30-35          ; 250-255

Using this, we can build a standards-compliant regular expression for IPv6 addresses.

dec_octet      ([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])
ipv4address    ({dec_octet}"."){3}{dec_octet}
h16            ([[:xdigit:]]{1,4})
ls32           ({h16}:{h16}|{ipv4address})
ipv6address    (({h16}:){6}{ls32}|::({h16}:){5}{ls32}|({h16})?::({h16}:){4}{ls32}|(({h16}:){0,1}{h16})?::({h16}:){3}{ls32}|(({h16}:){0,2}{h16})?::({h16}:){2}{ls32}|(({h16}:){0,3}{h16})?::{h16}:{ls32}|(({h16}:){0,4}{h16})?::{ls32}|(({h16}:){0,5}{h16})?::{h16}|(({h16}:){0,6}{h16})?::)

Disclaimer: untested.
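One way to sanity-check the pattern is to port it to Python and run it against the addresses from the question. The sketch below is such a port, not the flex scanner itself: the constant names and the `is_ipv6` helper are hypothetical, `[[:xdigit:]]` is written as `[0-9A-Fa-f]`, and `fullmatch` is used so the whole string must match.

```python
import re

DEC_OCTET = r"(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[0-9])"
IPV4 = rf"(?:{DEC_OCTET}\.){{3}}{DEC_OCTET}"
H16 = r"[0-9A-Fa-f]{1,4}"
LS32 = rf"(?:{H16}:{H16}|{IPV4})"

# One entry per alternative of the RFC 3986 IPv6address rule, in order.
ALTERNATIVES = [
    rf"(?:{H16}:){{6}}{LS32}",
    rf"::(?:{H16}:){{5}}{LS32}",
    rf"(?:{H16})?::(?:{H16}:){{4}}{LS32}",
    rf"(?:(?:{H16}:){{0,1}}{H16})?::(?:{H16}:){{3}}{LS32}",
    rf"(?:(?:{H16}:){{0,2}}{H16})?::(?:{H16}:){{2}}{LS32}",
    rf"(?:(?:{H16}:){{0,3}}{H16})?::{H16}:{LS32}",
    rf"(?:(?:{H16}:){{0,4}}{H16})?::{LS32}",
    rf"(?:(?:{H16}:){{0,5}}{H16})?::{H16}",
    rf"(?:(?:{H16}:){{0,6}}{H16})?::",
]
IPV6_RE = re.compile("|".join(f"(?:{alt})" for alt in ALTERNATIVES))


def is_ipv6(text: str) -> bool:
    """Return True if the whole string matches the RFC 3986 grammar."""
    return IPV6_RE.fullmatch(text) is not None


print(is_ipv6("2001:1234::3210:5.6.7.8"))                # True
print(is_ipv6("2001:1234:1234:5432:4578:5678:5.6.7.8"))  # True
print(is_ipv6("2001:1234:4563:3210:5.6.7.8"))            # False
```

Because every alternative fixes how many groups may appear on each side of "::", the 96-bit address from the question has no alternative it can match.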