UK Postcode Regex (Comprehensive)

2018-12-31 05:37发布

I'm after a regex that will validate a full complex UK postcode only within an input string. All of the uncommon postcode forms must be covered as well as the usual. For instance:

Matches

  • CW3 9SS
  • SE5 0EG
  • SE50EG
  • se5 0eg
  • WC2H 7LT

No Match

  • aWC2H 7LT
  • WC2H 7LTa
  • WC2H

Are there any official or even semi-official regexes in use for this kind of thing? Any other advice as to formatting and storing these in a database?

28条回答
春风洒进眼中
2楼-- · 2018-12-31 05:45

What i have found in nearly all the variations and the regex from the bulk transfer pdf and what is on wikipedia site is this, specifically for the wikipedia regex is, there needs to be a ^ after the first |(vertical bar). I figured this out by testing for AA9A 9AA, because otherwise the format check for A9A 9AA will validate it. For Example checking for EC1D 1BB which should be invalid comes back valid because C1D 1BB is a valid format.

Here is what I've come up with for a good regex:

^([G][I][R] 0[A]{2})|^((([A-Z-[QVX]][0-9]{1,2})|([A-Z-[QVX]][A-HK-Y][0-9]{1,2})|([A-Z-[QVX]][0-9][ABCDEFGHJKPSTUW])|([A-Z-[QVX]][A-HK-Y][0-9][ABEHMNPRVWXY])) [0-9][A-Z-[CIKMOV]]{2})$
查看更多
裙下三千臣
3楼-- · 2018-12-31 05:46

I'd recommend taking a look at the UK Government Data Standard for postcodes [link now dead; archive of XML, see Wikipedia for discussion]. There is a brief description about the data and the attached xml schema provides a regular expression. It may not be exactly what you want but would be a good starting point. The RegEx differs from the XML slightly, as a P character in third position in format A9A 9AA is allowed by the definition given.

The RegEx supplied by the UK Government was:

([Gg][Ii][Rr] 0[Aa]{2})|((([A-Za-z][0-9]{1,2})|(([A-Za-z][A-Ha-hJ-Yj-y][0-9]{1,2})|(([A-Za-z][0-9][A-Za-z])|([A-Za-z][A-Ha-hJ-Yj-y][0-9][A-Za-z]?))))\s?[0-9][A-Za-z]{2})

As pointed out on the Wikipedia discussion, this will allow some non-real postcodes (e.g. those starting AA, ZY) and they do provide a more rigorous test that you could try.

查看更多
一个人的天荒地老
4楼-- · 2018-12-31 05:48

here's how we have been dealing with the UK postcode issue:

^([A-Za-z]{1,2}[0-9]{1,2}[A-Za-z]?[ ]?)([0-9]{1}[A-Za-z]{2})$

Explanation:

  • expect 1 or 2 a-z chars, upper or lower fine
  • expect 1 or 2 numbers
  • expect 0 or 1 a-z char, upper or lower fine
  • optional space allowed
  • expect 1 number
  • expect 2 a-z, upper or lower fine

This gets most formats, we then use the db to validate whether the postcode is actually real, this data is driven by openpoint https://www.ordnancesurvey.co.uk/opendatadownload/products.html

hope this helps

查看更多
墨雨无痕
5楼-- · 2018-12-31 05:50

I've been looking for a UK postcode regex for the last day or so and stumbled on this thread. I worked my way through most of the suggestions above and none of them worked for me so I came up with my own regex which, as far as I know, captures all valid UK postcodes as of Jan '13 (according to the latest literature from the Royal Mail).

The regex and some simple postcode checking PHP code is posted below. NOTE:- It allows for lower or uppercase postcodes and the GIR 0AA anomaly but to deal with the, more than likely, presence of a space in the middle of an entered postcode it also makes use of a simple str_replace to remove the space before testing against the regex. Any discrepancies beyond that and the Royal Mail themselves don't even mention them in their literature (see http://www.royalmail.com/sites/default/files/docs/pdf/programmers_guide_edition_7_v5.pdf and start reading from page 17)!

Note: In the Royal Mail's own literature (link above) there is a slight ambiguity surrounding the 3rd and 4th positions and the exceptions in place if these characters are letters. I contacted Royal Mail directly to clear it up and in their own words "A letter in the 4th position of the Outward Code with the format AANA NAA has no exceptions and the 3rd position exceptions apply only to the last letter of the Outward Code with the format ANA NAA." Straight from the horse's mouth!

<?php

    $postcoderegex = '/^([g][i][r][0][a][a])$|^((([a-pr-uwyz]{1}([0]|[1-9]\d?))|([a-pr-uwyz]{1}[a-hk-y]{1}([0]|[1-9]\d?))|([a-pr-uwyz]{1}[1-9][a-hjkps-uw]{1})|([a-pr-uwyz]{1}[a-hk-y]{1}[1-9][a-z]{1}))(\d[abd-hjlnp-uw-z]{2})?)$/i';

    $postcode2check = str_replace(' ','',$postcode2check);

    if (preg_match($postcoderegex, $postcode2check)) {

        echo "$postcode2check is a valid postcode<br>";

    } else {

        echo "$postcode2check is not a valid postcode<br>";

    }

?>

I hope it helps anyone else who comes across this thread looking for a solution.

查看更多
看风景的人
6楼-- · 2018-12-31 05:50

First half of postcode Valid formats

  • [A-Z][A-Z][0-9][A-Z]
  • [A-Z][A-Z][0-9][0-9]
  • [A-Z][0-9][0-9]
  • [A-Z][A-Z][0-9]
  • [A-Z][A-Z][A-Z]
  • [A-Z][0-9][A-Z]
  • [A-Z][0-9]

Exceptions
Position 1 - QVX not used
Position 2 - IJZ not used except in GIR 0AA
Position 3 - AEHMNPRTVXY only used
Position 4 - ABEHMNPRVWXY

Second half of postcode

  • [0-9][A-Z][A-Z]

Exceptions
Position 2+3 - CIKMOV not used

Remember not all possible codes are used, so this list is a necessary but not sufficent condition for a valid code. It might be easier to just match against a list of all valid codes?

查看更多
流年柔荑漫光年
7楼-- · 2018-12-31 05:50

I needed a version that would work in SAS with the PRXMATCH and related functions, so I came up with this:

^[A-PR-UWYZ](([A-HK-Y]?\d\d?)|(\d[A-HJKPSTUW])|([A-HK-Y]\d[ABEHMNPRV-Y]))\s?\d[ABD-HJLNP-UW-Z]{2}$

Test cases and notes:

/* 
Notes
The letters QVX are not used in the 1st position.
The letters IJZ are not used in the second position.
The only letters to appear in the third position are ABCDEFGHJKPSTUW when the structure starts with A9A.
The only letters to appear in the fourth position are ABEHMNPRVWXY when the structure starts with AA9A.
The final two letters do not use the letters CIKMOV, so as not to resemble digits or each other when hand-written.
*/

/*
    Bits and pieces
    1st position (any):         [A-PR-UWYZ]         
    2nd position (if letter):   [A-HK-Y]
    3rd position (A1A format):  [A-HJKPSTUW]
    4th position (AA1A format): [ABEHMNPRV-Y]
    Last 2 positions:           [ABD-HJLNP-UW-Z]    
*/


data example;
infile cards truncover;
input valid 1. postcode &$10. Notes &$100.;
flag = prxmatch('/^[A-PR-UWYZ](([A-HK-Y]?\d\d?)|(\d[A-HJKPSTUW])|([A-HK-Y]\d[ABEHMNPRV-Y]))\s?\d[ABD-HJLNP-UW-Z]{2}$/',strip(postcode));
cards;
1  EC1A 1BB  Special case 1
1  W1A 0AX   Special case 2
1  M1 1AE    Standard format
1  B33 8TH   Standard format
1  CR2 6XH   Standard format
1  DN55 1PT  Standard format
0  QN55 1PT  Bad letter in 1st position
0  DI55 1PT  Bad letter in 2nd position
0  W1Z 0AX   Bad letter in 3rd position
0  EC1Z 1BB  Bad letter in 4th position
0  DN55 1CT  Bad letter in 2nd group
0  A11A 1AA  Invalid digits in 1st group
0  AA11A 1AA  1st group too long
0  AA11 1AAA  2nd group too long
0  AA11 1AAA  2nd group too long
0  AAA 1AA   No digit in 1st group
0  AA 1AA    No digit in 1st group
0  A 1AA     No digit in 1st group
0  1A 1AA    Missing letter in 1st group
0  1 1AA     Missing letter in 1st group
0  11 1AA    Missing letter in 1st group
0  AA1 1A    Missing letter in 2nd group
0  AA1 1     Missing letter in 2nd group
;
run;
查看更多
登录 后发表回答