How to correctly match UK postcodes by prefix?

Posted 2020-02-13 00:00

Question:

I have a number of restaurants who all deliver to certain postcode areas in London, for example:

  • EC1
  • WC1
  • WC2
  • W1

When someone searches for a restaurant that delivers to their home, they enter their full postcode.

Some people enter the postcode correctly, with the space; others enter all the letters and digits run together, with no space separator. To normalize the input, I remove any space from the postcode before attempting a match.

Until now, I matched the postcode against the prefixes by simply checking whether it starts with the prefix in question, but then I realized that this is not foolproof:

  • WC1E123 => correct match for WC1
  • W1ABC => correct match for W1
  • W10ABC => incorrect match for W1, should only match the W10 prefix

How can I know, given a full postcode with no space, if it matches a given prefix, while not failing the W1 / W10 test above?

Is there any solution to the problem that would not involve forcing the customer to enter the postcode with the space at the correct position?

Answer 1:

There are 6 possible formats for postcodes in the UK:

  • A9 9AA
  • A9A 9AA
  • A99 9AA
  • AA9 9AA
  • AA9A 9AA
  • AA99 9AA

I think there need to be two parts to your solution. The first is to validate the input; the second is to grab that first part.

Validation

This is really important. I realise you have said this is not what you are trying to do, but without it you are going to struggle to get the right prefix, and you may send your drivers to the wrong place!

There are a couple of ways you can do it: either use a 3rd party to help you capture a complete and correct address (many are available, including http://www.qas.co.uk/knowledge-centre/product-information/address-postcode-finder.htm (my company)), or at a minimum use some regex or similar sanity testing to validate the postcodes, such as the links Dmitri gave you above.

If you look at the test cases you have listed, W1ABC and W10ABC are not valid postcodes. If we get that bit correct, then the next bit becomes a lot easier.
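As a minimal sketch of the regex sanity test mentioned above, the following PHP checks only that the input has one of the six shapes listed at the top of this answer. The function name `looksLikeUkPostcode` is made up for illustration, and a shape check does not prove that the postcode actually exists.

```php
<?php
// Hypothetical helper: checks only the shape of the input against the six
// formats listed above (A9, A9A, A99, AA9, AA9A, AA99, each followed by 9AA).
// It does not confirm that the postcode really exists.
function looksLikeUkPostcode(string $input): bool
{
    // Normalize: drop all whitespace and upper-case the rest.
    $postcode = strtoupper(preg_replace('/\s+/', '', $input));

    // Outcode: 1-2 letters, a digit, then an optional digit or letter.
    // Incode: a digit followed by two letters.
    return preg_match('/^[A-Z]{1,2}[0-9][0-9A-Z]?[0-9][A-Z]{2}$/', $postcode) === 1;
}

var_dump(looksLikeUkPostcode('W1ABC'));   // test case from the question
var_dump(looksLikeUkPostcode('w10 0aa')); // well-formed, with a space
```

With this check, both W1ABC and W10ABC from the question are rejected before any prefix matching is attempted.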

Extract the Prefix

Assuming you now have a full, valid postcode, getting just the first part (the outcode) becomes a lot easier, with or without spaces. Because the second half (the incode) has a standard format of 9AA (digit-alpha-alpha), I would do it by spotting and removing this, leaving you with just your outcode, whether it be W1 from W1 0AA or W10 from W10 0AA.
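One way the "spot and remove the incode" step could look in PHP is sketched below. The helper name `extractOutcode` is hypothetical, and the code assumes the postcode has already been validated; it only checks that a trailing digit-letter-letter group is present.

```php
<?php
// Hypothetical helper: returns the outcode of a postcode that is assumed to
// be valid already, or null when no trailing 9AA incode is found.
function extractOutcode(string $postcode): ?string
{
    // Remove whitespace so "W1 0AA" and "W10AA" are treated alike.
    $normalized = strtoupper(preg_replace('/\s+/', '', $postcode));

    // The incode is always digit-letter-letter at the very end;
    // whatever precedes it is the outcode.
    if (preg_match('/^(.+)[0-9][A-Z]{2}$/', $normalized, $m) !== 1) {
        return null;
    }
    return $m[1];
}

var_dump(extractOutcode('W1 0AA'));
var_dump(extractOutcode('W100AA'));
```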

Alternatively, if you are using a 3rd party to capture the address - most of them will be able to return the incode and outcode separately for you.



Answer 2:

The below graphic explains the format of UK postcodes:

Source: https://www.getthedata.com/postcode (my site)

So you can see that you need the outcode, which, given your requirement (a full postcode with no space), is simply your space-free postcode minus the last three characters.

In PHP this would be:

$outcode = substr($postcode_no_space, 0, -3);

Of course this does not help with validating the postcode, but as you point out in your comments the question is not about validation.
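Building on the `substr` call above, here is one way the outcode could be compared with the delivery prefixes from the question. The helper names are hypothetical, and the rule that an outcode with a trailing letter (WC1E, W1A) still belongs to the numeric district (WC1, W1) is an assumption drawn from the question's own examples.

```php
<?php
// Hypothetical helper: true when $outcode belongs to the delivery $prefix.
// Assumption: a prefix such as "W1" covers the outcode "W1" itself and
// outcodes that add one trailing letter ("W1A"), but never "W10".
function outcodeMatchesPrefix(string $outcode, string $prefix): bool
{
    if ($outcode === $prefix) {
        return true;
    }
    // Allow exactly one extra character, and only if it is a letter.
    return strlen($outcode) === strlen($prefix) + 1
        && strncmp($outcode, $prefix, strlen($prefix)) === 0
        && ctype_alpha(substr($outcode, -1));
}

// Hypothetical helper: returns the entries of $prefixes that serve the postcode.
function findDeliveryPrefixes(string $postcode, array $prefixes): array
{
    $postcodeNoSpace = strtoupper(str_replace(' ', '', $postcode));
    $outcode = substr($postcodeNoSpace, 0, -3);

    return array_values(array_filter(
        $prefixes,
        fn (string $prefix): bool => outcodeMatchesPrefix($outcode, $prefix)
    ));
}

$prefixes = ['EC1', 'WC1', 'WC2', 'W1', 'W10'];
print_r(findDeliveryPrefixes('W10 0AA', $prefixes));
print_r(findDeliveryPrefixes('WC1E 7HJ', $prefixes));
```

Comparing against the extracted outcode, rather than against the start of the whole string, is what keeps W10 from being treated as W1.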



Answer 3:

I use the following regex, which matches the prefix part only but uses a lookahead to make sure the full postcode is valid (including an optional space):

(GIR|[A-PR-UWYZ]([0-9]{1,2}|([A-HK-Y][0-9]|[A-HK-Y][0-9]([0-9]|[ABEHMNPRV-Y]))|[0-9][A-HJKS-UW]))(?=( )?[0-9][ABD-HJLNP-UW-Z]{2})

It's not quite perfect, as it will match some postcodes that aren't valid (e.g. starting AA, etc.), but if you're using it to look up the prefix anyway it should do the trick.
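For illustration, the lookahead pattern could be applied in PHP roughly as follows. The pattern itself is copied from above; the `^` anchor, the delimiters, the constant name and the helper name are additions made for this sketch.

```php
<?php
// The pattern is copied from the answer; the ^ anchor, the / delimiters and
// the names used here are assumptions added for this sketch.
const OUTCODE_PATTERN = '/^(GIR|[A-PR-UWYZ]([0-9]{1,2}|([A-HK-Y][0-9]|[A-HK-Y][0-9]([0-9]|[ABEHMNPRV-Y]))|[0-9][A-HJKS-UW]))(?=( )?[0-9][ABD-HJLNP-UW-Z]{2})/';

// Hypothetical helper: returns the matched outcode, or null when the input
// does not look like a postcode according to the pattern.
function outcodeViaLookahead(string $postcode): ?string
{
    // The pattern expects upper-case input with at most one space.
    $candidate = strtoupper(trim($postcode));

    // $m[0] is the whole match; the lookahead is checked but not consumed,
    // so the match contains the outcode only.
    return preg_match(OUTCODE_PATTERN, $candidate, $m) === 1 ? $m[0] : null;
}

var_dump(outcodeViaLookahead('W10AA'));  // W1 0AA typed without the space
var_dump(outcodeViaLookahead('W100AA')); // W10 0AA typed without the space
```

Because the lookahead requires a digit and two letters right after the outcode, the regex engine backtracks from "W10" to "W1" when only "W1" leaves a valid incode behind it.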

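P.S. I just noticed that the regex supplied by the UK Government has been updated since I first implemented this, in which case the above can be updated to the version below. It relies on character class subtraction (for example [A-Z-[QVX]]), a syntax found in flavors such as .NET and XML Schema, so it may need rewriting for other engines:

(GIR|([A-Z-[QVX]][0-9][0-9]?)|(([A-Z-[QVX]][A-Z-[IJZ]][0-9][0-9]?)|(([A-Z-[QVX]][0-9][A-HJKSTUW])|([A-Z-[QVX]][A-Z-[IJZ]][0-9][ABEHMNPRVWXY]))))(?=( )?[0-9][A-Z-[CIKMOV]]{2})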


Answer 4:

In PHP I do

$first=trim(substr(trim($postcode),0,-3));

To get the first section of the postcode. I've been using it for years and it just works. It doesn't matter whether the user includes the space (or 2 spaces) in the middle, because the last section is always 3 characters. I work for a distribution company, and we get charged more for certain postcode areas. You will have a problem if somebody enters their postcode incorrectly, for example if they miss a character from the end.

If the above isn't good enough and you want to validate the postcode the user gave you, then http://postcodes.io/ can help.

http://api.postcodes.io/postcodes/W11%202AQ will give you back some JSON that tells you whether the postcode is valid:

{
    "status": 200,
    "result": {
        "postcode": "W11 2AQ",
        "quality": 1,
        "eastings": 524990,
        "northings": 181250,
        "country": "England",
        "nhs_ha": "London",
        "longitude": -0.200056238526337,
        "latitude": 51.5163540527233,
        "parliamentary_constituency": "Kensington",
        "european_electoral_region": "London",
        "primary_care_trust": "Kensington and Chelsea",
        "region": "London",
        "lsoa": "Kensington and Chelsea 004A",
        "msoa": "Kensington and Chelsea 004",
        "nuts": "Colville",
        "incode": "2AQ",
        "outcode": "W11",
        "admin_district": "Kensington and Chelsea",
        "parish": "Kensington and Chelsea, unparished area",
        "admin_county": null,
        "admin_ward": "Colville",
        "ccg": "NHS West London (Kensington and Chelsea, Queen's Park and Paddington)",
        "codes": {
            "admin_district": "E09000020",
            "admin_county": "E99999999",
            "admin_ward": "E05009392",
            "parish": "E43000210",
            "ccg": "E38000202"
        }
    }
}

Part of the JSON is an "outcode": "W11", which I think is exactly what you are looking for.
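A small sketch of reading that field in PHP follows. The helper name is hypothetical, the HTTP request itself is left out, and treating any status other than 200 as a failed lookup is an assumption based only on the sample response above.

```php
<?php
// Hypothetical helper: pulls the outcode out of a response body shaped like
// the JSON above. Fetching the body over HTTP is left out of this sketch.
function outcodeFromResponse(string $json): ?string
{
    $data = json_decode($json, true);

    // Assumption: malformed JSON, a status other than 200, or a missing
    // field all mean that no usable outcode is available.
    if (!is_array($data) || ($data['status'] ?? null) !== 200) {
        return null;
    }
    return $data['result']['outcode'] ?? null;
}

// Trimmed copy of the sample response above.
$sample = '{"status": 200, "result": {"postcode": "W11 2AQ", "incode": "2AQ", "outcode": "W11"}}';
var_dump(outcodeFromResponse($sample));
```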

You could also use the "eastings": 524990 and "northings": 181250 fields to calculate the straight-line distance from the restaurant to the user. The units are metres, so Pythagoras' theorem applies directly.
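The Pythagoras step can be sketched in a few lines of PHP. The function name is hypothetical; the customer coordinates are the ones from the sample response, while the restaurant coordinates are invented purely to make the arithmetic easy to follow.

```php
<?php
// Hypothetical helper: straight-line distance in metres between two points
// given as eastings/northings on the same grid.
function gridDistanceMetres(int $e1, int $n1, int $e2, int $n2): float
{
    // hypot(a, b) computes sqrt(a*a + b*b).
    return hypot($e2 - $e1, $n2 - $n1);
}

// Customer: eastings/northings from the sample response above.
// Restaurant: made-up point 3000 m east and 4000 m north of the customer.
$distance = gridDistanceMetres(524990, 181250, 527990, 185250);
printf("%.0f m\n", $distance); // prints "5000 m" (a 3-4-5 triangle)
```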



Answer 5:

Since you can compute the length of the postcode the customer entered, and the formats for the postcodes always have 9AA at the end, you could break the code down into a few cases and return matches by doing the following:

firstPart -> postcode with last 3 characters removed
firstPartLength -> length of firstPart
switch (firstPartLength) {
    case 2:
        code to compare prefix against A99AA format
        break
    case 3:
        code to compare prefix against A9A9AA, A999AA, AA99AA formats
        break
    case 4:
        code to compare prefix against AA9A9AA, AA999AA formats
        break
}

or, if you don't want to truncate the last 3 characters:

length -> length of postcode
switch (length) {
    case 5:
        code to compare prefix against A99AA format
        break
    case 6:
        code to compare prefix against A9A9AA, A999AA, AA99AA formats
        break
    case 7:
        code to compare prefix against AA9A9AA, AA999AA formats
        break
}


Answer 6:

Given the assumption that every postcode ends in 9AA and every input postcode is valid, the following regex could be used to match the area prefix:

^(\w{2,4})\s*[0-9][a-zA-Z]{2}$

The first capturing group returns the wanted prefix.
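As a rough usage sketch in PHP, with a hypothetical helper name and under the same assumption that the input is a valid postcode:

```php
<?php
// Hypothetical helper wrapping the pattern from this answer.
// Assumption: the input is a valid postcode, as the answer requires.
function outcodeFromPattern(string $postcode): ?string
{
    $pattern = '/^(\w{2,4})\s*[0-9][a-zA-Z]{2}$/';

    // Group 1 holds the prefix; it is upper-cased for later comparisons.
    return preg_match($pattern, trim($postcode), $m) === 1 ? strtoupper($m[1]) : null;
}

var_dump(outcodeFromPattern('W10AA'));  // W1 0AA typed without the space
var_dump(outcodeFromPattern('W100AA')); // W10 0AA typed without the space
```

The `$` anchor forces the last three characters to be the incode, so the greedy `\w{2,4}` gives back characters until exactly the prefix remains.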