☑ That's Good Enough
Gilding lilies might be fun, but gold leaf doesn't grow on trees.

☑ Validating UK Postcodes

18 Aug 2015 at 6:15PM in Software

｜

Photo by Alex Rodriguez Santibanez

regex string-manipulation

I find myself needing to enter UK addresses on a fairly regular basis and it never fails to amaze me how poor some of the syntax checking is - basic validation of a UK postcode is really not even remotely difficult.

In these days of online shopping, online account access and, in short, online more or less everything, it’s common to have to enter your address into a web page. Or perhaps someone else’s address. This isn’t usually too taxing, especially with browsers providing auto-complete these days, but one quirk of many of these pages that I find disproportionately irritating is the appalling and inconsistent rules they apply to postcodes.

Most countries have postal codes of some sort: America has zip codes, Ireland has Eircode, Germany has its Postleitzahlen and so on. Most of these systems are fairly simple and the UK is no exception. It’s therefore a continual source of confusion to me how any web developer could possibly manage to mess things up, but they appear to do so, repeatedly, at least for UK postcodes (I can’t speak for the others). Perhaps by explaining things here I might save someone from making the same mistakes. Probably not, but at least it gives me the moral high ground to rant about it, which is probably the most important thing, right?

A UK postcode is split into two halves, conventionally separated by a space. The first half is the outward code and most commonly consists of 1-2 letters followed by 1-2 digits. The letters specify the area — there are currently 121 of these in Britain¹. The digits specify the district within the area, typically with 1 indicating the centre of a city and moving outwards. There’s also an additional complication in London (and perhaps other densely populated areas) that the district becomes too large for use, so an additional letter may be appended to the end of the outward code to further subdivide it.

The part of the postcode after the space is called the inward code and always consists of one digit followed by two letters. The digit specifies a sector within the district, which typically narrows things down to a few thousand addresses; the two letters specify the unit within the sector, which takes it all the way down to the low tens of addresses.

As you might have guessed the names of the two halves reflect the differing uses: the outward code is used to decide which sorting office to send post to for further processing; once it’s arrived there, the inward code is used to determine the specific address to which it should be delivered.
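To make the terminology concrete, here’s a minimal sketch that pulls a postcode apart into those four pieces. The names `PostcodeParts` and `split_postcode` are my own invention for illustration, and it assumes the input is already upper case with exactly one space. It does no validation whatsoever, and treating the optional trailing letter as part of the district is just one reasonable choice.

```python
from typing import NamedTuple


class PostcodeParts(NamedTuple):
    area: str      # leading letters of the outward code, e.g. "SW"
    district: str  # rest of the outward code, e.g. "1A"
    sector: str    # the digit of the inward code, e.g. "1"
    unit: str      # final two letters of the inward code, e.g. "AA"


def split_postcode(postcode: str) -> PostcodeParts:
    """Split a well-formed postcode such as 'SW1A 1AA' into its parts.

    Assumes upper case and exactly one space; nothing is validated here.
    """
    outward, inward = postcode.split(" ")
    # The area is the run of letters at the start of the outward code.
    letters = 0
    while letters < len(outward) and outward[letters].isalpha():
        letters += 1
    return PostcodeParts(
        area=outward[:letters],
        district=outward[letters:],
        sector=inward[0],
        unit=inward[1:],
    )


print(split_postcode("SW1A 1AA"))
print(split_postcode("M1 1AE"))
```

For `SW1A 1AA` this gives an area of `SW`, a district of `1A`, a sector of `1` and a unit of `AA`.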

There are some exceptions to these rules, such as for overseas territories or the special code SAN TA1 for Father Christmas. However, unless you’re setting up the Pitcairn Island Delivery Service, you probably don’t need to worry too much about these.

Not that people writing postcode parsing code need to worry about most of this, to be honest — frankly the main take-away from all of the discussion so far is simply this regular expression:

^[A-Z]{1,2}[0-9]{1,2}[A-Z]? [0-9][A-Z]{2}$

That said, I really hope someone about to implement a postcode validator doesn’t stop here because this regular expression demonstrates three of my recurrent annoyances with postcode validators online.

Case-sensitivity: Postcodes are typically presented in upper-case, but only the most petty pedant could possibly claim that sw1a 1aa is incorrect. Don’t be needlessly fussy, allow any combination of case.
Spaces #1: Postcodes typically contain a space, but don’t be a sadist and reject postcodes without one. For bonus marks, strip out anything that isn’t an alphanumeric character before you do any other processing.
Spaces #2: Postcodes typically contain a space, so don’t be an idiot and reject postcodes that contain one or more of them, and don’t be pointlessly fussy if they’re in the wrong place either. For bonus marks, see the previous item.
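To see those annoyances in action, here’s a small sketch that throws a few perfectly reasonable spellings of the same postcode at the bare regular expression with no preprocessing at all. The names `STRICT_RE`, `candidates` and `results` are just illustrative.

```python
import re

# The strict pattern from above, applied to raw input with no clean-up.
STRICT_RE = re.compile(r"^[A-Z]{1,2}[0-9]{1,2}[A-Z]? [0-9][A-Z]{2}$")

candidates = [
    "SW1A 1AA",   # the canonical form
    "sw1a 1aa",   # lower case
    "SW1A1AA",    # no space
    "SW1A  1AA",  # two spaces
    " SW1A 1AA",  # leading space
    "SW 1A1AA",   # space in the wrong place
]

# Map each spelling to whether the strict pattern accepts it.
results = {text: bool(STRICT_RE.match(text)) for text in candidates}

for text, accepted in results.items():
    print(f"{text!r:14} {'accepted' if accepted else 'rejected'}")
```

Only the canonical form gets through; every other spelling is rejected, even though a human would read all six as the same postcode.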

If you were in Python, therefore, to avoid falling into any of these traps you could easily do the following — it’s quick and dirty and uses regular expressions, but it shows how simple the job is:

#!python
import re

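# Anything that isn't an upper-case letter or a digit (applied after upper-casing).
NON_ALPHA_RE = re.compile(r'[^A-Z0-9]+')
# Outward code, a single space, then the inward code.
POSTCODE_RE = re.compile(r'^[A-Z]{1,2}[0-9]{1,2}[A-Z]? [0-9][A-Z]{2}$')

def normalise_postcode(postcode):
    """Return a normalised postcode if valid, or None if not."""

    # Upper-case the input and strip spaces, punctuation and other noise.
    postcode = NON_ALPHA_RE.sub('', postcode.upper())
    # The inward code is always the last three characters, so re-insert
    # the space just before it.
    postcode = postcode[:-3] + ' ' + postcode[-3:]
    if POSTCODE_RE.match(postcode):
        return postcode
    return None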

Now how simple is that? Took me all of ten minutes. Next time you find a website that rejects a postcode that it could have quite easily figured out, or accepts a postcode that’s quite clearly gibberish, just have a little think about how absurdly little time the developer responsible spent on the user experience and perhaps leave some feedback to that effect.

My last annoyance with postcode validators isn’t demonstrable here, but it occurs when people try to do the right thing and validate against the actual postcode database. Unfortunately these people are often rather lackadaisical about keeping it updated so if you live somewhere which was built in the last year or two, you can often kiss goodbye to the thought of ordering anything to your home address. Not a problem in our current house, unless there’s someone somehow running off a postcode database that predates the computer by several decades² — but having lived at such a place in the past I can attest to it as an almost limitless source of frustration.

Therefore, don’t reject something just because it’s not in the postcode database, and just pay the damn money to keep it up to date, will you? Because no matter how good the thing you’re selling, there’s someone else selling something almost indistinguishable, and they’ll let me order it successfully online.
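As a rough sketch of what “don’t reject it” might look like, the snippet below treats the database as a hint rather than a gatekeeper. Everything here is hypothetical: `KNOWN_POSTCODES` is a two-entry stand-in for a real postcode database, and `check_postcode` and its three status strings are names I’ve made up for illustration.

```python
import re

POSTCODE_RE = re.compile(r"[A-Z]{1,2}[0-9]{1,2}[A-Z]? [0-9][A-Z]{2}")

# Hypothetical stand-in for a real (and possibly stale) postcode database.
KNOWN_POSTCODES = {"SW1A 1AA", "M1 1AE"}


def check_postcode(raw, known=KNOWN_POSTCODES):
    """Return (normalised, status) where status is one of
    'invalid', 'known' or 'unverified'."""
    # Upper-case, drop everything but letters and digits, then put the
    # space back before the three-character inward code.
    cleaned = re.sub(r"[^A-Z0-9]", "", raw.upper())
    candidate = cleaned[:-3] + " " + cleaned[-3:]
    if not POSTCODE_RE.fullmatch(candidate):
        return None, "invalid"
    if candidate in known:
        return candidate, "known"
    # Well-formed but missing from our copy of the database: accept it
    # anyway, perhaps after asking the user to double-check.
    return candidate, "unverified"


print(check_postcode("sw1a1aa"))
print(check_postcode("zz99 9zz"))
print(check_postcode("12345"))
```

The point of the third status is that only syntactic nonsense gets turned away; a well-formed postcode the database hasn’t heard of still lets the order go through.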

In fact, forget all of the above and just don’t do any postcode validation at all. Let the Royal Mail sort it out, it’s what they get paid for.

¹ The crown dependencies of Guernsey, Jersey and the Isle of Man also have two-letter codes that look like postcodes, but I’m not sure if they follow precisely the same scheme.
² And I wouldn’t put it past them!
