☑ Fighting Fonts on Mobile

I recently ran into some odd font sizing issues when viewing my website on my iPhone and discovered a few interesting tidbits about rendering on mobile browsers along the way.

I finally found the time to make some tweaks to my website theme that I’d been meaning to make for a while, primarily related to syntax highlighting and code snippets.

Handling code snippets in HTML has some surprising subtleties, one of them being the overflow behaviour. As much as you try to wrap everything at a sensible number of columns, there will always be times when you absolutely need to represent a long line and it’s always rather unsatisfactory to have to explain that the line breaks were just added for readability. As a result, the styling needs a graceful fallback for these inevitable cases.

I considered using white-space: pre-wrap, which acts like <pre> except that it wraps text on overflow as well as explicit line breaks. One complication, however, is that I sometimes use line numbering for my longer code snippets:

1
2
3
This isn't actually very long.
So the line numbers are rather pointless.
But you get the idea.

To facilitate easy cut and paste this is actually an HTML table with a single row, with the line numbers and file content in their own cells, each contained within nested <pre> tags[1]. This is handy, but it does lead to some odd formatting issues, since there’s nothing explicitly aligning the individual lines in the two elements except for the fact that they’re using a consistent font and line height.

One issue I’ve had in the past, for example, is when I used bold text for some of the syntax highlighting styles — I found that some browsers would adjust the line height when this happened such that the rows no longer quite lined up after that point. I tried various line-height fixes with limited success, but eventually it was just easiest to avoid bold text in code snippets.

Another issue concerns overflows: if you wrap text in the file content as I suggested earlier then you’d also need to somehow arrange a gap (or repeat) in the line numbering, or all subsequent line numbers will be off by one for each wrapped line. There’s no practical way to arrange this except perhaps by putting each code row in its own table row, and modifying third-party code extensively to do that just didn’t appeal for a quick fix.

Instead, therefore, I opted for using overflow: auto, which inserts scrollbars as required, combined with judicious max-width: 100% here and there. I was pleasantly surprised to see this works on any sensible browser[2].

However, when I tested the site on my iPhone I discovered a new and puzzling issue: the overflow scrolling was working fine, but for some reason when the text overflowed the font size of the line numbers was oddly reduced and hence no longer aligned with the code.

I realised fairly early on that this was some odd resizing issue due to the content being too large, and hence presumably an attempt to fit more text into place getting confused with some of the CSS rules — but no amount of font-size, width, height or anything else seemed to fix it, even with !important littered all over the place.

This threatened to be one of those annoying issues that can be very tough to track down — but luckily for me I stumbled upon the solution fairly quickly. As it turns out, it’s all about how browsers on mobile devices render pages.

The issue with smartphone browsers is that most web content authors didn’t design their content and/or styles to cope with such a small viewport. All kinds of web pages don’t render correctly — the kind of hacks that are required to get cross-browser compatibility almost certainly don’t help, either. As a result of this the browsers use a bit of a trick to get things looking a little more natural.

What they do is render the page to a larger viewport (e.g. 1000 pixels wide) and then scale the page down to fit on the screen. This allows the styles to render more plausibly on most pages, but it does tend to make the text awfully hard to read. Zooming in until the text is legible would then entail scrolling the screen left and right to read a full line of text, which would be a trifle vexing to say the least.

To get around this issue the browsers inflate the text — they scale the font size up so that it becomes legible once again, whilst still leaving all the pixel-oriented sizes intact.

As it turns out, this was happening with my code snippets, but for some reason a different inflation ratio was applied to the line numbers element than to the code. Once I knew that this was the issue it was quite easy to fix[3] by adding -webkit-text-size-adjust: 100% to my assorted <pre> elements, which appears to have done the trick. Check out that linked page because it’s got a lot of additional useful details which aren’t Mozilla-specific.

There you are, then — who knew rendering pages on mobile was such a subtle business? Actually, given the usual state of all but the simplest CSS tasks I’m quite surprised it was as straightforward as it was.


  1. I can’t really take either credit or blame for this, it’s part of the functionality provided by the Pygments package that I use for syntax highlighting. The documentation indicates that it’s optional, but I think it’s useful. 

  2. sensible, adj.: Any popular web browser which isn’t IE

  3. Perhaps not fix optimally, but at least fix. Work around. Hack. Whatever you want to call it. 

Thu 08 Oct 2015 at 08:10PM by Andy Pearce in Software tagged with fonts and web

☑ C++ Streaming Pains

C++ streams have their advantages, but in some ways they’re a downright pain.

Some time ago I wrote a post called The Dark Arts of C++ Streams. The name was a reference to the fact that, as a C programmer who later learned C++, I’d always regarded them with some suspicion and disdain as a much more cumbersome alternative to the likes of printf(). Of course, streams are much more suited to C++ where behaviours may be extended and overridden as needed to support user-specified types, and the output system needs to support that — but for the common cases of printing formatted output of strings, integers and floats, printf() just seemed much more convenient.

Since writing that post I’ve become rather more familiar with streams, and even grown fond of some of their quirks — their flexibility is useful for things like debugging and their comparative verbosity has lost much of its sting. There are still some things that bug me about streams, however — mostly a set of related traps that are easy to fall into and could, to my mind at least, have easily been avoided.

The main issue I have rests on the statefulness of the formatting. Let’s say you want to output a floating point value to 3 decimal places, and then another to 5 decimal places. Each time you need to set the precision on the stream and not on the value being streamed out. This is possible with explicit methods on the stream:

std::cout.setf(std::ios::fixed);
std::cout.precision(3);
std::cout << 123.456789 << std::endl;  // 123.457

However, this is rather awkward — one of the big syntactic advantages of streams is the way that they can be chained. Taking the previous example of two floats, you’d like to be able to do something like this:

std::cout << first << " and " << second << std::endl;

Indeed you can do that — but now add some formatting using the method-based approach earlier and it’s absolutely horrible:

std::cout.setf(std::ios::fixed);
std::cout.precision(3);
std::cout << first << " and ";
std::cout.precision(5);
std::cout << second << std::endl;

Of course, as anyone who’s done this sort of formatting will probably know, it is possible to do the whole thing inline:

std::cout << std::fixed
          << std::setprecision(3) << first
          << " and "
          << std::setprecision(5) << second
          << std::endl;

That’s conceptually a lot nicer, but the spanner in the works is the fact that these manipulators are just a fancy shortcut for calling the methods on the stream itself. In other words the formatting changes they make are not scoped in any way: they persist on the stream until explicitly replaced.

This could lead to all sorts of nasty consequences. Imagine code like this:

// Add a trace comment to the output file in a debug build.
void traceFloat(std::ostream& out, std::string name, double value)
{
#ifdef DEBUG
    out << "# The value of '" << name << "' is "
        << std::fixed << std::setprecision(16) << value
        << std::endl;
#endif /* ifdef DEBUG */
}

// ... Later on in the source file...

void myFunction()
{
    double foo_value = getFooValue();
    // ...
    myFile.precision(3);
    traceFloat(myFile, "foo", foo_value);
    myFile << "foo:" << foo_value << "\n";
    // ...
}

Here we have some routine which is generating some sort of formatted file, but in a debug build (only) it also adds some comments for diagnostics. Except in this particular case, the act of adding the diagnostic actually changes the output format of the file — the use of setprecision() in the traceFloat() function overrides the earlier setting of precision() within myFunction(). This is something that’s just begging to cause those annoying problems that only materialise in a production environment.

This is a fairly contrived example, but this sort of action at a distance is really poor practice. Of course all of this is documented, but an important property of a programming language (and I always include standard libraries in that term) is how easily programmers can avoid or detect both current and potential future mistakes.

All this said, that’s the way it works and it’s unlikely to change in any fundamental manner, so is there a way to use streams safely? Well, as is so often the case in C++, RAII comes to the rescue.

It’s possible to capture the current state of the stream and restore it later, which is probably best practice for anyone using streams for any purpose where the format is critical.

It’s not hard to build a class which uses an RAII approach to save and later restore the current state of the stream:

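#include <ios>

// RAII guard: captures a stream's formatting state on construction and
// restores it when the guard goes out of scope.
class StreamStateGuard
{
  public:
    explicit StreamStateGuard(std::ios_base& stream) : stream_(stream)
    {
        captureStreamState();
    }

    ~StreamStateGuard()
    {
        restoreStreamState();
    }

    // Copying a guard would restore the same stream twice, so forbid it.
    StreamStateGuard(const StreamStateGuard&) = delete;
    StreamStateGuard& operator=(const StreamStateGuard&) = delete;

    // Note: the fill character lives on std::basic_ios rather than
    // std::ios_base, so it is not saved here.
    void captureStreamState()
    {
        precision_ = stream_.precision();
        width_ = stream_.width();
        flags_ = stream_.flags();
    }

    void restoreStreamState()
    {
        stream_.precision(precision_);
        stream_.width(width_);
        stream_.flags(flags_);
    }

  private:
    std::ios_base& stream_;
    std::streamsize precision_;
    std::streamsize width_;
    std::ios_base::fmtflags flags_;
};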

This is for illustrative purposes and I haven’t tested it extensively, so there may be additional state to save that I’ve forgotten or exceptions I haven’t handled, but it’s pretty simple. If you’re lucky enough to be able to use Boost in your C++ environment then it already has classes to do all this for you (the I/O State Savers in Boost.IO).

Let’s say you’re writing some code to output formatted values in some defined text file format — you could write a function to output a row which saves the state at the start of the function and restores it at the end. However, wouldn’t it be nice if each field could do that itself? That would keep things modular, and it’s something we can do by writing new IO manipulators. I should mention here that this is something I’ve done little of, so there may be more elegant approaches, but the code below seems to work:

#include <iomanip>
#include <ostream>

class MyClass
{
  public:
    MyClass(double value1, double value2)
            : value1_(value1), value2_(value2)
    { }

    void writeValues(std::ostream& out);

  private:
    class outputField
    {
      public:
        outputField(double value, int decimal_places, int width)
                : value_(value),
                  decimal_places_(decimal_places),
                  width_(width)
        { }
        std::ostream& operator()(std::ostream& out) const;

      private:
        double value_;
        int decimal_places_;
        int width_;
    };

    friend std::ostream& operator<<(
            std::ostream& out,
            const MyClass::outputField& item);

    double value1_;
    double value2_;
};

std::ostream& operator<<(std::ostream& out, const MyClass::outputField& item)
{
    return item(out);
}

std::ostream& MyClass::outputField::operator()(std::ostream& out) const
{
    StreamStateGuard streamStateGuard(out);

    return out << std::fixed << std::noshowpoint
               << std::setprecision(decimal_places_)
               << std::setw(width_)
               << value_;
}

void MyClass::writeValues(std::ostream& output)
{
    output << "MyClass|"
           << outputField(value1_, 2, 8) << "|"
           << outputField(value2_, 6, 12) << "|\n";
}

The sample MyClass has only two double values to output, but this example could easily be extended to more or less any types.

These sorts of techniques start to illustrate what I’m increasingly learning as the best practices to keep a programmer sane when using C++ IO streams. I still have a soft spot for the old stdio.h functions, but I suppose all of us have to be dragged into the future at some point — at least now I’ve got a few crude tools to help me survive there.

Thu 20 Aug 2015 at 01:55PM by Andy Pearce in Software tagged with c++ and c++-stl

☑ Validating UK Postcodes

I find myself needing to enter UK addresses on a fairly regular basis and it never fails to amaze me how poor some of the syntax checking is - basic validation of a UK postcode is really not even remotely difficult.

In these days of online shopping, online account access and, in short, online more or less everything, it’s common to have to enter your address into a web page. Or perhaps someone else’s address. This isn’t usually too taxing, especially with browsers providing auto-complete these days, but one quirk of many of these pages that I find disproportionately irritating is the appalling and inconsistent rules they apply to postcodes.

Most countries have postal codes of some sort: America has zip codes, Ireland has Eircode, Germany has its Postleitzahlen and so on. Most of these systems are fairly simple and the UK is no exception. It’s therefore a continual source of confusion to me how any web developer could possibly manage to mess things up, but they appear to do so, repeatedly, at least for UK postcodes (I can’t speak for the others). Perhaps by explaining things here I might save someone from making the same mistakes. Probably not, but at least it gives me the moral high ground to rant about it, which is probably the most important thing, right?

A UK postcode is split into two halves, conventionally separated by a space. The first half is the outward code and most commonly consists of 1-2 letters followed by 1-2 digits. The letters specify the area, of which there are currently 121 in Britain[1]. The digits specify the district within the area, typically with 1 indicating the centre of a city and moving outwards. There’s also an additional complication in London (and perhaps other densely populated areas) that the district becomes too large for use, so an additional letter may be appended to the end of the outward code to further subdivide it.

The part of the postcode after the space is called the inward code and always consists of one digit, specifying a sector within the district which typically narrows it down to around a few thousand addresses, followed by two letters, which specify the unit within the sector which takes it all the way down to the low tens of addresses.

As you might have guessed the names of the two halves reflect the differing uses: the outward code is used to decide which sorting office to send post to for further processing; once it’s arrived there, the inward code is used to determine the specific address to which it should be delivered.

There are some exceptions to these rules, such as for overseas territories or the special code SAN TA1 for Father Christmas. However, unless you’re setting up the Pitcairn Island Delivery Service, you probably don’t need to worry too much about these.

Not that people writing postcode parsing code need to worry about most of this, to be honest — frankly the main take-away from all of the discussion so far is simply this regular expression:

^[A-Z]{1,2}[0-9]{1,2}[A-Z]? [0-9][A-Z]{2}$

That said, I really hope someone about to implement a postcode validator doesn’t stop here, because this regular expression demonstrates three of my recurrent annoyances with postcode validators online.

Case-sensitivity
Postcodes are typically presented in upper-case, but only the most petty pedant could possibly claim that sw1a 1aa is incorrect. Don’t be needlessly fussy, allow any combination of case.
Spaces #1
Postcodes typically contain a space, but don’t be a sadist and reject postcodes without one. For bonus marks, strip out anything that isn’t an alphanumeric character before you do any other processing.
Spaces #2
Postcodes typically contain a space, so don’t be an idiot and reject postcodes that contain one or more of them, and don’t be pointlessly fussy if they’re in the wrong place either. For bonus marks, see the previous item.

If you were in Python, therefore, to avoid falling into any of these traps you could easily do the following — it’s quick and dirty and uses regular expressions, but it shows how simple the job is:

import re

NON_ALPHA_RE = re.compile('[^A-Z0-9]+')
POSTCODE_RE = re.compile('^[A-Z]{1,2}[0-9]{1,2}[A-Z]? [0-9][A-Z]{2}$')

def normalise_postcode(postcode):
    """Return a normalised postcode if valid, or None if not."""

    postcode = NON_ALPHA_RE.sub('', postcode.upper())
    postcode = postcode[:-3] + ' ' + postcode[-3:]
    if POSTCODE_RE.match(postcode):
        return postcode
    return None

Now how simple is that? Took me all of ten minutes. Next time you find a website that rejects a postcode that it could have quite easily figured out, or accepts a postcode that’s quite clearly gibberish, just have a little think about how absurdly little time the developer responsible spent on the user experience and perhaps leave some feedback to that effect.
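To make the behaviour concrete, here is a small sketch that runs the function over a handful of inputs. The definitions are repeated (slightly condensed) so the snippet runs on its own, and the sample strings are just illustrative values I've made up to exercise the case and spacing rules above.

```python
import re

# Same patterns as in the post, written as raw strings.
NON_ALPHA_RE = re.compile(r'[^A-Z0-9]+')
POSTCODE_RE = re.compile(r'^[A-Z]{1,2}[0-9]{1,2}[A-Z]? [0-9][A-Z]{2}$')

def normalise_postcode(postcode):
    """Return a normalised postcode if valid, or None if not."""
    # Upper-case, then drop everything that isn't a letter or digit.
    postcode = NON_ALPHA_RE.sub('', postcode.upper())
    # The inward code is always the last three characters.
    postcode = postcode[:-3] + ' ' + postcode[-3:]
    return postcode if POSTCODE_RE.match(postcode) else None

# Illustrative inputs: odd case, missing or misplaced spaces, and two invalid ones.
samples = ["sw1a 1aa", "SW1A1AA", " s w1a-1 aa ", "M1 1AE", "12345", "SW1A 1A"]
for sample in samples:
    print(repr(sample), "->", normalise_postcode(sample))
```

The first three samples all normalise to the same string, while the last two are rejected because they can't be split into a plausible outward and inward code.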

My last annoyance with postcode validators isn’t demonstrable here, but it occurs when people try to do the right thing and validate against the actual postcode database. Unfortunately these people are often rather lackadaisical about keeping it updated, so if you live somewhere which was built in the last year or two, you can often kiss goodbye to the thought of ordering anything to your home address. Not a problem in our current house, unless there’s someone somehow running off a postcode database that predates the computer by several decades[2], but having lived at such a place in the past I can attest to it as an almost limitless source of frustration.

Therefore, don’t reject something just because it’s not in the postcode database, and just pay the damn money to keep it up to date, will you? Because no matter how good the thing you’re selling, there’s someone else selling something almost indistinguishable, and they’ll let me order it successfully online.

In fact, forget all of the above and just don’t do any postcode validation at all. Let the Royal Mail sort it out, it’s what they get paid for.


  1. The crown dependencies of Guernsey, Jersey and Isle of Man also have two-letter codes that look like postcodes, but I’m not sure if they follow precisely the same scheme. 

  2. And I wouldn’t put it past them! 

Tue 18 Aug 2015 at 06:15PM by Andy Pearce in Software tagged with regex and text-processing

☑ Hardwick & Edinburgh

In late July it was holiday time again - Michelle, Aurora and I headed up to Hardwick Hall near the Peak District and then on to Edinburgh for a few days.


Sat 15 Aug 2015 at 10:05PM by Andy Pearce in Outings tagged with photos, michelle, aurora and holidays

☑ Time zones and cron

Time zones can be tricky beasts, particularly where daylight savings time is concerned. This post discusses issues around applying them to something like the ubiquitous Unix cron daemon.

I recently had cause to look into how to schedule events to run at times specified in a particular time zone. It turns out to be quite a subtle topic, about which I’ll ramble aimlessly for a few paragraphs below.

Consider writing a scheduler rather like cron. This scheduler has to take account of time zones in its handling - so each event recurrence also has an associated time zone which is used to know when to schedule the events. This is important - a date and time alone (e.g. 2015-07-21 07:09:15) is known as “naïve” and to map that to an actual point in time you also need an associated time zone. This could be UTC, or it could be any of the country-specific offsets from UTC. Once a time zone is added then a datetime becomes “aware”.
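As a minimal sketch of that distinction using only the standard library: the helper name is_aware is my own, not something from a library, and UTC is attached purely as an example zone.

```python
from datetime import datetime, timezone

# A bare date and time: no zone attached, so it is "naive".
naive = datetime(2015, 7, 21, 7, 9, 15)

# The same wall-clock reading pinned to UTC: now it is "aware".
aware = naive.replace(tzinfo=timezone.utc)

def is_aware(dt):
    """Return True if dt carries a usable UTC offset (hypothetical helper)."""
    return dt.tzinfo is not None and dt.utcoffset() is not None

print(naive.isoformat(), is_aware(naive))
print(aware.isoformat(), is_aware(aware))
```

Only the second value identifies an actual instant. The first could be any of a couple of dozen different instants depending on where the reader is standing.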

This is important for things like cron because a user of a multiuser system may wish to set events in their own time zone, which isn’t necessarily the same as system time. A cron expression is implicitly naïve - there’s no way to turn that into actual events to trigger without adding some sort of time zone. Of course, actual cron implementations themselves probably don’t typically care about this - they’ll just trigger things in system local time and the user can have the hassle of working around it. That’s why I quite carefully said “things like cron” earlier.

Let’s say, then, that you wish to write a better cron - one which allows users to specify time zones with their events. First of all you have to deal with the people who tell you that’s a waste of time - why not just adjust all your cron jobs such that they’re specified in UTC?

Well, that’s a fairly easy argument to deal with - just come up with something where that’s a real pain to do. Let’s say you want to run a cleanup job once every quarter at the end of the month. You know that some months have 30 days and some have 31 so you’re sensible enough to schedule it on the evening of the 30th day of the month: cron won’t run it on months with fewer than 31 days if you put it on the 31st. You might use this cron specification:

00  23  30  3,6,9,12  *

To clarify, that expression will run an event at 11:00pm on the 30th day of March, June, September and December. Great, exactly what we want. Except let’s say that I’m in New York, so I’d like this to be specified in EST - but my system time is in UTC. Let’s ignore DST complications for now and assume we always want these events to trigger in UTC-5 (i.e. five hours behind UTC). So let’s adjust that cron expression in EST to be in UTC instead:

00  04  31  3,6,9,12  *

Well that wasn’t so hard - we just added five hours, which caused the day to roll over, so now we’re triggering at 4am on the following day, the 31st. But hang on - we already knew that June and September only have 30 days, which is why we put it on the 30th to begin with. So now we have a cron job which won’t run in all the specified months.
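The rollover is easy to check with a few lines of Python. This is only a sketch under the same simplifying assumption as above: EST is modelled as a fixed UTC-5 offset with no DST, and the year 2015 and the helper name est_to_utc are arbitrary choices of mine.

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC-5 offset; DST is deliberately ignored here.
EST = timezone(timedelta(hours=-5), "EST")

def est_to_utc(year, month, day, hour):
    """Convert an EST wall-clock time to the equivalent UTC datetime."""
    local = datetime(year, month, day, hour, tzinfo=EST)
    return local.astimezone(timezone.utc)

# The four quarterly triggers: 23:00 EST on the 30th.
for month in (3, 6, 9, 12):
    utc = est_to_utc(2015, month, 30, 23)
    print(f"{month:2d}/30 23:00 EST -> {utc:%m/%d %H:%M} UTC")
```

March and December land on the 31st of the same month, but June and September land on the 1st of the following month, which is exactly the case a single UTC cron line can't express.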

In this particular simple case it’s not too difficult to see how to split that into two separate cron jobs to support it:

00  04  31  3,12  *
00  04  01  7,10  *

… but in the general case this is not necessarily trivial. For example, try to figure out a cron expression which transforms this expression from New York to UTC without affecting its functionality:

00  23  29  2  *

If we attempt to account for DST then this becomes even less tractable.
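To see why the 29th of February case is awkward, consider the obvious candidate translation, `00 04 01 3 *` in UTC. The sketch below works backwards from that UTC trigger to the New York wall-clock time, again treating EST as a fixed UTC-5 offset; the function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC-5 offset; DST is deliberately ignored here.
EST = timezone(timedelta(hours=-5), "EST")

def est_time_of_utc_trigger(year):
    """EST wall-clock time at which a '00 04 01 3 *' UTC job fires in `year`."""
    fires_at = datetime(year, 3, 1, 4, 0, tzinfo=timezone.utc)
    return fires_at.astimezone(EST)

for year in (2015, 2016):
    print(year, est_time_of_utc_trigger(year).isoformat())
```

In a leap year the UTC job fires at 23:00 on 29 February in New York, as intended, but in every other year it fires at 23:00 on 28 February, when the original expression would not have run at all.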

We’ve established that a time-zone-aware cron daemon is the only way to handle this sort of functionality, so how could we implement one?

The simplest solution would be to record the event specifications in particular time zones, but transform them into UTC at the point of actually executing them. Let’s say you’re doing it in Python, you’d have a set of generators for the event specifications and they’d be yielding endless sequences of datetime objects in their specific time zones. You’d convert all of these to some common time zone - say, UTC - and then keep this in a sorted list of events to trigger. All fairly straightforward stuff.
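A rough sketch of that arrangement, using only the standard library, might look like the following. The daily generator stands in for a real cron-expression parser, the two zones are fixed offsets (so no DST yet), and all the names here are illustrative rather than taken from any real scheduler.

```python
import heapq
import itertools
from datetime import datetime, timedelta, timezone

def daily(start):
    """Yield an endless sequence of aware datetimes, one per day from start."""
    current = start
    while True:
        yield current
        current += timedelta(days=1)

def in_utc(events):
    """Lazily convert each aware datetime in events to UTC."""
    for event in events:
        yield event.astimezone(timezone.utc)

# Fixed offsets only, so this sketch ignores DST entirely.
EST = timezone(timedelta(hours=-5), "EST")
CET = timezone(timedelta(hours=1), "CET")

# heapq.merge consumes its inputs lazily, so endless generators are fine
# provided each one is already in ascending order.
schedule = heapq.merge(
    in_utc(daily(datetime(2015, 7, 21, 23, 0, tzinfo=EST))),
    in_utc(daily(datetime(2015, 7, 21, 9, 0, tzinfo=CET))),
)

next_events = list(itertools.islice(schedule, 4))
for event in next_events:
    print(event.isoformat())
```

Merging lazily rather than building a sorted list up front means the scheduler only ever needs to look at the next event from each specification.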

Except that there’s still a wrinkle and that’s around DST. The daylight savings time changes are an intrinsic part of a time zone specification, so if this new cronzone utility is going to support time zones then it’s going to have to deal gracefully with DST. The problem is that it’s not immediately obvious which is the best way this should work.

For the background of anyone unfamiliar with DST, it’s a rule whereby during summer an extra hour is added. Each zone has different dates between which this adjustment is applied, and these dates differ year-to-year - they’re often chosen to be a specific day of the week, such as on a Saturday night.

When the hour jumps forward, the clock simply skips an hour - in the UK, for example, the clock ticks from 00:59:59 straight to 02:00:00. Different zones apply this at different times, but usually in the early hours of the morning.

When the hour jumps backward, the clock repeats an hour. Again in the UK the clock runs as normal until 01:59:59 ticks straight over into 01:00:00. This hour is repeated and then the clock runs on as normal.

All this offers some challenges to cronzone which probably doesn’t want to run commands twice, or skip them entirely. This isn’t a trivial problem, however - the mapping from a time zone which includes DST changes into UTC gets complicated around the points of DST change.

This is most easily illustrated with an example. In 2015 the UK clocks jumped forward in the small hours of March 29 and they’ll go back in the small hours of October 25. Let’s say you scheduled a job to run at 1:30am every day. Well, on the morning of 2015-03-29 in the UK that time simply does not exist - perhaps your job will be skipped. On the morning of 2015-10-25, however, it might run twice - the clocks run past 01:30:00 twice that day.

If your job is doing something important, that could be pretty bad. What if it’s calculating payroll information - perhaps people don’t get paid, or get paid twice. Either way, it’s not ideal.

What is there to do about such a mess? Well, the main thing is to understand the issues and make sure your test cases cover them. Once you’ve done that, it’s a case of finding a decent implementation. If you’re using Python it turns out that the pytz library is actually fairly helpful in this regard.

Let’s say you have some code to take a cron-style specification and generate Python datetime objects from it which are the times at which the events should occur. These will implicitly be naïve until we add our per-user time zone information, for which we can use the localize() method in pytz - this attaches an appropriate timezone to a datetime instance.

The clever bit is that this handles DST translations for you, so if you then translate that to another datetime with astimezone() then it’ll fudge the times around so they still represent the wall clock time in the new time zone that corresponds to the same instant in UTC as the old one. Note that you might also need to use the normalize() method when converting directly between time zones as opposed to into UTC - this readjusts the time in case it’s crossed a DST boundary in the process of the conversion.

This isn’t so simple, however - we’ve already seen that converting from a time zone to UTC is not a straight one-to-one mapping around the DST changes - some times in a time zone don’t map to any UTC time (where the hour goes forward) and some times map to two UTC times (where the hour goes back).

Since pytz can’t tell exactly what you expect it does what any decent API would do - it offers you a parameter to tell it. In this case several of the methods take the is_dst parameter which is a tri-state value which can be True, False or None. The first thing to note about this parameter is that it only applies during periods of DST change - at all other times the conversion is unambiguous and the value of this parameter has no effect.

The boolean values are straightforward - they just indicate that if there is ambiguity then assume that DST is or is not in place. For example, the UK time 2015-10-25 01:30:00 will occur twice this year, as discussed earlier, so is_dst is used to disambiguate - if it’s True then a conversion to UTC will return 00:30:00, if it’s False it will return 01:30:00.

The third value of None is the one I’m interested in - this has the effect of raising an exception if the time either doesn’t exist (NonExistentTimeError) or if the mapping is ambiguous (AmbiguousTimeError). This allows one to write a scheduler that deals with DST changes in a predictable and safe fashion.
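Here is a short sketch of all three values of is_dst in action, assuming pytz is installed. The zone and dates are the UK examples from above; strict_localize is a hypothetical wrapper of mine that just turns the two exceptions into labels so the outcome is easy to print.

```python
from datetime import datetime

import pytz

london = pytz.timezone("Europe/London")

repeated = datetime(2015, 10, 25, 1, 30)  # happens twice (clocks go back)
skipped = datetime(2015, 3, 29, 1, 30)    # never happens (clocks go forward)

# True: assume DST is still in force, i.e. the first pass through 01:30.
first_pass = london.localize(repeated, is_dst=True).astimezone(pytz.utc)
# False: assume DST has ended, i.e. the second pass through 01:30.
second_pass = london.localize(repeated, is_dst=False).astimezone(pytz.utc)
print(first_pass.isoformat())   # 00:30 UTC
print(second_pass.isoformat())  # 01:30 UTC

def strict_localize(tz, naive):
    """Localize with is_dst=None, reporting problem times as a label."""
    try:
        return tz.localize(naive, is_dst=None)
    except pytz.exceptions.AmbiguousTimeError:
        return "ambiguous"
    except pytz.exceptions.NonExistentTimeError:
        return "non-existent"

print(strict_localize(london, repeated))
print(strict_localize(london, skipped))
print(strict_localize(london, datetime(2015, 7, 21, 7, 9, 15)))
```

Outside the two changeover hours the last call behaves exactly like an ordinary localize(), which is why is_dst=None is safe to use as the default in a scheduler.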

The exact behaviour depends on what you think is most intuitive, but I’d be inclined to respond to the hour jumping forward by immediately running all the jobs that were skipped; and the hour jumping backward by skipping jobs which had already run.
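One way to approximate that policy with pytz is sketched below. It is a simplification rather than a full scheduler: a skipped time is interpreted with the pre-change (non-DST) offset, so the job runs once shortly after the jump rather than at the exact instant of it, and a repeated time is pinned to its first occurrence so it runs only once. The function name utc_fire_time is my own invention.

```python
from datetime import datetime

import pytz

def utc_fire_time(tz, naive):
    """Map a naive local schedule time to the single UTC instant to run at.

    Hypothetical policy sketch: never skip a job and never run it twice.
    """
    try:
        aware = tz.localize(naive, is_dst=None)
    except pytz.exceptions.NonExistentTimeError:
        # Clocks jumped forward over this time. Keep the old (non-DST)
        # offset so the job still runs once, just after the jump.
        aware = tz.localize(naive, is_dst=False)
    except pytz.exceptions.AmbiguousTimeError:
        # Clocks went back and this time occurs twice. Choose the first
        # occurrence (DST still in force) and ignore the repeat.
        aware = tz.localize(naive, is_dst=True)
    return aware.astimezone(pytz.utc)

london = pytz.timezone("Europe/London")
for day in (datetime(2015, 3, 29, 1, 30),    # skipped hour
            datetime(2015, 7, 21, 1, 30),    # ordinary summer day
            datetime(2015, 10, 25, 1, 30)):  # repeated hour
    print(day.isoformat(), "->", utc_fire_time(london, day).isoformat())
```

Because every schedule time maps to exactly one UTC instant, the sorted list of pending events never contains duplicates or gaps around a changeover.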

Of course this does have some downsides. Let’s say what you want is to run something every 60 minutes regardless of DST changes: you’ll probably find that around the DST change you get something running twice or with a two-hour gap, depending on the direction of the change. It doesn’t seem to me possible to be correct in both cases - for the user who wants things scheduled based on an elapsed time, and also the user who wants things scheduled at specific times of day, not to be missed or run twice over DST changes. Probably the only solution here is to allow them to specify a time zone individually for each job - if they want the former behaviour they can just use UTC.

So that’s it - time zones are tricky beasts. Life would be a lot simpler if DST disappeared overnight, but after all these years there seems little chance of that happening in all 70-odd countries in which it’s used. In the meantime programmers can expect to have to deal with these issues from time to time, where the specific requirements are hard enough to gather without even worrying about the implementation.

Still, it would be significant progress if everyone dealing with times in code would at least educate themselves, consider the issue properly and make an informed decision about how they handle it - as we’ve already seen the “bah, let them eat UTC” approach just doesn’t cut it.

Tue 21 Jul 2015 at 07:09AM by Andy Pearce in Software tagged with python and time
