Time zones can be tricky beasts, particularly where daylight saving time is concerned. This post discusses the issues around applying them to something like the ubiquitous Unix cron daemon.
I recently had cause to look into how to schedule events to run at times specified in a particular time zone. It turns out to be quite a subtle topic, about which I’ll ramble aimlessly for a few paragraphs below.
Consider writing a scheduler rather like cron. This scheduler has to take account of time zones in its handling, so each event recurrence also has an associated time zone which is used to determine when to schedule the events. This is important - a date and time alone (e.g. 2015-07-21 07:09:15) is known as “naïve”, and to map that to an actual point in time you also need an associated time zone. This could be UTC, or it could be any of the country-specific offsets from UTC. Once a time zone is added, a datetime becomes “aware”.
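A minimal illustration of the distinction, using only Python’s standard library. The fixed UTC-5 offset is an assumption made for simplicity; a real zone with DST rules needs a time zone database.

```python
from datetime import datetime, timedelta, timezone

# A naive datetime: a calendar date and wall-clock time with no zone attached.
naive = datetime(2015, 7, 21, 7, 9, 15)
print(naive.tzinfo)  # None

# Attaching a zone makes it "aware". A fixed UTC-5 offset is used here purely
# for illustration; it knows nothing about daylight saving time.
utc_minus_5 = timezone(timedelta(hours=-5))
aware = naive.replace(tzinfo=utc_minus_5)

# An aware datetime identifies a single instant, so it can be converted to UTC.
print(aware.astimezone(timezone.utc))  # 2015-07-21 12:09:15+00:00
```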
This is important for things like cron because a user of a multiuser system may wish to set events in their own time zone, which isn’t necessarily the same as system time. A cron expression is implicitly naïve - there’s no way to turn it into actual events to trigger without adding some sort of time zone. Of course, actual cron implementations typically don’t care about this - they’ll just trigger things in system local time and leave the user with the hassle of working around it. That’s why I quite carefully said “things like cron” earlier.
Let’s say, then, that you wish to write a better cron - one which allows users to specify time zones with their events. First of all you have to deal with the people who tell you that’s a waste of time - why not just adjust all your cron jobs so that they’re specified in UTC?
Well, that’s a fairly easy argument to deal with - just come up with something where that’s a real pain to do. Let’s say you want to run a cleanup job once every quarter, at the end of the quarter’s last month. You know that some months have 30 days and some have 31, so you’re sensible enough to schedule it on the evening of the 30th day of the month: cron won’t run it in months with fewer than 31 days if you put it on the 31st. You might use this cron specification:
00 23 30 3,6,9,12 *
To clarify, that expression will run an event at 11:00pm on the 30th day of March, June, September and December. Great, exactly what we want. Except let’s say that I’m in New York, so I’d like this to be specified in EST - but my system time is in UTC. Let’s ignore DST complications for now and assume we always want these events to trigger in UTC-5 (i.e. five hours behind UTC). So let’s convert that cron expression from EST to UTC:
00 04 31 3,6,9,12 *
Well, that wasn’t so hard - we just added five hours, which caused the day to roll over, so now we’re triggering at 04:00 on the 31st instead of 23:00 on the 30th. But hang on - we already knew that June and September only have 30 days, which is why we put it on the 30th to begin with. So now we have a cron job which won’t run in all of the specified months.
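A quick way to see the problem is to convert each firing of the original expression individually. This standard-library sketch, like the text, assumes a fixed UTC-5 offset with no DST; `firings_in_utc` is a hypothetical helper, not part of any cron implementation.

```python
from datetime import datetime, timedelta, timezone

EST = timezone(timedelta(hours=-5))  # fixed offset, DST deliberately ignored

def firings_in_utc(year):
    """Convert each firing of '00 23 30 3,6,9,12 *' (interpreted in EST) to UTC."""
    return [
        datetime(year, month, 30, 23, 0, tzinfo=EST).astimezone(timezone.utc)
        for month in (3, 6, 9, 12)
    ]

for when in firings_in_utc(2015):
    print(when.strftime("%d %b %H:%M UTC"))
# 31 Mar 04:00 UTC
# 01 Jul 04:00 UTC
# 01 Oct 04:00 UTC
# 31 Dec 04:00 UTC
```

Two of the four firings land on the 1st of the following month rather than the 31st, which is why the single shifted expression silently drops them.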
In this particular simple case it’s not too difficult to see how to split that into two separate cron jobs:
00 04 31 3,12 *
00 04 01 7,10 *
… but in the general case this is not necessarily trivial. For example, try to find a cron expression which translates this one from New York time to UTC without changing its behaviour:
00 23 29 2 *
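One way to see why no single expression works: the New York firing only exists in leap years, so its UTC equivalent is “04:00 on 1 March, but only in leap years”, and the standard five-field cron syntax has no way to say that. A standard-library sketch, again assuming a fixed UTC-5 offset; `leap_day_firings` is a hypothetical helper.

```python
import calendar
from datetime import datetime, timedelta, timezone

EST = timezone(timedelta(hours=-5))  # fixed offset, DST ignored

def leap_day_firings(first_year, last_year):
    """Map each year to the UTC firing of '00 23 29 2 *' in EST, or None."""
    firings = {}
    for year in range(first_year, last_year + 1):
        if calendar.isleap(year):  # 29 February only exists in leap years
            local = datetime(year, 2, 29, 23, 0, tzinfo=EST)
            firings[year] = local.astimezone(timezone.utc)
        else:
            firings[year] = None
    return firings

for year, when in leap_day_firings(2015, 2020).items():
    print(year, when)
```

The obvious translation, `00 04 01 3 *`, would fire every year, including the four years in this range (2015, 2017, 2018, 2019) in which the original never runs.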
If we attempt to account for DST then this becomes even less tractable.
We’ve established, therefore, that a time-zone-aware cron daemon is the only way to handle this sort of functionality. So how could we implement one?
The simplest solution would be to record the event specifications in their particular time zones, but transform them into UTC at the point of actually executing them. Say you’re doing it in Python: you’d have a set of generators for the event specifications, each yielding an endless sequence of datetime objects in its specific time zone. You’d convert all of these to some common time zone - say, UTC - and then keep them in a sorted list of events to trigger. All fairly straightforward stuff.
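A minimal sketch of that design using only the standard library. It swaps the sorted list for `heapq.merge`, which lazily interleaves already-sorted iterators and so copes with endless generators. `daily_at` is a hypothetical stand-in for a real cron-expression parser, and fixed offsets are used so that DST doesn’t yet enter the picture.

```python
import heapq
import itertools
from datetime import date, datetime, timedelta, timezone

def daily_at(hour, minute, tz, start):
    """Hypothetical event generator: an endless daily schedule in zone `tz`."""
    day = start
    while True:
        yield datetime(day.year, day.month, day.day, hour, minute, tzinfo=tz)
        day += timedelta(days=1)

def to_utc(events):
    """Convert each aware datetime from an event generator into UTC."""
    for when in events:
        yield when.astimezone(timezone.utc)

start = date(2015, 7, 21)
jobs = [
    daily_at(9, 0, timezone(timedelta(hours=-5)), start),  # 09:00 at UTC-5
    daily_at(9, 0, timezone(timedelta(hours=1)), start),   # 09:00 at UTC+1
]

# Each stream is already in time order, so merging them yields one
# time-ordered queue of UTC trigger times without materialising a list.
schedule = heapq.merge(*(to_utc(job) for job in jobs))
first_four = list(itertools.islice(schedule, 4))
for when in first_four:
    print(when)
```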
Except that there’s still a wrinkle, and that’s DST. Daylight saving time changes are an intrinsic part of a time zone specification, so if this new cronzone utility is going to support time zones then it’s going to have to deal gracefully with DST. The problem is that it’s not immediately obvious how best this should work.
For the benefit of anyone unfamiliar with DST, it’s a rule whereby clocks are put forward by an hour during the summer. Each zone has different dates between which this adjustment is applied, and these dates differ from year to year - they’re often defined relative to a specific day of the week, such as a Saturday night.
When the hour jumps forward, the clock simply skips an hour - in the UK, for example, the clock ticks from 00:59:59 straight to 02:00:00. Different zones apply this at different times, but usually in the early hours of the morning.
When the hour jumps backward, the clock repeats an hour. Again in the UK, the clock runs as normal until 01:59:59, which ticks straight over into 01:00:00. This hour is repeated and then the clock runs on as normal.
All this offers some challenges to cronzone, which probably doesn’t want to run commands twice or skip them entirely. This isn’t a trivial problem, however - the mapping from a time zone which includes DST changes into UTC gets complicated around the points of DST change.
This is most easily illustrated with an example. In 2015 the UK clocks jumped forward in the small hours of March 29 and they’ll go back in the small hours of October 25. Let’s say you scheduled a job to run at 1:30am every day. Well, on the morning of 2015-03-29 in the UK that time simply does not exist - perhaps your job will be skipped. On the morning of 2015-10-25, however, it might run twice - the clocks run past 01:30:00 twice that day.
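The same thing can be checked in code. This sketch assumes the third-party pytz library (discussed below) is installed, and uses a hypothetical helper, `instants_for_wall_time`, to list the real instants at which a London clock shows a given wall time. It tries both values of the `is_dst` argument to `localize()`, which the post covers later.

```python
from datetime import datetime
import pytz

london = pytz.timezone("Europe/London")

def instants_for_wall_time(naive):
    """Return the distinct UTC instants at which a London clock reads `naive`."""
    found = set()
    for is_dst in (True, False):
        candidate = london.localize(naive, is_dst=is_dst)
        # Convert to UTC and back: if the wall time doesn't survive the round
        # trip, this interpretation doesn't really exist on that day.
        round_trip = candidate.astimezone(pytz.utc).astimezone(london)
        if round_trip.replace(tzinfo=None) == naive:
            found.add(candidate.astimezone(pytz.utc))
    return sorted(found)

for naive in (datetime(2015, 3, 29, 1, 30),
              datetime(2015, 7, 21, 1, 30),
              datetime(2015, 10, 25, 1, 30)):
    print(naive.date(), [str(t) for t in instants_for_wall_time(naive)])
```

On 29 March there are no matching instants, on an ordinary day there is exactly one, and on 25 October there are two, an hour apart.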
If your job is doing something important, that could be pretty bad. What if it’s calculating payroll information? Perhaps people don’t get paid, or get paid twice. Either way, it’s not ideal.
What is there to do about such a mess? Well, the main thing is to understand the issues and make sure your test cases cover them. Once you’ve done that, it’s a case of finding a decent implementation. If you’re using Python, it turns out that the pytz library is actually fairly helpful in this regard.
Let’s say you have some code which takes a cron-style specification and generates from it Python datetime objects representing the times at which the events should occur. These will implicitly be naïve until we add our per-user time zone information, for which we can use the localize() method of a pytz time zone - this attaches the time zone, with the UTC offset in force on that date, to a naïve datetime instance.
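For example, assuming pytz is installed, localizing the same wall-clock time on a winter date and on a summer date attaches different offsets for London:

```python
from datetime import datetime
import pytz

london = pytz.timezone("Europe/London")

# localize() picks the offset in force on that date, not a single fixed one.
winter = london.localize(datetime(2015, 1, 21, 9, 0))
summer = london.localize(datetime(2015, 7, 21, 9, 0))
print(winter.tzname(), winter.utcoffset())  # GMT 0:00:00
print(summer.tzname(), summer.utcoffset())  # BST 1:00:00
```

Passing a pytz zone straight to the `tzinfo` argument of the `datetime` constructor is a pitfall the pytz documentation warns about: it doesn’t apply these per-date offsets, which is why `localize()` exists.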
The clever bit is that this handles DST translations for you, so if you then convert that datetime to another time zone with astimezone(), it’ll fudge the times around so that they show the wall-clock time in the new zone corresponding to the same instant as the original.
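A short sketch of that conversion with pytz: the same London wall-clock time maps to different New York times at different points in 2015, because the US and the UK changed their clocks on different dates (the US on 8 March, the UK on 29 March).

```python
from datetime import datetime
import pytz

london = pytz.timezone("Europe/London")
new_york = pytz.timezone("America/New_York")

samples = [
    datetime(2015, 1, 21, 15, 0),  # both zones on standard time
    datetime(2015, 3, 20, 15, 0),  # US already on DST, UK not yet
    datetime(2015, 7, 21, 15, 0),  # both zones on DST
]
converted = []
for naive in samples:
    local = london.localize(naive)
    in_new_york = local.astimezone(new_york)
    converted.append(in_new_york)
    print(local, "->", in_new_york)
```

The gap between the two cities is five hours in January and July but only four in mid-March, which is why a fixed offset can’t stand in for a real zone.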
Note that you might also need to use the normalize() method when converting directly between time zones, as opposed to into UTC - this readjusts the time in case it’s crossed a DST boundary in the process of the conversion.
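The place where it’s easiest to see what normalize() corrects is date arithmetic, which the pytz documentation also calls out: adding a timedelta to a localized datetime keeps the original offset, even when the result lands on the far side of a DST change. A sketch, assuming pytz is installed:

```python
from datetime import datetime, timedelta
import pytz

london = pytz.timezone("Europe/London")

before = london.localize(datetime(2015, 3, 29, 0, 30))  # 00:30 GMT
later = before + timedelta(hours=1)  # arithmetic keeps the GMT offset
print(later)  # 2015-03-29 01:30:00+00:00, a wall time that never occurred
fixed = london.normalize(later)
print(fixed)  # 2015-03-29 02:30:00+01:00
```

Both values refer to the same instant; normalize() only corrects the offset and wall-clock reading attached to it.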
This isn’t so simple, however - we’ve already seen that converting from a time zone to UTC is not a straight one-to-one mapping around the DST changes: some times in a time zone don’t map to any UTC time (where the hour goes forward), and some map to two UTC times (where the hour goes back).
Since pytz can’t tell exactly what you expect, it does what any decent API would do - it offers you a parameter to tell it. In this case several of its methods, including localize(), take the is_dst parameter, which is a tri-state value that can be True, False or None. The first thing to note about this parameter is that it only applies during periods of DST change - at all other times the conversion is unambiguous and the value of this parameter has no effect.
The boolean values are straightforward - they just indicate that, if there is ambiguity, DST should be assumed to be in effect (True) or not (False). For example, the UK time 2015-10-25 01:30:00 will occur twice this year, as discussed earlier, so is_dst is used to disambiguate: if it’s True then a conversion to UTC will return 00:30:00, and if it’s False it will return 01:30:00.
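The example above, expressed with pytz (assumed installed):

```python
from datetime import datetime
import pytz

london = pytz.timezone("Europe/London")
repeated = datetime(2015, 10, 25, 1, 30)  # this wall time occurs twice in the UK

as_bst = london.localize(repeated, is_dst=True)   # first pass, BST (UTC+1)
as_gmt = london.localize(repeated, is_dst=False)  # second pass, GMT (UTC+0)
print(as_bst.astimezone(pytz.utc))  # 2015-10-25 00:30:00+00:00
print(as_gmt.astimezone(pytz.utc))  # 2015-10-25 01:30:00+00:00

# Outside a transition the flag has no effect on the result.
ordinary = datetime(2015, 7, 21, 1, 30)
assert london.localize(ordinary, is_dst=True) == london.localize(ordinary, is_dst=False)
```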
The third value, None, is the one I’m interested in - it has the effect of raising an exception if the time either doesn’t exist (NonExistentTimeError) or is ambiguous (AmbiguousTimeError).
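A small sketch of that behaviour, assuming pytz is installed; `classify` is a hypothetical helper that simply reports which of the three cases a London wall-clock time falls into.

```python
from datetime import datetime
import pytz

london = pytz.timezone("Europe/London")

def classify(naive):
    """Report whether a London wall-clock time is ordinary, skipped or repeated."""
    try:
        london.localize(naive, is_dst=None)
    except pytz.NonExistentTimeError:
        return "does not exist"
    except pytz.AmbiguousTimeError:
        return "is ambiguous"
    return "is unambiguous"

for naive in (datetime(2015, 3, 29, 1, 30),
              datetime(2015, 7, 21, 1, 30),
              datetime(2015, 10, 25, 1, 30)):
    print(naive, classify(naive))
```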
This allows one to write a scheduler that deals with DST changes in a predictable and safe fashion.
The exact behaviour depends on what you think is most intuitive, but I’d be inclined to respond to the hour jumping forward by immediately running all the jobs that were skipped; and the hour jumping backward by skipping jobs which had already run.
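A sketch of that policy built on `is_dst=None`, assuming pytz is installed. `resolve_trigger` is a hypothetical helper: ambiguous times take only the first occurrence, and non-existent times run at the first wall-clock minute that exists after the gap. The minute-by-minute search assumes DST transitions fall on whole minutes, which also matches cron’s one-minute resolution.

```python
from datetime import datetime, timedelta
import pytz

london = pytz.timezone("Europe/London")

def resolve_trigger(tz, naive):
    """Map a naive wall-clock trigger time in `tz` to exactly one UTC instant."""
    try:
        local = tz.localize(naive, is_dst=None)
    except pytz.AmbiguousTimeError:
        # Clocks went back: use only the first occurrence so the job runs once.
        local = tz.localize(naive, is_dst=True)
    except pytz.NonExistentTimeError:
        # Clocks went forward: run at the first wall-clock minute that exists,
        # i.e. the moment the gap ends.
        probe = naive.replace(second=0, microsecond=0)
        while True:
            probe += timedelta(minutes=1)
            try:
                local = tz.localize(probe, is_dst=None)
                break
            except pytz.NonExistentTimeError:
                continue
    return local.astimezone(pytz.utc)

for day in (datetime(2015, 3, 29), datetime(2015, 10, 25)):
    hourly = [resolve_trigger(london, day.replace(hour=h)) for h in range(4)]
    print(day.date(), [t.strftime("%H:%M") for t in hourly])
```

Applied to an hourly job at 00:00 to 03:00 London time, this gives UTC run times of 00:00, 01:00, 01:00 and 02:00 on 29 March (the skipped 01:00 run lands on the same instant as the 02:00 one) and 23:00 (the previous day), 00:00, 02:00 and 03:00 on 25 October (a two-hour gap).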
Of course this does have some downsides. Say what you want is to run something every 60 minutes regardless of DST changes: you’ll probably find that around the DST change you get something running twice or with a two-hour gap, depending on the direction of the change. It doesn’t seem possible to me to be correct in both cases - for the user who wants things scheduled based on elapsed time, and also for the user who wants things scheduled at specific times of day, neither missed nor run twice over DST changes. Probably the only solution here is to allow users to specify a time zone individually for each job - if they want the former behaviour they can just use UTC.
So that’s it - time zones are tricky beasts. Life would be a lot simpler if DST disappeared overnight, but after all these years there seems little chance of that happening in all 70-odd countries in which it’s used. In the meantime programmers can expect to have to deal with these issues from time to time, in situations where the specific requirements are hard enough to gather without even worrying about the implementation.
Still, it would be significant progress if everyone dealing with times in code would at least educate themselves, consider the issue properly and make an informed decision about how they handle it - as we’ve already seen, the “bah, let them eat UTC” approach just doesn’t cut it.