☑ An Unhealthy Environment

Recently I’ve been writing code to spawn child processes that had to deal with the POSIX functions for querying and manipulating environment variables. I’ve only just realised how truly awful this interface is in the context of modern multi-threaded applications, and this post is simply me sharing the pain.

Many Unix, and some Windows, users will be familiar with environment variables. These are key/value strings such as USER=andy or SHELL=/bin/bash, and they form part of the global environment provided to a process by the OS. Windows has a similar concept, although it has a few subtle differences and in this post I’m only discussing the situation in POSIX.

POSIX provides various interfaces to query and set these variables. Probably the most well known of these are setenv() and getenv(), so let’s start with those.

The getenv() function is pretty modest - you pass in the name of an environment variable and it returns you a pointer to the value. Simple enough, but immediately the spidey sense starts tingling. The function returns a char* instead of a const char* for one thing, but “the application shall ensure that it does not modify the string pointed to by the getenv() function”. Well, OK, perhaps they didn’t have const in the days this function was written. They also presumably hadn’t heard of thread-safety or re-entrancy, because anything that returns a pointer to static storage pretty clearly offers neither.
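
In case it’s useful, the least-bad pattern I’ve found is to copy the value out the moment you get it, before anything else has a chance to call setenv() or putenv() and pull the rug out from under the pointer. A minimal sketch (the helper name is my own invention, and obviously this does nothing about other threads modifying the environment while the call itself is in progress):

#include <cstdlib>
#include <string>

// Copy the value immediately: the pointer returned by getenv() may be
// invalidated or overwritten by later setenv()/putenv() calls.
std::string get_env_copy(const std::string& name, const std::string& fallback = "")
{
    const char* value = std::getenv(name.c_str());
    return value ? std::string(value) : fallback;
}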

The setenv() function is also fairly simple - you pass in a new variable name and value, and a flag indicating whether you’re happy for the assignment to overwrite any previous value. But the man page talks about this function modifying the contents of environ - oh yes, let’s talk about that first…

You’ll notice neither of the functions so far has given a way to iterate through all the current environment variables that are set. It turns out that the only POSIX-supported way to do this is to use the global environ variable provided by the library. This is similar to argv that’s passed into main(), except that instead of an argc equivalent, the environ array is null-terminated. Things start to smell a little fishy when you realise that environ isn’t actually declared in any header files - the application itself has to include something like this, as taken from the POSIX page on environment variables.

extern char** environ;

OK, so just like argv there’s some OS-supplied storage that contains the environment. It’s not const, but hey ho, neither is argv and we seem to cope fine with just not modifying that directly. Except that the key point here is that setenv() does modify environ - the man page even explicitly states that’s how it works. Unlike argv, therefore, you can’t just treat it as an effectively read-only constant¹ array and quietly ignore the fact that the compiler won’t stop you modifying it.
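
For completeness, iterating over it looks something like the following, a purely illustrative loop rather than anything authoritative, and note that it only ever reads the array:

#include <cstdio>

extern char** environ;

int main()
{
    // Each entry is a "NAME=value" string; the array ends with a null pointer.
    for (char** entry = environ; *entry != NULL; ++entry) {
        std::printf("%s\n", *entry);
    }
    return 0;
}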

It gets even crazier when you realise that, according to the man page for the exec family, it’s quite valid to replace your entire environment by assigning a whole new value to environ. You read that correctly - not updating the pointers within environ, just repointing the whole thing at your own allocated memory.

So then, when setenv() comes along and wants to modify this, how on earth can it do so? It has no idea how big an array you’ve allocated in your own code - it either has to copy the whole lot to somewhere provided by the system, or cross its fingers and hope there’s enough space.

And don’t even get me started on the memory management Pandora’s Box that is putenv()…

In summary, therefore, I’ve decided that the only sensible course of action is to use environment variables as little as possible. If you must use them as opposed to command-line arguments, you should parse them away right at the beginning of main() and put them into other storage within your code, never worrying about the environment again. If you’re writing a library… Well, good luck with that - let’s hope your application doesn’t mess around with the environment too badly before you want to query it. Whatever you do, don’t update it!

It’s quite possible to work around all this brokenness, of course, as long as you can make some basic assumptions of sanity about your libraries. But it’s all just such a dirty little mess in the otherwise mostly sensible tidiness that POSIX has imposed on the various APIs that exist.

Surely there’s got to be a more sensible way to control the behaviour of applications and libraries? For example, we could have some sort of system-wide database of key/value pairs - unlike the environment it could be lovely and clean and type-safe, and all properly namespaced too. For performance reasons we could stick it in some almost unparseable binary blob. There’s no way such a system could be abused by applications, right? It would remain clean and usable, I’m sure. Now all we need is a snappy name for it - something that indicates the way that values can be registered with it. Perhaps, The Register? No, people will confuse it with the online tech site. What about The Repository? Hm, confusing with source control. I dunno, I’ll think about it some more.


  1. Yes, I’m aware there are some use-cases for modifying argv too, but I class those as unusual cases, and they also tend to be quite system-specific (assuming you want to resize the strings in the process). 

Wed 25 Mar 2015 at 07:28AM by Andy Pearce in Software tagged with posix, environment-variables and linux

☑ Hash Collision Fun

When dealing with data structures that involve hashing, most commonly hash tables, it’s fairly common knowledge that your choice of hash function is an important performance consideration. What’s perhaps less well known is that it can be an important security consideration too - this article briefly discusses why.

Anyone who’s studied Computer Science at university, or has spent a few years in the software industry, will know about hash tables. Those who’ve looked deeper will know that there are other, equally powerful, uses for the same general technique. Its great strength is its performance in the general case, able to average constant-time lookups where many data structures only manage logarithmic or linear performance. The complication is that to achieve this in practice you need to put a lot of care into the hash function that you choose. However, as I discovered recently, average case performance isn’t the only consideration.

The important characteristic of a hash function is that it distributes inputs across its buckets evenly, as if at random. This should hold regardless of which inputs are chosen, since it’s what ensures the average constant-time performance - if multiple items hash to the same bucket, the lookup time rapidly degrades towards linear. This can have a very large impact on the running time of the application. With a reasonable hash function such collisions occur very rarely in practice, but if someone knows the hash function in use then it may be possible to deliberately engineer them - this is fairly obvious when you think about it, but it wasn’t something that occurred to me until recently.
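
To make that concrete, here’s a contrived sketch with a deliberately terrible hash function so that every key lands in the same bucket; the container degenerates into one long chain and lookups degrade from (amortised) constant time towards linear, which is exactly the effect an attacker is after. The names and sizes here are purely illustrative:

#include <string>
#include <unordered_set>

// Deliberately awful hash: every string maps to bucket 0.
struct TerribleHash
{
    std::size_t operator()(const std::string&) const { return 0; }
};

int main()
{
    std::unordered_set<std::string, TerribleHash> names;
    for (int i = 0; i < 10000; ++i) {
        names.insert("file" + std::to_string(i));  // every insert collides
    }
    // Each lookup now walks a 10,000-element chain rather than finding its
    // target in constant time on average.
    return names.count("file9999") ? 0 : 1;
}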

Why would someone want to do that? One reason is part of a DoS attack, where an attacker uses some supposedly limited level of access to impair usage for other users. A great example of this is an attack someone devised on the Btrfs filesystem a few years ago. Essentially this involved constructing a large number of filenames which hash to the same bucket, thus causing the number of items in that bucket to grow artificially large and extending the time taken to perform operations massively. This approach could also be used by one user with access to a shared directory to prevent another user creating a file in the same directory by creating many files which hashed to the same value.

Apparently this isn’t as rare as you might imagine - there’s even a CERT advisory which contains a long list of programming languages whose builtin hashing algorithms are prone to this sort of attack. Many of these languages have since resolved the issue - Python was one of the more recent of these, only having moved to using the cryptographic-quality SipHash function in version 3.4¹.

These language builtins will face a lot of pressure to change due to the number of people affected. What’s more concerning is to consider how much other software is out there which has used insecure hashing in the name of performance, leaving itself open to these sorts of cunning attacks.

Just one more reason not to roll your own hashing function - but I have no doubts that people will continue to do so for a long time yet.


  1. Prior to choosing SipHash, the Python team did attempt to address the issue using the old hashing algorithm perturbed by a pseudorandomly chosen initial value per invocation of Python. Unsurprisingly this turned out to be less than adequate. 

Fri 20 Mar 2015 at 07:36AM by Andy Pearce in Software tagged with data-structures and security

☑ Agile Government

I’m quite a fan of Agile software development, but it seems that the same approach can be used in a wide variety of other industry areas. In this post I’ll briefly describe how I discovered a very Agile-sounding approach to nursing in Holland.

Britain is currently undergoing a swathe of austerity under a Tory-led government, with significant cuts to all sorts of public services. With a national debt of £1.4tn, rising by around £100bn a year, it’s not a surprise that the government is trying to balance the books, regardless of what you may think of their means of doing so.

One of the biggest budget items in the UK is, of course, the NHS; and naturally it gets some of the most scrutiny for possible savings. At the same time it’s extremely sensitive when it comes to cuts in frontline services - nobody wants to feel that their health is being put at risk for the sake of saving a little money. As a result, there’s usually a great deal of rhetoric bouncing around about how to achieve savings without affecting actual care. Typically this takes the form of some sort of hand-wavy diatribe produced by throwing phrases like “cut through the red tape”, “strip out middle management”, “get rid of bureaucracy” and “back to basics” into a cup, shaking it all around and throwing it onto a page to see what sticks¹, but these are conspicuously light on concrete plans to implement such measures, or even evidence that it’s possible.

Well, I read a Guardian article about a pilot scheme in Holland which has achieved some impressive-sounding efficiencies. The Buurtzorg community nursing organisation is now supporting 7,000 nurses with only 30 back office support staff; that’s over two hundred nurses for each non-medical employee. Apparently the quality of care is better, with patients requiring 40% fewer hours of care, and nurses have less than half the absenteeism and a third less turnover than other nursing organisations.

This all sounds rather too much like the holy grail of NHS savings for which successive governments have been searching all these years to be true, so I decided to try to find out a few more details. Their US homepage has some interesting tidbits, but what really intrigued me was a report I found on the web page of The King’s Fund, a charity that works to improve policy around health care in the UK.

It’s a fairly brief bullet point summary, but here are the main points that leapt out at me:

  • Independent teams of up to 12 nurses.
  • Teams are autonomous and responsible for the whole process.
  • Assessment and care of all types of client: generalists.
  • Monitor the outcomes instead of effort.
  • Focus on activities instead of processes.

So essentially the group improved productivity and employee satisfaction by splitting the nurses into small, autonomous groups who self-organised into the optimal structure for their particular tasks and focused on whatever task they needed to carry out to achieve their objectives instead of rigidly adhering to some centralised process.

To any software engineers out there this might be starting to sound awfully familiar - specifically it really sounds rather like an Agile methodology to me. I’ve long been attracted by the benefits of Agile in software development, but it’s fascinating to see something that appears extremely similar being so gainfully employed in such a different industry. Perhaps I shouldn’t be too surprised, since Agile practices originally grew up in the automotive industry, but it does make me wonder how many sectors could be significantly improved by judicious use of these ideas.

In an oblique way it also reminds me of a TED talk by Sugata Mitra a while ago about the deficiencies of the education system. His thesis is that our current approach to education was shaped by the Victorians, who needed people who could be employed, in effect, as cogs in the massive global machine that was the British Empire. He further suggests that today’s technology has rendered the need for such roles largely obsolete, and instead we should be trying to produce young adults who can build on a sound technological foundation and innovate upon it.

In both cases I suspect attempts at widespread change will face an uphill struggle against those who want to cling to the old ways of doing things, regardless of evidence supporting the change. Personally I believe this is the real challenge in turning round organisations like the NHS, not any shortage of ideas for improvement. That’s why it’s so good to see real world schemes like Buurtzorg showing such massive improvement - such compelling evidence is going to be critical in pushing through change.

But I fear it’ll be a long, long road yet.


  1. Incidentally, the Daily Mail appear to use a similar approach to writing articles on the matter. 

Thu 19 Feb 2015 at 07:47AM by Andy Pearce in Politics tagged with agile and politics

☑ Netflix Neutrality

Well, after almost a year’s downtime I’ve finally found time to get my blog up and running again and even actually write an entry. Spurred by articles on Netflix’s rampant traffic growth last month, I’ve decided to lay down my thoughts on the related topic of Net Neutrality, which may be somewhat at odds with many in the technology community. This is a fairly old topic these days, but one that I think will be relevant for some time to come yet.

You’ve probably heard of the online video rental service Netflix — in case you hadn’t, they’re a company who started as a flat rate DVD rental service in the US but are now best known for their online streaming offering, making their entire catalogue available for watching online for a flat monthly fee.

Well, Netflix have enjoyed some pretty significant growth. Last month, for example, it was announced that they comprised almost 35% of downstream traffic and even 10% of upstream ACKs — that’s a massive proportion of bandwidth for anyone, not least for a company whose market cap is only around 5% of Google’s. This growth in traffic speaks to Netflix’s rampant popularity, but this success has also brought them some pretty stern opponents — primarily ISPs¹.

ISPs seem rather bitter about the success of companies such as Google and Netflix. This is because the ISPs feel that these companies are only able to make their profits because of the network infrastructure that the ISPs have built out at their own expense. This is a rather debatable position, which I’ll come to in a moment, but whether justified or not in recent years the ISPs have become increasingly desperate to somehow jump on the bandwagon and monetise the rampant success of online services.

Clearly the primary source of revenue for an ISP is subscription fees — could they just charge their customers a higher flat rate? Well, increasing fees is generally unpopular, especially in areas where there’s little competition, and would most likely attract a lot of negative attention from regulators. Another possibility for making additional money is to provide their own services, but in practice these are almost invariably significantly less compelling than existing players and gain very little market traction. It shouldn’t be a surprise, since the existing providers make this their sole business — in short, they know what they’re doing.

Faced with these avenues being frustrated, ISPs have instead made moves to try to leach some of the revenue streams away from service companies (e.g. Google and Netflix). They have a trump card to play to achieve this, which is to put roadblocks between their users and the servers used by these companies such that the service gets degraded or removed completely. End users typically don’t know the reasons for this and assume the service is poor, placing their pressure on the provider of the service to fix it. Indeed, exactly this has happened to Netflix where the ISP Comcast in the US started slowing down their data — the problem got so bad that some users were totally unable to watch movies and cancelled their Netflix subscriptions.

Understandably the likes of Google and Netflix are none too happy with what they view as unfair business practices. These fears were proved somewhat justified in Netflix’s case where they were forced to cave in and start paying ISPs for unthrottled access to their customers. This turn of events is a concern to a lot of the online services companies who feel that it’s the thin end of a wedge that’s going to leave them at the mercy of powerful ISPs. As a result, for years now they’ve been lobbying governments worldwide, but primarily in the US, to pass laws enforcing what’s typically known as Net Neutrality — put simply, the notion that nobody gets to artificially accelerate, slow down or block traffic from one particular source relative to others.

Under Net Neutrality, Comcast wouldn’t be allowed to slow down Netflix movie streaming, for example, although they would be able to, say, offer a uniformly slow service for a lower cost to consumers (i.e. “fair” throttling). As well as these companies, a lot of grassroots Internet advocates also support Net Neutrality, believing that it will help protect the Internet in its current fairly open form from undue corporate interference which could harm the quality of services now and in the future.

Now the actions of all of these parties are quite understandable from their own points of view, but frankly I believe they’re all rather misguided — for the remainder of this post, I’ll try to explain why.

Firstly, as much as the ISPs tend to be demonised in these debates, it can’t be denied that they have a potential problem. As public companies they have shareholders, and like any shareholders they want to see growth in the company and a return on their investment. If ISPs are blocked from innovating to achieve this growth then they’ll stop attracting investment; and that means they’ll be unable to afford to upgrade their infrastructure; and that means gradually degrading service for everyone — that’s a lose/lose scenario. Hence it’s totally natural to see why they’d oppose major legislative constraints on their business.

Secondly, and conversely, it is certainly a problem that ISPs are in the position where they can so easily inflict untold damage on the bottom line of companies who sell their services online. At its most extreme this could develop into a form of protection racket, where the ISPs can extract arbitrary sums of money from other companies in return for access — I’m not suggesting that it’s anything like this at present, but even the possible risk is clearly not an attractive state of affairs.

Thirdly, a lot of people seem to be putting a lot of faith into legislation to sort this out — they believe some strong Net Neutrality laws will block the ISPs from “abusing” their networks and interfering with other people’s business. But they apparently forget the atrocious record that governments have in passing laws that intersect with business, and especially technology. Those who make policy just do not understand the issues enough to make informed choices, so it becomes a battle of the lobbyists. Let us not forget that ISPs are typically well established companies with plenty of lobbyists of their own, so it’s not at all clear that some awful tangled up mess of a compromise won’t emerge at the end that doesn’t particularly please anyone except the lawyers who’ll get to wrangle over it in legal disputes for decades.

Finally, even if we could rely on government to pen some good legislation now which is fit for purpose and achieves the stated goals, how can we assess the potential for stifling future innovation? There are quite legitimate uses for traffic prioritisation — for example, a cable company might decide to deliver all of its TV services over its IP infrastructure and customers are clearly going to expect to receive this without interruption even if their neighbour is downloading a hundred movies over BitTorrent. Or perhaps a hospital decides to allow doctors to perform operations remotely via the Internet and requires the ISP to give this important traffic priority. Preventing ISPs from implementing this sort of mechanism risks harming both their service and customers in the future by preventing them from making best use of their infrastructure.

In summary, I accept there are problems to solve in the ISP business, but I don’t accept that Net Neutrality legislation is necessarily the best solution to them. So what is?

What about good old competition?

The root cause of these problems, in my mind, is not that the ISPs are able to make decisions about their own networks which affect other companies — the problem is that their consumers are, in many cases, lacking any viable alternative provider they can move to. This problem appears to be particularly acute in the US, where some areas really have no practical choice at all for high speed Internet access. This means the ISPs have more or less carte blanche to implement whatever draconian traffic management policies they like, and their customers can do very little but complain about it. Were there viable alternatives, customers could move away in droves and effectively force a reverse of the policy. For this to work adequately we could perhaps have some light legislation that forces ISPs to make public their traffic shaping measures, but that’s really just tightening up existing regulations on requiring products and services to be accurately described.

Let us not also forget that ISPs don’t just look to exploit other companies — they’ve shown a willingness to exploit their customers as well, admitting that data caps are more about extracting additional revenue than managing network congestion, which is the reason they typically cite publicly. Competition and free market economics handily resolve these sorts of issues as well, whereas Net Neutrality doesn’t say a lot about ISP charging structures, subscription fees or fairness to customers.

Of course, it’s undeniable that competition in this market is incredibly hard to engineer — a lot of basic infrastructure has to be built for even a basic service, and that’s ignoring all those difficult issues like getting permission to dig up thousands of roads. There are ways this can be done, however, such as “local loop unbundling” or LLU, where the government instead forces the incumbents to open up their “last mile” infrastructure to the competition — that’s the connection to your house, and it’s the expensive part that prevents competing entities starting a business.

This might seem unfair to incumbent ISPs but let’s not forget that, for example, the US cable companies only got their local monopolies with a little help from the government in the first place. It’s also important to note that while ISPs may be jealous of the profits of service companies, they’re still large companies with a healthy bottom line — compare Comcast’s profits of $2.6bn last quarter with Netflix’s rather more modest $60m, so incurring some capex in the short term to unbundle their services isn’t going to send them bankrupt.

The use of LLU has been quite successful in countries like the UK, for example. Indeed, earlier this year the incumbent BT asked the regulator to allow it to charge more for LLU access, which is probably a pretty good sign that competition is working. Also, LLU is just one example of an approach that’s been shown to work — I’m sure there are other forms of government intervention which could encourage competition.

Overall, therefore, I would argue that although fostering competition may be the harder path, ultimately the market will be a lot healthier for everyone, not least of which the consumer, if fair market economics is allowed to assert itself instead of relying on the vagaries of government legislation to swing the balance of power around one way or another.

Whilst we’re on the subject I should also mention that this isn’t regarded as a purely legislative or policy issue by everyone — some people believe that a technical solution is a feasible alternative. The most likely candidate seems to be switching to some sort of P2P system, where users share video streams among themselves as well as from Netflix’s servers. This approach is used to implement the popular music service Spotify, for example. It’s not without its potential issues, but P2P company BitTorrent have said they think this is the best approach. Well, OK, so they’re a P2P company, of course they’d say that, right? But actually it’s not so crazy — it looks like Netflix has advertised for someone with the relevant skills already, so it seems as if they’re at least investigating this possibility.

Personally I think that’s a mixed bag. On the one hand, it would undoubtedly make it a lot harder for ISPs to selectively block or throttle Netflix’s traffic; on the other hand, ISPs like Comcast have shown in the past that they’re quite willing to take on the challenges of blocking P2P traffic and if Netflix decided to stop paying them then it’s quite possible they’d be up for the fight of doing so again. I think it’s also debatable that a catalogue as large as Netflix’s would benefit from P2P’s swarm effect — the efficiencies will tend to come when a large number of people are concurrently watching a small amount of content. This might work with popular new releases, for example, but the long tail of Netflix’s content may mean that much video just gets streamed from its servers anyway due to lack of other peers. Finally, there are complex issues surrounding use of storage on customer machines and concerns from rights holders over storing their content for longish periods on computers outside of Netflix’s control. I’m sure all these issues are quite resolvable, but it’s certainly not a simple topic.

In conclusion, then, like many issues in life it’s complicated. Parties on both sides have legitimate concerns; and parties on both sides have, at times, been quite disingenuous with their petitions to the government and the public. I think it boils down to whether we can find a way to allow the tried and tested mechanisms of the free market to take control; or whether we’re going to roll the dice and let a handful of government policy makers come up with some sort of legislation that may help or hinder and whose long-term effects are hard to predict.

I know which I’d prefer, but I think there’s only one thing that we can all be certain of — this debate is likely to rumble on for a very long time yet before it’s decided one way or another.


  1. Internet Service Providers (ISPs) are those companies that provide Internet access to the public. 

Fri 19 Dec 2014 at 07:49AM by Andy Pearce in Software tagged with internet and net-neutrality

☑ C++11: Other language changes

This is part 6 of the “C++11 Features” series which started with C++11: Move Semantics.

I’ve finally started to look into the new features in C++11 and I thought it would be useful to jot down the highlights, for myself or anyone else who’s curious. Since there’s a lot of ground to cover, I’m going to look at each item in its own post — this one covers a miscellaneous set of language improvements which I haven’t yet discussed.

This post contains a collection of smaller changes which I didn’t feel were a good fit into other posts, but which I wanted to cover nonetheless.

nullptr

C++11 has finally added a type-safe equivalent of C’s NULL macro for pointers so one no longer has to use 0 and risk all sorts of confusion where a function has overloads that take an integral type and a pointer. The new constant is nullptr and is implicitly convertible to any pointer type, including pointer-to-members. Its type is nullptr_t. To remain backwards-compatible, the old 0 constant will still work.
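
A minimal illustration of the overload problem this solves (the function names here are made up for the example):

void handler(int) {}         // (1) integral overload
void handler(const char*) {} // (2) pointer overload

int main()
{
    handler(0);       // calls (1): 0 is first and foremost an int
    handler(nullptr); // unambiguously calls (2)
    // handler(NULL); // calls (1) or is ambiguous, depending on how NULL is defined
    return 0;
}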

Nuff said on that one.

Type-safe enumerations

In C++03 enumerations seem like a wonderfully clear and safe way to specify arbitrary groupings of values. Unfortunately they suffer from a few issues which can quite easily bite you. The main problems stem from the fact that they’re just a teensy tinsy dusting of syntactic sugar over plain old integers, and can be treated like them in most contexts. The compiler won’t implicitly convert between different enum types, but it will convert between them and integers quite happily. Worse still, the members of the enumeration aren’t scoped, they’re exposed directly in the outer scope — the programmer almost invariably ends up doing this scoping with crude name prefixing, which is ugly and prone to inconsistency.

Fortunately C++11 has remedied this lack by adding a new syntax for declaring type-safe enumerations:

enum class MyEnum
{
    First,
    Second,
    Third=103,
    Fourth=104
};

As can be seen, values can be assigned to enumeration members or the compiler can assign them. The identifiers here are scoped within the MyEnum namespace, such as MyEnum::First, so two different enumerations can freely use the same constants without concern. Also, these values can no longer be compared with integers directly, only with other members of the same enumeration.
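
A quick sketch of what that buys you in practice, using the MyEnum defined above:

void demo()
{
    MyEnum value = MyEnum::Second;

    // if (value == 1) { }              // Error: no implicit conversion to int
    if (value == MyEnum::Second) {
        // Fine: comparing against the same enumeration type.
    }

    // An explicit cast is needed if you really do want the numeric value.
    int raw = static_cast<int>(MyEnum::Third); // 103
    (void)raw;
}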

One of the more minor, but still occasionally annoying, problems with enumerations in C++03 was that the underlying type was implementation-specific, and could even vary according to the number of items in the enumeration, which could lead to portability problems. As of C++11 the underlying integral type of a new-style enumeration is always well defined. It defaults to int in declarations such as that shown above, but can be explicitly specified like so:

enum class MyBigEnum : unsigned long { /* ... */ };

There’s also a transitional syntax to allow legacy enumeration declarations to benefit from just this change:

enum MyOldEnum : unsigned long { /* ... */ };

Finally, new-style enumerations can also be forward-declared, something that wasn’t possible in C++03¹, as long as the underlying type is known (either implicitly or explicitly):

enum MyEnum1;                 // Illegal: legacy syntax, no type
enum MyEnum2 : unsigned long; // Legal in C++11: type explicit
enum class MyEnum3;           // Legal in C++11: type implicit (int)
enum class MyEnum4 : short;   // Legal in C++11: type explicit
enum class MyEnum3 : short;   // Illegal: can't change type once declared

Of course, as the final example shows it’s not legal to change the type once it’s declared, even if only implicitly.

Array “for” loops

Iterating with a for loop is a common occurrence in C++ code, but the syntax for it is still rooted in its C origins. It’s a flexible construct which has served us well, but there are times when it’s just unpleasantly verbose. As a result, C++11 has added a new version of for which works with a limited set of iterables:

  • C-style arrays
  • Initializer lists
  • Any type with begin() and end() methods (e.g. STL containers)

The new syntax is basically a clone of the same construct in Java:

int myArray[] = {0, 1, 2, 3, 4, 5, 6, 7};
for (int& x : myArray) {
    // ...
}

Note that within the loop x will be a reference to the real values in the array, so they may be modified through it. I could also have declared x as a plain int instead of int&; as you might expect, this creates a copy of each value in x within the loop, so modifications wouldn’t be reflected in the original array.

This is particularly convenient for STL-style containers when combined with type inference:

std::map<std::string, std::string> myMap;
myMap["hello"] = "world";
myMap["foo"] = "bar";

// Old C++03 version
for (std::map<std::string, std::string>::iterator it = myMap.begin();
     it != myMap.end(); ++it) {
    std::cout << it->first << ": " << it->second << std::endl;
}

// New C++11 version
for (auto it : myMap) {
    std::cout << it.first << ": " << it.second << std::endl;
}

Note how with the new syntax the iterator is implicitly dereferenced.

Explicit Conversions

Operator overloading allows classes to work intuitively in similar ways to builtins and one application of this is value conversion — for example, overriding operator bool() allows a class instance to be evaluated in a boolean context. Unfortunately C++’s implicit type conversions mean that overriding such operators also brings with it a slew of potentially unwanted other behaviour, which leads to ugly workarounds such as the safe bool idiom.

As a cleaner solution, C++11 has extended the possible uses of the explicit keyword to cover such conversion functions. Using this for the bool conversion, for example, allows the class to operate as a boolean but prevents it from being further implicitly converted to, say, an integral type.

class Testable
{
  public:
    explicit operator bool() const { /* ... */ }
};
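
Roughly speaking, the class can still be used directly in boolean contexts (what the standard calls a contextual conversion) while the accidental conversions are rejected. A brief sketch using the Testable class above:

void demo(const Testable& t)
{
    if (t) { /* ... */ }            // Fine: contextual conversion to bool
    bool ok = static_cast<bool>(t); // Fine: explicit conversion
    (void)ok;

    // int n = t;   // Error: no implicit conversion to an integral type
    // bool b = t;  // Error: copy-initialisation needs an implicit conversion
}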

Unicode and Raw String Literals

In C++03 there are two types of string literal:

const char normal[] = "normal char literal";
const wchar_t wide[] = L"wide char literal";

Wide character support uses an implementation-defined size and encoding, however, which sometimes limits its usefulness. C++11 improves the situation significantly by adding support for the encodings UTF-8, UTF-16 and UTF-32:

const char utf8[] = u8"UTF-8 encoded string";
const char16_t utf16[] = u"UTF-16 encoded string";
const char32_t utf32[] = U"UTF-32 encoded string";

Within these types the escape \uXXXX can be used to specify a 16-bit Unicode code point in hex and \UXXXXXXXX a 32-bit one.
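
For example (the particular characters are chosen purely for illustration):

const char16_t pi[]   = u"\u03C0";     // U+03C0 GREEK SMALL LETTER PI
const char32_t clef[] = U"\U0001D11E"; // U+1D11E MUSICAL SYMBOL G CLEF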

My hope is that wide character support can now quietly expire and be replaced by the standard UTF encodings that everyone should be using. Worst case, I would hope all platform vendors would be working towards wchar_t becoming simply an alias for one of the UTF types.

In addition to Unicode strings, C++11 also adds a new syntax for reducing the need for escaping special characters such as quotes within strings:

const char old[] = "Quotes \"within\" strings must be escaped.";
const char raw[] = R"xyz(Little "escaping" \ "quoting" required)xyz";

The delimiter (xyz above) can be anything up to 16 characters, and can be chosen so that it doesn’t occur in the string itself. The delimiter can also be empty, making the literal R"(...)".

User-Defined Literals

I’ll outline this only briefly as I haven’t had much cause to play with it myself yet, but C++11 has added the ability to define new types of literal.

Going back to pure C, it’s been possible to clarify the type of a potentially ambiguous literal. For example, 1.23 is a double, but add the f suffix to form 1.23f and the literal is instead of type float. In C++11 the programmer can define new such suffixes to convert raw literals to specific types. These conversions take the form of functions which can accept either the raw form of the literal as a string:

long operator"" _squareint(const char *literal)
{
    long value = strtol(literal, NULL, 10); // Check errors, kids
    return value * value;
}

long foo = 12_squareint; // foo has value 144

Alternatively the code can rely on the compiler to convert the literal to a numeric or string type and use that instead:

// Allow literals in any time unit to be stored as seconds.
unsigned long long operator"" _hours(unsigned long long literal)
{
    return literal * 3600;
}
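
Used like so, on the hypothetical assumption that the surrounding code wants everything in seconds:

unsigned long long timeout = 2_hours; // timeout holds 7200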

I must admit I suspect I’ll have limited use for this, but I suppose it’s potentially a nice idea for code which makes heavy use of a particular type - complex numbers spring to mind, for example.

Static Assertions

C and C++ provide the assert() facility for checking invariants at runtime and the #error preprocessor directive for raising errors at compile time. However, templates can also benefit from compile-time checks, and the new static_assert keyword allows this:

template <class TYPE>
class MyContainer
{
    static_assert(sizeof(TYPE) >= sizeof(int), "TYPE is too small");
};
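
Instantiating the template is what triggers the check, so as a rough illustration (the exact sizes are of course platform-dependent):

MyContainer<double> fine;  // OK on typical platforms: sizeof(double) >= sizeof(int)
// MyContainer<char> nope; // Fails to compile with the message "TYPE is too small"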

Alignment

Finally, those targeting many architectures may rejoice that C++11 has added alignof and alignas to query and force the memory address alignment of variables. If you don’t know what alignment is, you probably don’t need to know. Seriously, don’t worry about it — go back and read about lambdas again.
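
For anyone still reading, a token example; the figure of 16 is just an arbitrary choice for illustration:

#include <iostream>

struct alignas(16) Vec4  // force 16-byte alignment of the whole struct
{
    float x, y, z, w;
};

int main()
{
    std::cout << alignof(int) << "\n";  // natural alignment of int (typically 4)
    std::cout << alignof(Vec4) << "\n"; // 16, as requested above
    return 0;
}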


  1. The reason is that the size of the enumeration type couldn’t be known before the full list of members was declared, because implementations were allowed to vary the underlying type based on the number of members.  

Sun 19 Jan 2014 at 11:09PM by Andy Pearce in Software tagged with c++
