☑ C++11: Other language changes

19 Jan 2014 at 11:09PM in Software
 |  | 

I’ve finally started to look into the new features in C++11 and I thought it would be useful to jot down the highlights, for myself or anyone else who’s curious. Since there’s a lot of ground to cover, I’m going to look at each item in its own post — this one covers a miscellaneous set of language improvements which I haven’t yet discussed.

This is the 6th of the 8 articles that currently make up the “C++11 Features” series.

child map

This post contains a collection of smaller changes which I didn’t feel were a good fit into other posts, but which I wanted to cover nonetheless.

nullptr

C++11 has finally added a type-safe equivalent of C’s NULL macro for pointers so one no longer has to use 0 and risk all sorts of confusion where a function has overloads that take an integral type and a pointer. The new constant is nullptr and is implicitly convertible to any pointer type, including pointer-to-members. Its type is nullptr_t. To remain backwards-compatible, the old 0 constant will still work.

Nuff said on that one.

Type-safe enumerations

In C++03 enumerations seem like a wonderfully clear and safe way to specify arbitrary groupings of values. Unfortunately they suffer from a few issues which can quite easily bite you. The main problems stem from the fact that they’re just a teensy tinsy dusting of syntactic sugar over plain old integers, and can be treated like them in most contexts. The compiler won’t implicitly convert between different enum types, but it will convert between them and integers quite happily. Worse still, the members of the enumeration aren’t scoped, they’re exposed directly in the outer scope — the programmer almost invariably ends up doing this scoping with crude name prefixing, which is ugly and prone to inconsistency.

Fortunately C++11 has remedied this lack by adding a new syntax for declaring type-safe enumerations:

enum class MyEnum
{
    First,
    Second,
    Third=103,
    Fourth=104
};

As can be seen, values can be assigned to enumeration members or the compiler can assign them. The identifiers here are scoped within the MyEnum namespace, such as MyEnum::First, so two different enumerations can freely use the same constants without concern. Also, these values can no longer be compared with integers directly, only with other members of the same enumeration.

One of the more minor, but still occasionally annoying, problems with enumerations in C++03 was that the eumeration type was implementation-specific, and could even vary according to the number of items in the enumeration, which could lead to portability problems. As of C++11 the underlying integral type is always specified by the programmer. It defaults to int in declarations such as that shown above, but can be explicitly specified like so:

enum class MyBigEnum : unsigned long { /* ... */ };

There’s also a transitional syntax to allow legacy enumeration declarations to benefit from just this change:

enum MyOldEnum : unsigned long { /* ... */ };

Finally, new-style enumerations can also be forward-declared, something that wasn’t possible in C++031, as long as the underlying type is known (either implicitly or explicitly):

enum MyEnum1;                 // Illegal: legacy syntax, no type
enum MyEnum2 : unsigned long; // Legal in C++11: type explicit
enum class MyEnum3;           // Legal in C++11: type implicit (int)
enum class MyEnum4 : short    // Legal in C++11: type explicit
enum class MyEnum3 : short    // Illegal: can't change type once declared

Of course, as the final example shows it’s not legal to change the type once it’s declared, even if only implicitly.

Array “for” loops

Iterating with a for loop is a common occurrence in C++ code, but the syntax for it is still rooted in its C origins. It’s a flexible construct which has served us well, but there are times when it’s just unpleasantly verbose. As a result, C++11 has added a new version of for which works with a limited set of iterables:

  • C-style arrays
  • Initializer lists
  • Any type with begin() and end() methods (e.g. STL containers)

The new syntax is basically a clone of the same construct in Java:

int myArray[] = {0, 1, 2, 3, 4, 5, 6, 7};
for (int& x : myArray) {
    // ...
}

Note that within the loop x will be a reference to the real values in the array so may be modified. I could have also declared x as a simple int instead of int& and as you might expect this will create a copy of each value as x in the loop so modifications wouldn’t be reflected in the original array.

This is particularly convenient for STL-style containers when combined with type inference:

std::map<std::string, std::string> myMap;
myMap["hello"] = "world";
myMap["foo"] = "bar";

// Old C++03 version
for (std::map<std::string, std::string>::iterator it = myMap.begin();
     it != myMap.end(); ++it) {
    std::cout << it->first << ": " << it->second << std::endl;
}

// New C++11 version
for (auto it : myMap) {
    std::cout << it.first << ": " << it.second << std::endl;
}

Note how with the new syntax the iterator is implicitly dereferenced.

Explicit Conversions

Operator overloading allows classes to work intuitively in similar ways to builtins and one of application of this is for value conversion — for example, overriding operator bool() allows a class instance to be evaluated in a boolean context. Unfortunately C++’s implicit type conversions mean that overriding such operators also brings with it a slew of potentially unwanted other behaviour, which leads to ugly workaround such as the safe bool idiom.

As a cleaner solution, C++11 has extended the possible uses of the explicit keyword to cover such conversion functions. Using this for the bool conversion, for example, allows the class to operator as a boolean but prevent it being further implicitly cast to, say, an integral type.

class Testable
{
  public:
    explicit operator bool() const { /* ... */ }
};

Unicode and Raw String Literals

In C++03 there are two types of string literal:

const char normal[] = "normal char literal";
const wchar_t wide[] = L"wide char literal";

Wide character support is of an unspecified type and encoding, however, sometimes limiting its usefulness. C++11 improves the situation significantly by adding support for the encodings UTF-8, UTF-16 and UTF-32:

const char utf8[] = u8"UTF-8 encoded string";
const char16_t utf16[] = u"UTF-16 encoded string";
const char32_t utf32[] = U"UTF-32 encoded string";

Within these types the escape \uXXXX can be used to specify a 16-bit Unicode code point in hex and \UXXXXXXXX a 32-bit one.

My hope is that wide character support can now quietly expire and be replaced by the standard UTF encodings that everyone should be using. Worst case, I would hope all platform vendors would be working towards wchat_t becoming simply an alias for one of the UTF types.

In addition to unicode strings, C++11 also adds a new syntax for reducing the need for escaping special characters such as quotes within strings:

const char old[] = "Quotes \"within\" strings must be escaped.";
const char new[] = R"xyz(Little "escaping" \ "quoting" required)xyz";

The delimiter (xyz above) can be anything up to 16 characters, and can be chosen so as it doesn’t occur in the string itself. The delimiter can also be empty, making the literal R"(...)".

User-Defined Literals

I’ll outline this only briefly as I haven’t had much cause to play with it myself yet, but C++11 has added the ability to define new types of literal.

Going back to pure C, it’s been possible to clarify the type of a potentially ambiguous literal. For example, 1.23 is a double, but add the f suffix to form 1.23f and the literal is instead of type float. In C++11 the programmer can define new such suffixes to convert raw literals to specific types. These conversions take the form of functions which can accept either the raw form of the literal as a string:

long operator"" _squareint(const char *literal)
{
    long value = strtol(literal, NULL, 10); // Check errors, kids
    return value * value;
}

long foo = 12_squareint; // foo has value 144

Alternatively the code can rely on the compiler to convert the literal to a numeric or string type and use that instead:

// Allow literals in any time unit to be stored as seconds.
unsigned long long operator"" _hours(unsigned long long literal)
{
    return literal * 3600;
}

I must admit I suspect I’ll have limited use for this, but I suppose it’s potentially a nice idea for code which makes heavy use of a particular type - complex numbers spring to mind, for example.

Static Assertions

C/C++ provide the assert() facility for checking invariants at runtime and the #error pragma for compile-time errors in preprocessor macros. However, templates can also benefit from compile-time checks and the new static_assert keyword allows this:

template <class TYPE>
class MyContainer
{
    static_assert(sizeof(TYPE) >= sizeof(int), "TYPE is too small");
}

Alignment

Finally, those targeting many architectures may rejoice that C++11 has added alignof and alignas to query and force the memory address alignment of variables. If you don’t know what alignment is, you probably don’t need to know. Seriously, don’t worry about it — go back and read about lambdas again.


  1. The reason is that the size of the enumeration type couldn’t be known before the full list of members was declared, because implementations were allowed to vary the underlying type based on the number of members. 

The next article in the “C++11 Features” series is C++11: Library Changes: Threading, Tuples, Hashing and Regexes
Mon 13 Apr, 2015
19 Jan 2014 at 11:09PM in Software
 |  |