☑ C++11: Other language changes

I've finally started to look into the new features in C++11 and I thought it would be useful to jot down the highlights, for myself or anyone else who's curious. Since there's a lot of ground to cover, I'm going to look at each item in its own post - this one covers a miscellaneous set of language improvements which I haven't yet discussed.

This post contains a collection of smaller changes which I didn’t feel were a good fit into other posts, but which I wanted to cover nonetheless.

nullptr

C++11 has finally added a type-safe equivalent of C’s NULL macro. One no longer has to use 0 for null pointers and risk all sorts of confusion where a function has overloads taking both an integral type and a pointer. The new constant is nullptr and is implicitly convertible to any pointer type, including pointers-to-member. It has its own distinct type, std::nullptr_t. To remain backwards-compatible, the old 0 constant still works.
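For illustration, here’s a minimal sketch of the kind of ambiguity this resolves (the handle() functions are hypothetical):

#include <cstddef>

void handle(int value) { /* ... */ }
void handle(char* ptr) { /* ... */ }

void caller()
{
    handle(0);        // calls handle(int) - perhaps not what was intended
    handle(NULL);     // depends how NULL is defined: may call handle(int) or even be ambiguous
    handle(nullptr);  // unambiguously calls handle(char*)
}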

Nuff said on that one.

Type-safe enumerations

In C++03 enumerations seem like a wonderfully clear and safe way to specify arbitrary groupings of values. Unfortunately they suffer from a few issues which can quite easily bite you. The main problems stem from the fact that they’re just a teensy tinsy dusting of syntactic sugar over plain old integers, and can be treated like them in most contexts. The compiler won’t implicitly convert between different enum types, but it will convert between them and integers quite happily. Worse still, the members of the enumeration aren’t scoped, they’re exposed directly in the outer scope - the programmer almost invariably ends up doing this scoping with crude name prefixing, which is ugly and prone to inconsistency.

Fortunately C++11 has remedied this lack by adding a new syntax for declaring type-safe enumerations:

enum class MyEnum
{
    First,
    Second,
    Third=103,
    Fourth=104
};

As can be seen, values can be assigned to enumeration members or the compiler can assign them. The identifiers here are scoped within the MyEnum namespace, such as MyEnum::First, so two different enumerations can freely use the same constants without concern. Also, these values can no longer be compared with integers directly, only with other members of the same enumeration.
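To illustrate, a small sketch using the MyEnum above:

MyEnum e = MyEnum::Second;

bool same = (e == MyEnum::Second);  // fine: comparison within the same enumeration
// bool bad = (e == 1);             // error: no implicit conversion to or from int
int n = static_cast<int>(e);        // an explicit conversion is still allowed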

One of the more minor, but still occasionally annoying, problems with enumerations in C++03 was that the underlying enumeration type was implementation-specific, and could even vary according to the number of items in the enumeration, which could lead to portability problems. As of C++11 the underlying integral type is always well defined. It defaults to int in declarations such as that shown above, but can be explicitly specified like so:

enum class MyBigEnum : unsigned long { /* ... */ };

There’s also a transitional syntax to allow legacy enumeration declarations to benefit from just this change:

enum MyOldEnum : unsigned long { /* ... */ };

Finally, new-style enumerations can also be forward-declared, something that wasn’t possible in C++031, as long as the underlying type is known (either implicitly or explicitly):

enum MyEnum1;                 // Illegal: legacy syntax, no type
enum MyEnum2 : unsigned long; // Legal in C++11: type explicit
enum class MyEnum3;           // Legal in C++11: type implicit (int)
enum class MyEnum4 : short;   // Legal in C++11: type explicit
enum class MyEnum3 : short;   // Illegal: can't change type once declared

Of course, as the final example shows it’s not legal to change the type once it’s declared, even if only implicitly.

Array “for” loops

Iterating with a for loop is a common occurrence in C++ code, but the syntax for it is still rooted in its C origins. It’s a flexible construct which has served us well, but there are times when it’s just unpleasantly verbose. As a result, C++11 has added a new version of for which works with a limited set of iterables:

  • C-style arrays
  • Initializer lists
  • Any type with begin() and end() methods (e.g. STL containers)

The new syntax is basically a clone of the same construct in Java:

int myArray[] = {0, 1, 2, 3, 4, 5, 6, 7};
for (int& x : myArray) {
    // ...
}

Note that within the loop x is a reference to the real values in the array, so they may be modified. I could also have declared x as a plain int instead of int&; as you might expect, that makes x a copy of each value, so modifications wouldn’t be reflected in the original array.
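For example, a quick sketch which doubles every element in place using the reference form:

int values[] = {1, 2, 3, 4};
for (int& x : values) {
    x *= 2;  // values is now {2, 4, 6, 8}
}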

This is particularly convenient for STL-style containers when combined with type inference:

std::map<std::string, std::string> myMap;
myMap["hello"] = "world";
myMap["foo"] = "bar";

// Old C++03 version
for (std::map<std::string, std::string>::iterator it = myMap.begin();
     it != myMap.end(); ++it) {
    std::cout << it->first << ": " << it->second << std::endl;
}

// New C++11 version
for (const auto& entry : myMap) {
    std::cout << entry.first << ": " << entry.second << std::endl;
}

Note how with the new syntax there’s no iterator to dereference - the loop yields each element (here a std::pair of key and value) directly, and taking each element by const reference avoids copying the pairs.

Explicit Conversions

Operator overloading allows classes to work intuitively in similar ways to builtins, and one application of this is value conversion - for example, defining operator bool() allows a class instance to be evaluated in a boolean context. Unfortunately C++’s implicit type conversions mean that such conversion operators also bring with them a slew of potentially unwanted other behaviour, which leads to ugly workarounds such as the safe bool idiom.

As a cleaner solution, C++11 has extended the possible uses of the explicit keyword to cover such conversion functions. Using this for the bool conversion, for example, allows the class to operate as a boolean in conditions but prevents it from being further implicitly converted to, say, an integral type.

class Testable
{
  public:
    explicit operator bool() const { /* ... */ }
};
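Used something like this (a sketch assuming operator bool() is given a sensible body):

void check(const Testable& t)
{
    if (t) { /* fine: contextual conversion to bool is still allowed */ }

    bool ok = static_cast<bool>(t);  // fine: conversion requested explicitly
    // int n = t;                    // error: no implicit conversion to int
}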

Unicode and Raw String Literals

In C++03 there are two types of string literal:

const char normal[] = "normal char literal";
const wchar_t wide[] = L"wide char literal";

However, the wide character type has an implementation-defined size and encoding, which sometimes limits its usefulness. C++11 improves the situation significantly by adding support for the UTF-8, UTF-16 and UTF-32 encodings:

const char utf8[] = u8"UTF-8 encoded string";
const char16_t utf16[] = u"UTF-16 encoded string";
const char32_t utf32[] = U"UTF-32 encoded string";

Within these types the escape \uXXXX can be used to specify a 16-bit Unicode code point in hex and \UXXXXXXXX a 32-bit one.

My hope is that wide character support can now quietly expire and be replaced by the standard UTF encodings that everyone should be using. Worst case, I would hope all platform vendors would work towards wchar_t becoming simply an alias for one of the UTF types.

In addition to Unicode strings, C++11 also adds a new syntax which reduces the need to escape special characters such as quotes within strings:

const char old[] = "Quotes \"within\" strings must be escaped.";
const char raw[] = R"xyz(Little "escaping" \ "quoting" required)xyz";

The delimiter (xyz above) can be anything up to 16 characters, and can be chosen so that it doesn’t occur in the string itself. The delimiter can also be empty, making the literal R"(...)".

User-Defined Literals

I’ll outline this only briefly as I haven’t had much cause to play with it myself yet, but C++11 has added the ability to define new types of literal.

Going back to pure C, it’s been possible to clarify the type of a potentially ambiguous literal. For example, 1.23 is a double, but add the f suffix to form 1.23f and the literal is instead of type float. In C++11 the programmer can define new such suffixes to convert raw literals to specific types. These conversions take the form of functions which can accept either the raw form of the literal as a string:

long operator"" _squareint(const char *literal)
{
    long value = strtol(literal, NULL, 10); // Check errors, kids
    return value * value;
}

long foo = 12_squareint; // foo has value 144

Alternatively the code can rely on the compiler to convert the literal to a numeric or string type and use that instead:

// Allow a literal number of hours to be stored as seconds.
unsigned long long operator"" _hours(unsigned long long literal)
{
    return literal * 3600;
}
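Used like so (a trivial sketch):

unsigned long long shift = 8_hours;  // shift == 28800 (seconds)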

I must admit I suspect I’ll have limited use for this, but I suppose it’s potentially a nice idea for code which makes heavy use of a particular type - complex numbers spring to mind, for example.

Static Assertions

C and C++ provide the assert() macro for checking invariants at runtime and the #error preprocessor directive for raising errors during preprocessing. However, templates can also benefit from compile-time checks and the new static_assert keyword allows this:

template <class TYPE>
class MyContainer
{
    static_assert(sizeof(TYPE) >= sizeof(int), "TYPE is too small");
};
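So, assuming the MyContainer above, instantiating it with too small a type now fails at compile time with the given message:

MyContainer<long> fine;      // compiles: sizeof(long) >= sizeof(int)
MyContainer<char> tooSmall;  // compile error: "TYPE is too small"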

Alignment

Finally, those targeting many architectures may rejoice that C++11 has added alignof and alignas to query and force the memory address alignment of variables. If you don’t know what alignment is, you probably don’t need to know. Seriously, don’t worry about it - go back and read about lambdas again.


  1. The reason is that the size of the enumeration type couldn’t be known before the full list of members was declared, because implementations were allowed to vary the underlying type based on the number of members.  

Sun 19 Jan 2014 at 11:09PM by Andy Pearce in Software. Tags: C++. comments.

☑ C++11: Template changes

I've finally started to look into the new features in C++11 and I thought it would be useful to jot down the highlights, for myself or anyone else who's curious. Since there's a lot of ground to cover, I'm going to look at each item in its own post - this one covers improvements to template declaration and instantiation.

Extern templates

As fancy as templates seem, they’re really just a type-safe version of pre-processor macros. That is, they do syntactic substitution instead of simple textual substitution. This is part of the reason they’re powerful, but also brings downsides.

One of the main drawbacks is related to compile time and code size, and this is really an interaction between the way templates work and the way C and C++ code is compiled. Because each source file is compiled into an independent object, and the compiler has no way to know in advance how these will be linked together, the compiler has to generate any template code which is used in that source file. Also, it’s possible to force instantiation by simply declaring each instantiated type:

template class std::vector<int>;
template class std::vector<double>;

This all works for a single object file, but what happens when you link these object files together? Well, if you’re lucky (read: using a halfway recent linker on a well-supported platform) then the code will be automatically collapsed together at link time. If you’re unlucky, you’ll get potentially many copies of each template instantiated - this will work, but leads to unnecessarily large compiled code, with resultant performance implications due to reduced caching potential.

Even if you’re lucky then compile times are still higher because the compiler has to generate those same template instantiations for each compilation unit, even if they’ll later be collapsed together during linking.

So, how does C++11 help here? Well, they’ve added a pretty simple construct which allows you to forward-declare template instantiations:

extern template class std::vector<int>;

This is similar to a function prototype - it tells the compiler to assume that a definition of the template instantiation will be provided elsewhere so it doesn’t need to generate one. Of course, if no such instantiation exists then a link error will result.

This means that, for example, code which defines templated classes can use the original C++ syntax above to create instantiations for common types in its source file, and then declare them extern in its header file, so clients of the class need not generate their own. This could provide definite compile time (and potentially code size) improvements for large projects when used with common classes such as containers.
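As a sketch of how a project might lay this out (the Widget class and file names are purely illustrative):

// widget.h
#include <vector>

class Widget { /* ... */ };

// Every file including this header assumes the instantiation exists elsewhere.
extern template class std::vector<Widget>;

// widget.cpp
#include "widget.h"

// The single instantiation everyone else links against.
template class std::vector<Widget>;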

Alias templates

Templating is a great way to make code generically useful whilst maintaining the type safety which is one of the biggest advantages of a statically typed language. However, add a few levels of nesting and things can rapidly become cumbersome - consider the following loop declaration:

for (std::map<std::string, std::list<std::string> >::iterator it = myMap.begin();
     it != myMap.end(); ++it) {
    // ...
}

One way to cut down on this verbosity is to use typedef:

typedef std::map<std::string, std::list<std::string> > MyMapType;
for (MyMapType::iterator it = myMap.begin(); it != myMap.end(); ++it) {
    // ...
}

This works fine for template instantiations (i.e. with all template parameters specified), but in C++03 it’s not legal to typedef a template which still has unbound parameters. In C++11 this is now allowed via a new use of the using keyword:

template <typename ListType>
using MapStringToListType = std::map<std::string, std::list<ListType>>;

In the example above the type MapStringToListType<int> can be used as an alias for std::map<std::string, std::list<int>>.
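Tying this back to the verbose loop earlier, the alias makes it rather more readable (assuming the same myMap):

MapStringToListType<std::string> myMap;

for (MapStringToListType<std::string>::iterator it = myMap.begin();
     it != myMap.end(); ++it) {
    // ...
}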

This corrects an omission which may have perhaps been only occasionally annoying, but in those cases it was particularly annoying. There are a few limitations, the major ones being that you can’t partially or fully specialise an alias template; and alias templates are never used as candidates for automatic template argument deduction.

Variadic templates

Variadic functions have existed since the early days of the C programming language. Probably the most well known example is the printf() function with the following prototype:

int printf(const char *fmt, ...);

The ellipsis in there denotes an arbitrary number of arguments follow and the compiler and standard library together provide runtime support for determining the number of arguments passed and iterating through them.

In C++11 this feature has been brought to templates:

template<typename... TYPES>
class Aggregate { /* ... */ };

Instantiations of such templates can take any number of template arguments - for example:

Aggregate<int, float, std::map<std::string, int>> aggregateInstance;

The language doesn’t provide a mechanism to iterate over the template parameters, however, so often they’re used recursively. The example below shows how the ellipsis can be used to pass a variable number of template arguments into a function:

#include <iostream>

template<typename TYPE>
void print_stuff(TYPE arg)
{
  std::cout << "Arg: " << arg << "\nEND" << std::endl;
}


template<typename FIRST, typename... REST>
void print_stuff(FIRST first_arg, REST... rest_args)
{
  std::cout << "Arg: " << first_arg << "\n";
  print_stuff(rest_args...);
}
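Calling it with a mixed bag of arguments then works as you’d hope:

int main()
{
    print_stuff(1, 2.5, "three");
    return 0;
}

// Output:
// Arg: 1
// Arg: 2.5
// Arg: three
// END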

I suspect the legitimate use-cases for these are fairly limited, but at least there’s now a certain amount of type-safety involved, which is more than can be said for traditional varargs functions1.

Right angle bracket

Finally, a teensy little change to the parse rules to address a long-standing annoyance of mine. In C++03 nested template instantiations required an annoying extra space to prevent the parser identifying the double-angle bracket as a bitshift operator:

std::vector<std::vector<int>> myVector;     // Incorrect in C++03
std::vector<std::vector<int> > myVector;    // Corrected version

In C++11 the parsing rules have been modified so that both of the above examples would be valid and equivalent. If, for some reason, the old behaviour is required then it can be forced with appropriate bracketing (but that seems unlikely!).


  1. With the debatable exception of printf() which is common enough that some compilers (e.g. gcc) allow a varargs function to be flagged as printf-style and have its arguments type-checked against the format string. This is the only such case of type-safety in traditional varargs of which I’m aware, however. 

Tue 26 Nov 2013 at 07:12AM by Andy Pearce in Software. Tags: C++. comments.

☑ C++11: Function and method changes

I've finally started to look into the new features in C++11 and I thought it would be useful to jot down the highlights, for myself or anyone else who's curious. Since there's a lot of ground to cover, I'm going to look at each item in its own post - this one covers various changes to function and method declaration and definition.

Defaulted and deleted methods

C++11 contains various changes which perhaps don’t expand what’s possible, but certainly improve the clarity with which they can be expressed. Among these changes are explicit defaulting and deleting of methods.

C++ has long had the ability for the compiler to automatically generate some special methods on demand - specifically, it will generate:

  • Default constructor: a constructor which takes no arguments.
  • Copy constructor: a constructor which takes only a const reference to its own type.
  • Copy assignment operator: assignment from a const reference to its own type.
  • Destructor: performs various low-level cleanup functions.

Incidentally, C++11 adds two more to this list:

  • Move constructor: as the copy constructor but using move semantics
  • Move assignment operator: as the copy assignment operator but using move semantics.

However, there are a number of limitations to this in C++03, where the compiler either generates a method that you don’t want or fails to generate a method that you do. For example, you may wish the compiler to provide a default constructor even if there are other constructors defined on the class (normally it only generates one if no other constructors are defined). Or perhaps your class semantics are such that copying it will lead to double-freeing some internal resource so you want to disable the copy constructor and assignment.

Some of this has always been possible with little tricks - for example, you could define a private copy constructor and fail to provide an implementation, leading to compile errors if anybody tries to use it. However, in C++11 the default and delete keywords provide a way to do this cleanly and explicitly in a manner consistent with the syntax for declaring a pure virtual method:

class Uncopyable
{
public:
    Uncopyable() = default;
    Uncopyable(const Uncopyable&) = delete;
    Uncopyable& operator=(const Uncopyable&) = delete;
};

This class first declares explicitly that it wants the compiler to generate a default constructor for it. Then it explicitly prevents the compiler from generating a copy constructor and assignment operator, in essence preventing instances of the class from being copied.

When combined with appropriate templating, the delete keyword can also be used to restrict a function call to only taking one or more explicitly defined types:

class OnlyLongs
{
public:
    void method(long int arg);
    template<class T> void method(T) = delete;
};
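So, given the class above (a sketch assuming method(long int) is defined somewhere):

void caller()
{
    OnlyLongs ol;
    ol.method(42L);      // fine: matches method(long int) exactly
    // ol.method(3.14);  // error: the deleted template method(double) is the better match
    // ol.method(42);    // error: likewise, the deleted template method(int) wins
}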

I think of this as conceptually extending the notion of an explicit constructor to regular methods, although I’m sure C++ experts might wince at the comparison.

Explicit overrides and final

In a similar vein, it’s now possible to override base class methods more reliably in C++11. Consider the following inheritance:

class Base
{
public:
    virtual void method(double arg);
};

class Derived : public Base
{
public:
    virtual void method(int arg);
};

Perhaps the author of Derived intended to override the virtual method(), but has instead overloaded it with an alternative version taking an int instead of a double. This might seem an obvious mistake, but consider that perhaps Base::method() took an int when Derived was originally defined but has since been modified by someone who was unaware of the existence of Derived. Due to function overloading this is quite valid so the compilation won’t fail.

In C++11 this sort of problem can be avoided by marking a method with the override attribute, specifying that it must explicitly override a method with the same signature in a base class or a compilation error will result:

class Base
{
public:
    virtual void method1(double arg);
    virtual void method2(double arg);
};

class Derived : public Base
{
public:
    virtual void method1(double arg) override; // this is fine
    virtual void method2(int arg) override;    // fails to compile in C++11
};

Similar to override is final which is used in the same way but explicitly prevents a derived class from overriding a method. It can also be used on a base class to prevent any other classes deriving from it further:

class Base1 final { };
class Derived1 : public Base1 { }; // fails to compile, Base1 is final

class Base2
{
public:
    virtual void method(int arg) final;
};

class Derived2 : public Base2
{
public:
    virtual void method(int arg); // fails to compile, method() is final
};

I haven’t quite made up my mind what I think of this particular feature yet - it seems like it could be useful in some limited cases, but I worry a little that base class authors may overuse it and make future improvements difficult. On the other hand, you already have to make things virtual explicitly in C++ and I suppose that final isn’t much higher on the risk spectrum than that.

Constructor chaining and in-place initialisation

One aspect of C++03 that’s a bit of a pain is the fact that you can’t call one constructor from another. So, if you want any commonality of code between constructors then you need to move it into another method and call it from the relevant places:

class Person
{
public:
    Person(const char* name)
    {
        init(name);
    }

    Person(const std::string& name)
    {
        init(name.c_str());
    }

    void init(const char *name)
    {
        // Initialise from name here
    }
};

This works fine but it’s a little annoying having that extra method. Derived classes also must re-implement their own constructors even if a base class constructor would have done the job just as well, which is also a pain.

C++11 solves these issues by allowing constructors to delegate to other constructors (i.e. call them), where the syntax is the same as that used to invoke a base class constructor:

class Person
{
public:
    Person(const char* name)
    {
        // Initialise from name here
    }

    Person(const std::string& name) : Person(name.c_str()) { }
};

One slightly subtle point to note here is for default arguments in library code. Consider the following constructor which we’ll assume is somewhere in a library:

class FileReader
{
public:
    FileReader(size_t bufferSize=4096);
    // ...
};

This is fine as far as it goes, but remember that the constant 4096 is in the header file and hence is incorporated into all code which links against this library - this means that changing the constant requires recompilation of all the code which uses the library, even if the library itself is built as a shared object. However, using delegation we can instead arrange that the constant is fixed in the library code without compromising the interface offered to clients of the library:

// In header file:
class FileReader
{
public:
    FileReader(size_t bufferSize);
    FileReader();
    // ...
};

// In cpp file:
FileReader::FileReader() : FileReader(4096)
{
}

This is perhaps not a major concern outside those building shared libraries (or people with very large projects worried about recompilation time) but it’s worth bearing in mind.

In C++11 it’s also possible to expose a base class’ constructors directly in the derived class with a slightly quirky new use of the using keyword:

class Base
{
public:
    Base(int value);
};

class Derived : public Base
{
public:
    using Base::Base;
};
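With that in place, Derived can be constructed straight from an int via the inherited constructor:

Derived d(123);  // uses Base(int) via the using declaration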

Finally with regard to construction, it’s now possible to provide non-static data members with initial values like this:

class Point
{
public:
    Point() { }
    Point(int x, int y) : x(x), y(y) { }

private:
    int x = 10;
    int y = 20;
};

These initial values apply to any constructors which don’t explicitly assign a different value in their initialiser lists. So in the example above, the default constructor leaves x=10 and y=20, whereas the explicit one leaves the values as whatever was passed in by the caller.

Trailing return type

On the vaguely related subject of function declarations in general, one thing I forgot to mention in my previous post on type inference was that as of C++11, functions can be declared with an alternative syntax where the return type is specified after the argument list.

To illustrate why this is useful, consider the following template function:

template <class Left, class Right>
Left divide(const Left& lhs, const Right& rhs)
{
    return lhs / rhs;
}

This is all very generic apart from one problem - the programmer has to specify in advance the return type which constrains the interface for those using it. For example, the original coder has specified here that the return type must be Left, but what about an instantiation where Left is int and Right is float, and the return type is expected to be float?

Since the compiler already must know the result type of the expression at compile-time, C++11 allows type inference to be used. Given my previous post on the topic, what we’d like is something like this:

template <class Left, class Right>
decltype(lhs / rhs) divide(const Left& lhs, const Right& rhs)
{
    return lhs / rhs;
}

This is invalid, however, because the decltype expression refers to parameter names which are not yet defined. To solve this, the new C++11 syntax of trailing return type is used:

template <class Left, class Right>
auto divide(const Left& lhs, const Right& rhs) -> decltype(lhs / rhs)
{
    return lhs / rhs;
}

Note that auto here means something slightly different to its use in type inference since it doesn’t actually do any deduction. The decltype in this case is doing the deduction, but this syntax can just as easily be used for standard function definitions:

auto divide(int x, float y) -> double
{
    return x / y;
}

Personally I’d stick to the standard syntax unless you’re using type inference, however.

Lambdas

Last on the list of ever-increasingly-loosely-related topics in this post is the subject of anonymous functions, or lambdas as they’re often known. These allow functions to be defined inline with other code, taking arguments and returning a value. The basic syntax is:

[capture](parameters) -> return_type { body }

The parameters take the form of a parameter list just as a standard function and you’ll note that the return type is specified using the alternative syntax described above. In the case of a lambda, incidentally, the return type can be omitted and the compiler will infer it, provided that all return expressions evaluate to the same type - similarly the parameter list can be omitted if empty, although I’m not sure if this is good practice. The function body is also that of a standard function. The capture list is related to making closures and I’ll cover that in a second - first let’s see an example.

Consider a container implementing a vector of strings where the user wishes to create a filtered version based on some arbitrary criteria they specify. A flexible way to implement this is to allow the user to pass in a function which is passed each item in turn and have it return a bool to indicate whether the item should be included in the filtered list:

#include <functional>
#include <string>
#include <vector>

class MyStrings
{
public:
    // ...
    std::vector<std::string>
    filteredList(std::function<bool (const std::string&)> func);
};

std::vector<std::string>
MyStrings::filteredList(std::function<bool (const std::string&)> func)
{
    std::vector<std::string> filtered;
    for (auto it = _strings.begin(); it != _strings.end(); ++it) {
        if (func(*it)) {
            filtered.push_back(*it);
        }
    }
    return filtered;
}

Note the use of std::function - this is another C++11 addition which stands for any type of function which has the specified prototype. In this case it takes a const std::string& and returns bool. Note that you need to include functional to use std::function.

In principle this approach would work fine in C++03 with an appropriate function pointer or functor, but both defining a named function and a class with the function call operator overloaded are heavier than required here. Also, including the filter function in the context of the calling code makes it easier to follow, rather than having to jump elsewhere in the code to find it.

So, how are things better in C++11? Enter lambdas:

std::vector<std::string> getStringsWithHello(MyStrings &strings)
{
    return strings.filteredList([](const std::string& str) {
        return str.find("hello") != std::string::npos;
    });
}

Please excuse the indentation - I’m still fairly new to these things so I haven’t settled on an appropriate coding style. Hopefully the intent of this (not particularly useful) function should be clear, however, and it’s entirely self-contained. Note that we’ve omitted the return type as described above.

To make this function remotely useful in the real world we need to be able to search for any string, not just "hello", and for this we can use the capture list. Effectively this specifies inputs to the lambda from the defining code (as opposed to the code which calls the lambda once it’s defined), but the form this takes is direct access to the local variables in the scope of the defining function. This is usually known as a closure. Let’s see the example above updated to use this:

std::vector<std::string>
getStringsWithSubstring(MyStrings &strings, const std::string& substring)
{
    return strings.filteredList([&substring](const std::string& str) {
        return str.find(substring) != std::string::npos;
    } );
}

In this example the substring parameter has been defined in the capture list of the lambda to make it available to the code in the body. In this case &substring means “capture the substring variable by reference”. It’s worth noting that because it’s captured by reference, it’s a reference to the real value on the stack in the context of getStringsWithSubstring() (which in turn is a reference to the value in the caller as it happens in this example) so any changes the lambda makes will be reflected in the outer scope.

Of course, like any reference to a local variable within a function it will no longer be valid once the function has returned. As a result, if you wish to return a lambda from a function then you’d better capture its locals by value instead of by reference - just omit the &:

std::function<bool (const std::string&)>
getFilterFunction(const std::string& str)
{
    return [str](const std::string& s) {
        return s.find(str) != std::string::npos;
    };
}
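The returned object can then be stored and called long after str has gone out of scope, because the lambda holds its own copy:

auto hasHello = getFilterFunction("hello");

bool yes = hasHello("why, hello there");  // true
bool no  = hasHello("goodbye");           // false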

For methods it’s also possible to include this in the capture list, which will give the lambda the same implicit access to the members of the class that methods have (i.e. there’s no need to explicitly mention this). Also, a lone & can be used to allow the compiler to automatically determine which variables to capture by reference, and a lone = is equivalent except that values are captured by value. So, some examples of capture lists:

  • [] means capture nothing.
  • [foo,&bar] means capture foo by value and bar by reference.
  • [&] means capture everything by reference.
  • [=,&foo] means capture everything by value except foo, which is captured by reference.
  • [this,foo] means allow the lambda access to class members as well as capturing foo by value.

Bear in mind that both [&] and [=] also include this implicitly. This is very convenient, but it does mean you need to employ the same care accessing variables inside lambdas that use these specifiers as you would inside the methods themselves. Personally I’d suggest keeping your capture list as tightly specified as possible, since I have a great preference for making things explicit in code - that’s very much a personal view, however.

This might all seem a little strange to anybody not familiar with closures in other languages so it might help to not consider lambdas as functions per se, but instead as little class instantiations with the function call operator overloaded. These instances can maintain their own copies of the variables passed into the capture section and use them in the body of the lambda when it’s called. In fact, this is probably more or less how the compiler implements them, but the syntax is a good deal more convenient than attempting to do the same.

Anyway, that was a bit of a long post, but hopefully covered some interesting ground. Well, you’re still reading, aren’t you? What do you mean you just skipped to the end to see if there’d be an amusing picture of a cat?!

Tue 17 Sep 2013 at 07:50PM by Andy Pearce in Software. Tags: C++. comments.

☑ C++11: Type inference and constant expressions

I've finally started to look into the new features in C++11 and I thought it would be useful to jot down the highlights, for myself or anyone else who's curious. Since there's a lot of ground to cover, I'm going to look at each item in its own post - this one covers automatic type inference and generalised constant expressions.

It’s been a while since my previous C++11 post but I’m still planning to finish the series - eventually! In any case, this post examines new facilities for the compiler to infer types from context, allowing more compact expression of type-safe code, and also the ability to specify a function or constructor as a compile-time constant.

It’s often a little cumbersome to figure out the type of a value in C++, especially when you’re dealing with a lot of templated and STL code. Even when the types are obvious they can still be rather verbose to specify without adding any particular clarity to the code. C++11 has addressed these issues by adding a new decltype keyword and also taking the venerable old auto keyword from C, dusting it off and giving it a new lease of life.

Even as a C programmer you may not be familiar with the auto keyword, which specifies that a variable has automatic storage. This basically means “a normal variable” since all variables are automatic by default unless explicitly declared otherwise with the register keyword, which I’ve only ever seen used in kernel and device driver code (and rarely even then).

Since it’s essentially entirely unused these days, C++11 has removed the old meaning of auto and instead allows it to be specified in place of a type where the type can be deduced from the initialiser. For example, the type of both these variables is long int in C++11:

long one;
auto two = 123L;

This may not appear particularly handy, but if you’re dealing with something like an opaque handle or context value returned from some third party library it’s quite nice not to have to be explicit about the type everywhere. Also, how many people enjoy writing this sort of thing:

for (std::map<std::string, std::string>::const_iterator ci = myStringMap.begin();
     ci != myStringMap.end(); ++ci) {
    // ...
}

Surely much more pleasant without harming readability one iota:

for (auto ci = myStringMap.begin(); ci != myStringMap.end(); ++ci) {
    // ...
}

Related to the new functionality of auto, the decltype keyword evaluates to the type of the expression within it - this is, of course, something the compiler must know but the programmer may find awkward to discover and often doesn’t care about. So, you could declare a container iterator like this:

const std::vector<int> myVector;
decltype(myVector.begin()) iterator;

This is more useful in combination with auto:

auto context = library::function(arg);
decltype(context) anotherContext;

Be aware, however, that auto and decltype can evaluate to different types for the same argument:

const std::vector<int> myVector(1);
auto v1 = myVector[0];         // has type int
decltype(myVector[0]) v2 = 2;  // has type const int&

Now, from type inference to the not-very-related-but-in-the-same-post-anyway topic of constant expressions. Among C++ programmers, unnecessary use of the C preprocessor is typically frowned upon, for a number of legitimate reasons that are beyond the scope of this post - the executive summary is that it makes it even easier than usual to shoot yourself in the foot.

In C, the preprocessor is typically used for constant values so the compile stage doesn’t need to go to the overhead of allocating storage. In C++, however, the const keyword is typically used instead - this has the advantage of including little niceties like type safety, and still allows the compiler to make all the same optimisations as it otherwise would with the preprocessor.

There are a number of limitations on constant expressions in C++03, however, such as the requirement that they be of essentially integral type (including enumerations). Several of these limitations have been lifted in C++11 with the introduction of the constexpr keyword, which specifies that the value it qualifies is a compile-time constant and can be treated as such by compiler optimisations. So, now non-integral constants are available:

constexpr double pi = 3.141592;
constexpr double twoPi = 2 * pi;

The constexpr keyword can also be used with functions:

constexpr int factorial(int x)
{
    return (x > 0 ? x * factorial(x-1) : 1);
}

void someOtherFunction()
{
    int myArray[factorial(5)];
    // ...
}

In C++03 this would have been invalid as the return value of a function was never a constant expression. In C++11 the constexpr keyword allows the programmer to tell the compiler that the function’s return value will be the same at both compile and runtime. Of course, there are a lot of restrictions on what you can do in such a function - generally it must consist of only a return statement and it can only reference other constexpr functions and values.

Still, despite all the restrictions, hopefully this might discourage people from engaging in template metaprogramming which is, in general, obfuscatory obscurantism of the highest order.

Wed 04 Sep 2013 at 07:38PM by Andy Pearce in Software. Tags: C++. comments.

☑ What’s behind wall number three?

I've had an entertaining weekend drilling holes in the walls and then filling them back up again.

We have an oldish house - specifically, it was built some time in the 1930’s. Since that point it’s been remodeled at least once and redecorated goodness only knows how many times. As a net result, the walls have become interesting.

On Monday I was attempting to do a few odd jobs about the place, including putting some hooks in the walls for curtain tie-backs. Four hooks - two in the lounge, two in the kitchen - what could possibly go wrong?

I started with the lounge. For the first hook I took my hand drill with a very small bit and drilled an exploratory hole. It seemed to go in fairly easily, so I figured there must be some thick plaster, or perhaps I’d got lucky and hit mortar. I switched bits, drilled a larger hole, tapped the rawlplug in and screwed in the hook. Nice and firm. “Brilliant,” I thought to myself, “and there was me thinking this was going to be tricky.”

Hook number two in the lounge was slightly harder to locate - the window goes right up to the wall of the adjoining house1 so I was drilling into the wall between the buildings. Very carefully, I should add, in the unlikely event that our neighbour is reading this. The drilling went well for the first few millimetres but suddenly there was a funny grinding noise and a spray of sand came tumbling out of the hole.

Now this is something I’ve encountered before in our house - some of the walls appear to be constructed of some sort of mixture of sand and optimism, and as soon as the covering layer of paint is removed the hole just crumbles. This is a bit of an issue with rawlplugs because as the hole widens they don’t grip the sides and when you try to screw anything into them, they just spin on the spot. I mean, it’s great if you want to spend the rest of your day merrily spinning the thing around, but less good if you were hoping for it to eventually make its way deeper into the wall in some meaningful way.

I took the approach that I’d taken before in this situation which is to use some fixings designed for plasterboard. These look like massive screws with a very wide thread and they’re designed to chew in and grip firmly, then allowing you to screw something else down a hole in the centre. Think of it as the bizarre offspring of a giant self-tapping screw and a rawlplug. While I wasn’t screwing into plasterboard, I’d still used these to great effect elsewhere in the house where our walls had failed me.

Alas, not this time. So, now I had a much bigger hole in the wall…

Hm, time to slap on some Polyfilla2 and go back to the drawing board. Actually I didn’t spend long at the drawing board - switching to a smaller drill bit and moving to a seemingly more stable part of the wall seemed to do the trick. It’s not the neatest job in the world, but the rawlplug didn’t spin and the wide skirt on the hook covers a multitude of sins.

On to the kitchen now, and I thought these last two were going to be fairly straightforward. I was drilling straight into the rear wall of the house, since they hadn’t come up with cavity walls back in the 30’s3, so I went straight in with my hammer drill and a masonry bit.

The first of the two holes went more or less without incident and by the quantity of brick dust I knew it should be a pretty firm hold. The bit did poke through into what felt like a bit of empty space along the way, but I figured I must have broken through to some mortar. The rawlplug and hook both went in without a hitch.

As I was congratulating myself, however, I did notice that where I’d drilled was about the same level as the power cable to the outside light. Oops, maybe that’s what that void was… As far as I can tell it’s still working, and I haven’t heard any nasty sizzly noises yet, but I’m going to keep an eye on it for a few days. I think I had a lucky escape there.

The final hook came along and the drill went in fine again - until about 1cm in, at which point it stopped making progress and made a funny whining noise. “Probably concrete,” I thought to myself as I merrily cranked up the torque on the drill and tried again, thankful that I got the 24V drill. It did make progress in, eventually, but once I’d decided it was deep enough I noticed that the entry hole had become somewhat widened (in the sand and optimism layer) as I’d failed to keep the drill straight - pushing a rawlplug into that would be a chancy business.

As an aside, I’ll offer a free hint at this point - when you’re judging the depth of a hole you’ve just drilled into concrete, let the drill bit cool for a minute before you touch it… I won’t be doing that again in a hurry.

Anyhow, fortunately I had one final trick to which Dad introduced me in the past - the marvellous Wet’n’Fix. Personally I like to call it by its scientific name, Cheaticus Miraculum. Basically it’s a little pad impregnated with something sticky. You dip it in water, wrap it around your plug and tap the plug in. Three minutes later and voila - your plug is stuck fast and ready to use. One of those and the final plug was ready to screw the hook into and at last the job was done.

It just goes to show how the simplest of DIY jobs can turn into a little adventure all of its own, especially when you’re working in decrepit old houses like ours. Just make sure you always have some Polyfilla available and you can’t go wrong.

Hmm, that’s an idea - next time we redecorate, maybe we’ll make the paint the colour of Polyfilla so I don’t even need to touch up afterwards…


  1. Our house is semi-detached

  2. Or Polyfilla The Great Saviour, as I like to call it. 

  3. I mean, they didn’t even have iPhones for goodness’ sake! 

Wed 28 Aug 2013 at 07:21PM by Andy Pearce in DIY. Tags: drilling, walls. comments.

☑ Process groups and sessions

I finally took a little time to get my head around POSIX process groups and sessions.

Fair warning: if you don’t know what a PID is in the context of a POSIX process - or, indeed, you think “POSIX” sounds like some type of screw head - then you probably don’t need to bother reading this post.

Right, assuming there’s anybody left… For awhile now I’ve had a sort of peripheral awareness of some additional attributes of processes about which I’ve never really bothered too much. The most important attributes, with which most people reading this will likely be familiar, are the PID1, PPID2, UID3 and GID4. There are a few wrinkles like real and effective IDs, but they’re beyond the scope of this post. If you run your favourite process listing command with enough detail (such as ps -eF for example) then you’ll see most of these shown.

However, there are a couple of extra attributes that I’ve never looked at in much detail, which are process group ID (PGID) and session ID (SID). Today I decided that ignorance wasn’t bliss at all, in fact it was a blasted pain, so I’ve looked up what they mean. It turns out that they’re quite simple and, potentially, quite useful. So, here goes.

A process group is more or less exactly what it sounds like - a way to group processes together. This is useful because it’s possible to direct a signal to a process group instead of a specific process. This can be done with the killpg() system call which takes a PGID as a parameter and has the effect of sending the specified signal to every process within that group. You can specify a PGID of 0 to specify the group in which the calling process is found, and actually a standard kill() call with a PID of 0 does the same thing.

The group in which a process is located defaults to the group of the process which created it, but it can be changed with the setpgid() call. Indeed, this is what the shell does when it executes pipelines of commands - each pipeline is put into its own process group, separate from the shell’s group. If any of those commands fork their own children then they’ll also be added to the same group, unless they actively change it. Note that a “pipeline” in this context also applies to the degenerate case of a single command (a pipeline of one!).

Conventionally the PGID of a group is the same as the PID of the first process placed in that group, which is referred to as the process group leader. This is an important concept if you want to change your session, but to explain that I’ll have to explain what a session is.

The session is another level of grouping - i.e. a session contains one or more process groups. Sessions are generally tied to a controlling terminal5. For example, all process groups created by a particular shell will have the same session ID, which will generally be the PID of the shell process - as an aside, this is a quick way to locate all the commands created by a particular shell process. One important aspect of a session is that when moving a process between process groups, both groups must be members of the same session or the operation fails.

A process can be moved to a new session using the setsid() system call. This will create a new process group and place the calling process into it, and then create a new session and place the new process group within that. There are restrictions on which processes may do this, however - see below. Note that the new session will have no controlling terminal, so this system call offers a helpful way for processes to detach from their controlling terminal when they daemonise.

Each session has a foreground process group, which is effectively the currently executing command. This is the group to which a signal will be sent if generated by the terminal (e.g. SIGINT in response to CTRL-C or SIGTSTP in response to CTRL-Z). Also, only processes within the foreground group can read from the terminal.

Just as a process group has a leader so does a session have a session leader process, which is often a process group leader as well. Both process group and session leaders have various restrictions on them: session leaders can’t be moved between process groups and process group leaders can’t be moved to a new session with setsid(). The session leader is also the process to receive a SIGHUP if the controlling terminal for the session is closed6.

Given all this, we can see how it fits into the “standard” process for daemonising:

  1. fork() and terminate the parent - this ensures the new process is an orphan (adopted by init) and also returns control to the calling shell.
  2. setsid() to create a new process group and session - we can only do this after the fork() above because otherwise we’d be a process group leader. This has detached us from the controlling terminal, which is exactly what daemons should do.
  3. fork() a second time - I believe this is simply so we’re no longer a session leader and can never re-acquire a controlling terminal. There may be additional, more subtle, reasons of which I’m unaware.
  4. chdir("/") or some other directory on which the daemon relies - this is to avoid the daemon keeping a directory active which would prevent it being unmounted. If there’s some directory the daemon requires then it actually may be preferable for it to stay active to prevent accidental unmounting.
  5. umask(0) just to clear any permissions mask we may have inherited.
  6. close() standard file descriptors 0, 1 and 2, which are standard input, output and error respectively. Since we’re detached from our terminal it’s not clear where they’ve been directed to anyway. Note that some daemons determine the highest possible file descriptor using sysconf() with _SC_OPEN_MAX and call close() on them all (ignoring errors) just in case the parent had any other open files - this may be overkill if you’re confident in the behaviour of your calling process, but if you’re at all uncertain it’s the safest course, to avoid wasting file descriptors (of which there’s a finite number available).
  7. open() three times, once for each of the standard file descriptors, redirecting them to somewhere sensible. This could be /dev/null or /dev/console, or perhaps a log file you’ve already opened. Some code assumes file descriptors will be allocated sequentially so the next three open() calls will get descriptors 0-2, but to be doubly sure you can use dup2() - in that case, however, you should have opened the replacement descriptor before the previous step, otherwise you could have a clash.

A detailed description of all these steps is outside the scope of this post, but I wanted to reproduce the full procedure here for context - you can find more details all over the web.
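To tie the steps together, here’s a rough sketch of that procedure (error handling is omitted, and /dev/null is just one sensible choice for the reopened descriptors):

#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>
#include <cstdlib>

void daemonise()
{
    // 1. Fork and let the parent exit, so we're orphaned and adopted by init.
    if (fork() > 0) {
        std::exit(0);
    }

    // 2. Become leader of a new session and process group, with no
    //    controlling terminal.
    setsid();

    // 3. Fork again so we're no longer a session leader and can't
    //    re-acquire a controlling terminal.
    if (fork() > 0) {
        std::exit(0);
    }

    // 4 and 5. Don't pin any mount point, and clear the umask.
    chdir("/");
    umask(0);

    // 6. Close the standard descriptors (a stricter version would close
    //    everything up to sysconf(_SC_OPEN_MAX)).
    close(0);
    close(1);
    close(2);

    // 7. Reopen them somewhere harmless - open() reuses the lowest free
    //    descriptors, so these become 0, 1 and 2 in turn.
    open("/dev/null", O_RDONLY);
    open("/dev/null", O_WRONLY);
    open("/dev/null", O_WRONLY);
}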

Let’s see some illustrations of process groups and sessions. Note that the ps invocations I used below are quite Linux-specific, but you should be able to tailor them to your particular Unix variant with a bit of squinting at the man page.

First, we run a simple ps to show the relevant IDs:

$ ps -Ho pid,ppid,pgid,tpgid,sess,args
  PID  PPID  PGID TPGID  SESS COMMAND
 1684  3057  1684 59829  1684 /bin/bash
59829  1684 59829 59829  1684   ps -Ho pid,ppid,pgid,tpgid,sess,args

Here we can see the bash shell has PID 1684 and this matches the SID of both itself and the ps command which was executing. The PPID of the ps matches the PID of bash as one would expect and the ps process has been assigned a new PGID which matches its own PID, so it is the process group leader. The TPGID field indicates the foreground process group within the session, in this case the PGID of ps since that’s the currently executing command in the session.

Second, we’ll add an additional pipeline of commands into the mix:

$ cat | sed 's/hello/goodbye/' &
[1] 17391

[1]+  Stopped                 cat | sed 's/hello/goodbye/'
$ ps -Ho pid,ppid,pgid,tpgid,sess,args
  PID  PPID  PGID TPGID  SESS COMMAND
 1684  3057  1684 17401  1684 /bin/bash
17390  1684 17390 17401  1684   cat
17391  1684 17390 17401  1684   sed s/hello/goodbye/
17401  1684 17401 17401  1684   ps -Ho pid,ppid,pgid,tpgid,sess,args

Note: you can ignore the “stopped” message, this is a result of cat trying to read from its standard input and failing because it’s in the background. Only the foreground process group can read from the terminal, a process in any other group which tries will be sent SIGTSTP and hence be suspended.

So, we can see that both cat and sed have been placed into the same PGID by the shell here, which is different to the PGID of ps. The TPGID of all the entries is still the same as the PGID of ps because ps is again the currently executing command for all groups within the session. Since I’ve used the same shell process as in the previous example, the SID is the same.

Now we can see an example of signals being sent to the foreground process group (and not just a single process) by executing the following Python script7:

import signal
import os
import time

# Initialise do_exit to False, On CTRL-C (SIGINT), set it to True.
do_exit = False
def handle_signal(signum, stack):
    global do_exit
    do_exit = True

# Install signal handler.
signal.signal(signal.SIGINT, handle_signal)

# Fork into two processes to illustrate both receiving a signal.
child_pid = os.fork()
if child_pid == 0:
    print "Child is waiting..."
else:
    print "Parent is waiting..."

# Loop until the SIGINT handler sets do_exit to True.
while not do_exit:
    time.sleep(0.1)

# Print appropriate message and exit.
if child_pid == 0:
    print "Child has caught signal."
else:
    print "Parent has caught signal."

Execute this script and then, once parent and child are waiting, hit CTRL-C. You should see the following output, potentially with parent and child messages swapped over in either or both cases:

$ python signal-catcher.py
Child is waiting...
Parent is waiting...
Parent has caught signal.
Child has caught signal.

This clearly shows both processes receiving the SIGINT as a result of CTRL-C. For comparison, if we only send the signal to the child process:

$ python signal-catcher.py &
[1] 33635
Child is waiting...
Parent is waiting...
$ ps -Ho pid,ppid,pgid,tpgid,sess,args
  PID  PPID  PGID TPGID  SESS COMMAND
 1684  3057  1684 33680  1684 /bin/bash
33635  1684 33635 33680  1684   python signal-catcher.py
33640 33635 33635 33680  1684     python signal-catcher.py
33680  1684 33680 33680  1684   ps -Ho pid,ppid,pgid,tpgid,sess,args
$ kill -INT 33640
Child has caught signal.
$ ps -Ho pid,ppid,pgid,tpgid,sess,args
  PID  PPID  PGID TPGID  SESS COMMAND
 1684  3057  1684 33744  1684 /bin/bash
33635  1684 33635 33744  1684   python signal-catcher.py
33640 33635 33635 33744  1684     [python] <defunct>
33744  1684 33744 33744  1684   ps -Ho pid,ppid,pgid,tpgid,sess,args
$ kill -INT 33635
Parent has caught signal.
[1]+  Done                    python signal-catcher.py

Since the command was executed in the background the output gets interleaved with the shell prompt, so I’ve tidied that up for clarity in the output above. The pertinent details are shown unchanged, however - in particular, you can see the child process (only) receives the signal and terminates, remaining only as a defunct zombie process until its parent reaps its return code with something like wait(). Since our little Python script never reaps this return code, the child process’ descriptor will linger as long as the parent remains alive.

We can see that the PGID of the child python process is the same as the parent, as expected. This example also shows clearly the difference between signalling the process group, as in the first example, and signalling a single process, as shown here.

Finally, for completeness, let’s see the same example but signalling the parent process first and then the child:

$ python signal-catcher.py &
[1] 49149
Parent is waiting...
Child is waiting...
$ ps -Ho pid,ppid,pgid,tpgid,sess,args
  PID  PPID  PGID TPGID  SESS COMMAND
 1684  3057  1684 50394  1684 /bin/bash
49149  1684 49149 50394  1684   python signal-catcher.py
49154 49149 49149 50394  1684     python signal-catcher.py
50394  1684 50394 50394  1684   ps -Ho pid,ppid,pgid,tpgid,sess,args
$ kill -INT 49149
Parent has caught signal.
[1]+  Done                    python signal-catcher.py
$ ps -Ho pid,ppid,pgid,tpgid,sess,args
  PID  PPID  PGID TPGID  SESS COMMAND
 1684  3057  1684 51192  1684 /bin/bash
51192  1684 51192 51192  1684   ps -Ho pid,ppid,pgid,tpgid,sess,args
49154     1 49149 51192  1684 python signal-catcher.py
$ kill -INT 49154
Child has caught signal.

This example shows broadly the same principles, but there are a couple of interesting points to note. Firstly, once the parent is dead the shell indicates that the job is “done” - it doesn’t monitor the children of commands that it executes, just when the command itself is completed.

Secondly, after the parent has terminated note how the PPID of the child is set to 1. This is because orphaned processes are automatically adopted by the init process (the root of all processes on the system). If this didn’t happen then they would remain around as defunct zombies forever after terminating, since there would be no parent process to reap their return code. The init process is written to call wait() on all of its children to reap their return codes as they terminate. Note how, even though it’s been adopted, the child still shares the same session and is still attached to the same terminal, so ps still displays it without needing the -e (or -A) option.

Hopefully that’s cleared things up for someone. Well, it’s definitely cleared things up for me - I should try explaining things to myself more often.


  1. Process ID, a unique identifier for a process. 

  2. Parent process ID, the PID of the process which created this one. 

  3. User ID, the user as which the process is executing. 

  4. Group ID, the group as which the process is executing. 

  5. Although it’s quite possible for a session to have no controlling terminal - this is typically the case with daemon processes, for example. 

  6. In reality, of course, the situation is a little more complicated and there are circumstances in which SIGHUP is not sent, such as the terminal having the CLOCAL flag set. You can find the gory details in the man pages. 

  7. It’s pretty grotty as far as code quality is concerned, but it’s purely for illustrative purposes. 

Wed 21 Aug 2013 at 09:26PM by Andy Pearce in Software. Tags: linux, processes. comments.

☑ C++11: Initialization

I've finally started to look into the new features in C++11 and I thought it would be useful to jot down the highlights, for myself or anyone else who's curious. Since there's a lot of ground to cover, I'm going to look at each item in its own post - this one covers changes to initialization of variables.

Following on from my previous post on C++11’s new features, today I’m looking at a couple of changes to the way initialization works.

Extended initializer lists

As an extension to C, C++03 supports initialization of arrays and structures by listing values in curly brackets, even allowing such definitions to be nested:

struct RGB {
  unsigned char red;
  unsigned char green;
  unsigned char blue;
};

struct Pixel {
  int x;
  int y;
  struct RGB colour;
};

Pixel pixelArray[] = {{0, 0, {0xff, 0xff, 0xff}},
                      {1, 0, {0xff, 0x00, 0x00}},
                      /* ... */
                     };

Since there’s no difference between a class and struct in C++ except for the default access specifier being public in a struct and private in a class, this applies equally to both. However, in C++03 it’s only permitted if the type is POD1 - in C++11 it has been extended to cover all class types.

The following example demonstrates the difference:

#include <iostream>

struct PODClass
{
  int member1;
  float member2;
  char member3;
};

class NonPODClass
{
 public:
  NonPODClass(int arg1, float arg2);
  virtual void method();

 private:
  int member1;
  float member2;
  char member3;
};

NonPODClass::NonPODClass(int arg1, float arg2)
    : member1(arg1), member2(arg2), member3('x')
{
  std::cout << "NonPODClass constructed" << std::endl;
}

void NonPODClass::method()
{
  // ...
}

int main()
{
  PODClass podInstance = {1, 2.3, 'a'};     // Valid in C++03 and C++11
  NonPODClass cpp03InitStyle(2, 4.6);       // Valid in C++03 and C++11
  NonPODClass cpp11InitStyle = {3, 6.9};    // Only valid in C++11
  // ...
  return 0;
}

So far it doesn’t seem too much of a stretch, just a fancy way of calling a constructor. It’s also possible to override the constructor called when an initializer list is provided, however, by providing a constructor taking a std::initializer_list<>. For example, consider this class:

#include <cstddef>
#include <initializer_list>
#include <stdexcept>

class MyIntArray
{
 public:
  MyIntArray(std::initializer_list<int> values);
  ~MyIntArray();
  int operator[](size_t index);

 private:
  size_t size;
  int *array;
};

MyIntArray::MyIntArray(std::initializer_list<int> values)
    : size(values.size()), array(NULL)
{
  array = new int[values.size()];
  int *p = array;
  for (std::initializer_list<int>::iterator it = values.begin();
       it != values.end(); ++it) {
    *(p++) = *it;
  }
}

MyIntArray::~MyIntArray()
{
  delete[] array;  // array form of delete to match the new int[] in the constructor
}

int MyIntArray::operator[](size_t index)
{
  if (index >= size) {
    throw std::out_of_range("index invalid");
  }
  return array[index];
}

The following main() function shows how it could be used:

#include <iostream>

int main()
{
  MyIntArray fib = {1, 1, 2, 3, 5, 8, 13, 21};
  std::cout << "5th element: " << fib[4] << std::endl;
  return 0;
}

All the STL container types have also been updated to support this form of initialization.
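
For instance, a minimal sketch using only the standard containers (the values here are arbitrary):

#include <map>
#include <string>
#include <vector>

int main()
{
  // Sequence containers take their elements directly...
  std::vector<int> primes = {2, 3, 5, 7, 11};

  // ...while associative containers take a list of key/value pairs.
  std::map<std::string, int> ages = {{"alice", 34}, {"bob", 28}};

  return 0;
}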

It turns out that std::initializer_list<> is just a standard type, but one which can only be constructed using the curly-bracket syntax - after that it can be copied, passed into functions and otherwise manipulated. The list itself is not mutable after creation, and copies share the original’s underlying array, so a copy mustn’t outlive the list it was made from. For example, the main() function above could be modified to read:

#include <iostream>

int main(int argc, char *argv[])
{
  // Each list's backing array lives as long as the named object it
  // initialises, so copies of these remain valid for the rest of main().
  std::initializer_list<int> fibInit = {1, 1, 2, 3, 5, 8, 13, 21};
  std::initializer_list<int> factInit = {1, 1, 2, 6, 24, 120, 720, 5040};

  MyIntArray values = (argc > 1) ? fibInit : factInit;
  std::cout << "5th element: " << values[4] << std::endl;
  return 0;
}

Uniform initialization

Somewhat related to extended initializer lists are some changes to ensure that all objects can be consistently initialised with the same syntax regardless of their type. One example of where this falls down in C++03 is the most vexing parse rule; it’s also convenient to have a constructor-like syntax which works with POD types even though they don’t define a constructor.
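
As a quick reminder of that problem, here’s a minimal sketch (Widget is just a hypothetical class for illustration):

struct Widget
{
  Widget() { }
};

int main()
{
  Widget w();   // Most vexing parse: this declares a function returning a
                // Widget, not a default-constructed Widget object.
  return 0;
}

The brace syntax introduced below avoids this ambiguity, since Widget w{}; can only declare an object.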

In C++11 the initializer list syntax can be used like a constructor for any type by putting braces in place of brackets, like so:

MyIntVector instance{1, 2, 3, 4};

When used with POD types this will work like an initializer list and when used with other class types it will invoke the appropriate constructor. The standard form of constructor invocation with round brackets is still sometimes required if a constructor taking std::initializer_list<> is defined and an alternate constructor needs to be called.
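
std::vector<> is a handy illustration of this, since it has both a (count, value) constructor and an initializer_list<> one - a brief sketch:

#include <iostream>
#include <vector>

int main()
{
  std::vector<int> a(3, 7);   // Round brackets: the (count, value) constructor, giving {7, 7, 7}
  std::vector<int> b{3, 7};   // Braces: the initializer_list constructor wins, giving {3, 7}
  std::cout << a.size() << " " << b.size() << std::endl;   // Prints "3 2"
  return 0;
}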

Perhaps these initialization changes aren’t all that sweeping, but it’s nice to see some more consistent behaviour between simple and complex types, and I suspect that these changes may be particularly useful when defining a class which may be templated on both POD and non-POD types.
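
For instance, here’s a sketch of the sort of thing I mean - a hypothetical make<>() helper (not part of the standard library) which constructs its result with braces, so the same template works for POD aggregates and for classes with constructors:

#include <iostream>
#include <utility>

struct Point { int x; int y; };            // POD aggregate, no constructor

class Label
{
 public:
  Label(int id, const char *text) : id_(id), text_(text) { }
  int id() const { return id_; }
  const char *text() const { return text_; }
 private:
  int id_;
  const char *text_;
};

// Hypothetical helper: brace-initialisation means the same template can
// construct both aggregates and classes which define constructors.
template <typename T, typename... Args>
T make(Args&&... args)
{
  return T{std::forward<Args>(args)...};
}

int main()
{
  Point p = make<Point>(3, 4);             // aggregate-initialised
  Label l = make<Label>(7, "seven");       // constructor invoked
  std::cout << p.x << " " << l.text() << std::endl;
  return 0;
}

Had make() used round brackets instead, the Point case wouldn’t compile at all, since Point has no constructor to call.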


  1. Plain Old Data: a class with no constructors and no non-public data members. In fact this is a simplification - see this SO answer for more details. Note that this covers C++03; the definition of POD changed a little in C++11, which I’ll cover in a later post. 

Mon 15 Jul 2013 at 01:38PM by Andy Pearce in Software. Tags: C++. comments.

☑ Passwords: You’re doing it wrong

There are few technical topics about which there's more FUD than picking a strong password.

I’m getting a little sick of how much misinformation there is about passwords.

More or less everyone who’s been online for any length of time knows that picking a “good” password is important. After all, it’s more or less the only thing which stands between a potential attacker and unlimited access to your online services. So, what constitutes “good”?

Well, many, many services will recommend that you make your passwords at least 8 characters in length, avoid words that you’d find in a dictionary and make sure you include mixed case letters, numbers and symbols. Now, if you need a password which is both good and short then this is probably quite reasonable advice. If you picked 8 characters truly at random from all letters, numbers and symbols, that’s something like 4 x 10^15 (four million billion) passwords to choose from, and that’s secure enough.

However, these passwords have a bit of a drawback - humans are bloomin’ awful at remembering things like this: 1Xm}4q3=. Hence, people tend to start with something which is a real word - humans are good at remembering words - and then apply various modifications like converting letters to similar digits. This process gives people a warm, fuzzy feeling because their password ends up with all these quirky little characters in it, but in reality something like sH33pd0g is a lot worse than just picking random characters.

Let’s say there are about 10,0001 common words and that each letter can be one of four variants2. That gives around 6 x 10^8 (six hundred million) possibilities. This might seem like a lot, but if someone can check a thousand passwords every second then it’ll only take them about a week to go through them all.

So, if that’s a bad way to pick memorable passwords, what’s a good way? Well, there are a few techniques which can help if you still need a short password. One of the best I know of is to pick a whole sentence and simply include the first letter of each word. Then mix up the case and swap letters for symbols as already discussed. This at least keeps the password away from being a dictionary word that can be easily guessed, although it’s still a pain to have to remember which letters have been swapped for which numbers and so on.

But wait, do we really need such a short password? If we make it longer, perhaps we don’t need to avoid dictionary words at all? After all, the best password is one that is both secure and memorable, so we don’t have to write it down on a Post-It note3 stuck on our computer. Fortunately the repository of all knowledge xkcd has the answer4!

As it turns out, picking four words more or less at random is much better from a security point of view and, for most people at least, is probably quite a bit easier to remember. Using the same 10,000 word estimate from earlier, picking four words entirely at random gives us 1 x 10^16 (ten quadrillion, or ten million billion) possibilities. At a thousand per second this would take over 300,000 years to crack. The beauty is that you don’t need to remember anything fancy like strange numbers or symbols - just four words.

Is it any more memorable? Well, a random series of characters doesn’t give you anything to get to grips with - it’s a pure memory task, and that’s tough for a lot of people. However, if you’ve got something like enraged onlooker squanders dregs or scandalous aardvark replies trooper then you can immediately picture a scene to help you remember it.

So let’s stop giving people daft advice about passwords and certainly let’s remove all those irritating checks on passwords to make sure they’re “secure”, when in reality the net effect of forcing people to use all these numbers and strange symbols is more or less the opposite. Most of all, let’s make sure that our online services accept passwords of at least 128 characters so that people can pick properly good passwords, not the ones that everyone’s been browbeaten into believing are good.

As an aside, even with this scheme it’s still really important to pick words at random and that’s something humans don’t do very well. Inspired by the xkcd comic I linked earlier, this site was set up to generate random passwords. Alternatively, if you’re on a Linux system you could use a script something like this to pick one for you5:

#!/usr/bin/python

import random
import string

# Build the set of unique lower-case words from the system word list, dropping
# any entries (such as possessives) which end up empty after filtering.
with open("/usr/share/dict/words", "r") as fd:
    words = set("".join(c for c in line.strip().lower()
                        if c in string.ascii_lowercase) for line in fd)
words.discard("")
words = list(words)  # random.sample() wants a sequence; newer Pythons reject sets

print("Choosing 10 phrases from dict of %d words\n" % (len(words),))
print("\n".join(" ".join(random.sample(words, 4)) for i in range(10)))

One final point. You might be thinking it’s going to be a lot slower to type four whole words than eight single characters, but actually it’s often almost as fast once you don’t need to worry about fiddling around with the SHIFT key and all those odd little symbols you never normally use, like the helium-filled snake (~)6 and the little gun (¬)7.

Especially on a smartphone keyboard. Let’s face it, if you’ve just upgraded your phone and are seeking help with it in some online forum, there isn’t a way to ask “how do I get the little floating snake?” without looking like a bit of an idiot - clearly this is the most important advantage of all.


  1. While it’s true that the OED has something like a quarter of a million words, the average vocabulary is typically significantly smaller. 

  2. So o could also be O, 0 or (), say. 

  3. Although the dangers of writing passwords down are generally heavily overestimated by many people. I’m not saying it’s a good idea, but having a really good password on a scrap of paper in your wallet, say, is still a lot better for most average users than a really poor password that you can remember. 

  4. Note that Randall’s figures differ from mine somewhat and are probably rather more accurate - I was just plucking some figures in the right order of magnitude out of the air to illustrate the issues involved. 

  5. It’s written to run on a Linux system but about the only thing it needs that’s platform-specific is the filename of a word list. 

  6. Yes, yes, I know it’s called a tilde really. 

  7. OK, fine, it’s a logical negation symbol. Still looks like a little gun to me; or maybe hockey stick; or an allen key; or the edge of a table; a table filled with cakes… Yes, I like that one best. Hm, now I’m hungry. 

Thu 11 Jul 2013 at 03:55PM by Andy Pearce in Software. Tags: security, python. comments.
