☑ C++11: Library Changes: Threading, Tuples, Hashing and Regexes

13 Apr 2015 at 7:26AM in Software
 |  | 

I’ve finally started to look into the new features in C++11 and I thought it would be useful to jot down the highlights, for myself or anyone else who’s curious. Since there’s a lot of ground to cover, I’m going to look at each item in its own post — this is the first of the final two that cover what I feel to be the most important changes to the standard template library.

This is the 7th of the 8 articles that currently make up the “C++11 Features” series.

child map

C++11 introduces many changes to the STL as well as the core language itself. This post is the start of a summary of what I feel to be the most important ones, with a second post to follow to cover the remainder.

Utilise core language features

I don’t want to cover this in too much detail as I feel most of the points are fairly self-evident, but as the core library has had several features added then the STL has, unsurprisingly, been updated to take advantage of them. A handful of these changes for flavour:

Move semantics
Library containers now support rvalue references and take full advantage of classes that provide move semantics. Combined with variadic templates, this also allows support for perfect forwarding.
Unicode
Types that support characters have been extended to also support the new char16_t and char32_t types for UTF-16 and UTF-32 respectively.
Explicit conversions
The new ability to mark type conversion methods as explicit allows some common gotchas to be addressed.

Threading

C++11 contains a variety of core language improvements which support multithreaded code, such as a new memory model which allows thread-safe construction of static objects and support for thread-local storage. In tandem with these changes the STL also has some platform-independent support for threads as well.

There’s a new std::thread class which can be passed a function object and a series of arguments to execute that function in a separate thread of execution. Generic waits for completion are possible with std::thread::join() and there’s also access to the native underlying thread ID where possible.

A handful of synchronisation primitives have also been added including std::mutex, std::recursive_mutex, std::condition_variable and std::condition_variable_any. These have all the usual straightforward semantics.

As well as these primitives there are some other very handy additions. The std::lock_guard class is a simple RAII wrapper for any lockable and std::unique_lock is a useful wrapper which allows deferred locking with RAII semantics and also locking with a timeout. There’s also std::lock which allows a set of lockables to be all acquired whilst avoiding deadlock.

For higher performance, and greater convenience, there’s also the templated std::atomic which allows safe, lock-free access to fundamental data types - there are also a set of convenience specialisations (e.g. std::atomic_int, std::atomic_char). It’s even possible to specify the precise ordering constraints to use for a particular operation with these types.

Finally some higher-level support for asynchronous operations has been added with support for futures. A function can be made asynchronous with std::async, std::packaged_task or std::promise, all of which return a std::future which allows the result of the operation to be obtained once available. One annoyance, however, is that there’s no way to compose futures and hence wait on any of a whole set of asynchronous operations to complete.

All in all, then, a not quite perfect but nonetheless comprehensive set of platform independent tools for threading C++. Frankly, it’s about time.

Update: A couple of further notes on threading that may be helpful.

Firstly, when using GCC on POSIX platforms (and possibly others) you should compile with the -pthread option. This enables any platform-specific requirements for pthreads — on my Linux box it seems to add (at least) the following options:

  • -D_REENTRANT to the pre-processor definitions, which changes the behaviour of some glibc code. For example, I think it puts errno into thread-local storage.
  • -lpthreads to the list of linker flags to link in the required library code.

Secondly, remember that the behaviour of an exception propagating to the outermost scope of a thread is unchanged — in other words, if it’s not handled then it will invoke std::terminate() just as if main() had thrown it:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
#include <chrono>
#include <iostream>
#include <thread>

void hello()
{
  std::cout << "hello, world" << std::endl;
  //throw "oops";
}

int main()
{
  std::thread t1(hello);
  std::this_thread::sleep_for(std::chrono::seconds(2));
  t1.join();
  std::cout << "Joined" << std::endl;
  return 0;
}

That code should run fine, but uncomment line 8 and you’ll see what I mean.

OK, that’s probably not too surprising. However, another more subtle cause of std::terminate() is if you destroy a joinable std::thread object without joining it first. For example, change lines 13-15 to this and see whether you get as far as the Joined message:

std::thread* t1 = new std::thread(hello);
std::this_thread::sleep_for(std::chrono::seconds(2));
delete t1;

Tuples

The variadic macro support added in C++11 allowed the addition of quite clean support for tuples — i.e. a heterogenous fixed-dimension list of members, a little like a struct whose members don’t have names. The types of members are declared with template parameters, as you would expect, and then std::get can be used to obtain references to the members which can be used to both query and update them.

std::tuple<int, std::string> myTuple(123, "hello, world");
std::cout << std::get<0>(myTuple) << " :: "
          << std::get<1>(myTuple) << std::endl;
std::get<0>(myTuple) = 456;
std::get<1>(myTuple) = "goodbye, world";

C++11’s newly repurposed auto keyword and type inference can be used to declare tuples even more conveniently:

auto myTuple = std::make_tuple(123, "hello, world");

Hash Tables

One thing that I was really happy to see finally added was a standardisation of hash tables. The venerable std::map has been around for some time now, and serves as a perfectly serviceable associative array — it’s always been a little irritating, however, that it’s implemented as a tree1. This is great if you need to iterate over the elements in sorted order, but if that’s not important then the \( \mathcal{O}(\log{}n) \) access time seems a harsh price to pay when a well-implemented hash table can average constant time access.

There are four additional types which are hash equivalents of the existing associative containers:

I find this particularly useful as associative containers can often grow quite large but if the ratio of lookups to inserts is large then the logarithmic behaviour of std::map is a bit of a turnoff. This is almost certainly going to be become my default associative container, barring the requirement for sorted order iteration, which I find is rare; and the issue of the key type not being hashable, which is more likely to be the reason to prefer the ordered versions in some cases.

Speaking of types not being hashable the STL in C++11 provides hashing for integral, floating point and pointer types — anything else needs a little help from the application. To make a type hashable, a specialisation of the std::hash templated functor must be provided for it, which will typically just decompose the structure into constituent types which are themselves already hashable. Note that to support the containers, operator==() must also be defined, since they use chaining to resolve collisions.

One gotcha is that all pointers are treated the same, which is to hash on the raw pointer value regardless of the pointer type. Without a little care this could lead to some unfortunate bugs. For example, you might think the example below would store an instance of the (somewhat awful) Person class in a std::unordered_set and then retrieve it. If you run it, however, you’ll see that it finds no match on line 60. Can you spot the issue?

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
#include <cstring>
#include <iostream>
#include <functional>
#include <unordered_set>

class Person
{
  public:
    Person(const char *name, unsigned int age);
    ~Person();

    const char* getName() const { return m_name; }
    int getAge() const { return m_age; }

  private:
    char* m_name;
    unsigned int m_age;
};

Person::Person(const char* name, unsigned int age) : m_age(age)
{
    size_t nameLen = strlen(name);
    m_name = new char[nameLen + 1];
    memcpy(m_name, name, nameLen);
    m_name[nameLen] = '\0';
}

Person::~Person()
{
    delete[] m_name;
}

namespace std
{
template<>
struct hash<Person>
{
    size_t operator()(const Person& p) const
    {
        size_t const h_name = std::hash<const char*>()(p.getName());
        size_t const h_age = std::hash<int>()(p.getAge());
        return h_name ^ (h_age << 1);
    }
};    
} // namespace std

bool operator==(const Person& lhs, const Person& rhs)
{
    return (lhs.getAge() == rhs.getAge() &&
            !strcmp(lhs.getName(), rhs.getName()));
}

int main()
{
    Person andy1("Andy", 37);
    Person andy2("Andy", 37);

    std::unordered_set<Person> people;
    people.insert(andy1);
    if (people.count(andy2) > 0) {
        std::cout << "Match!" << std::endl;
    }

    return 0;
}

Did you spot the issue? The hash value is being calculated on the pointer value returned by Person::getName() and not the contents of the array. A more correct way to calculate h_name might be:

size_t h_name = 0;
for (const char* c = p.getName(); *c; ++c) {
    h_name = std::hash<char>()(*c) ^ (h_name << 1);
}

Regular Expressions

When parsing text, one potentially powerful tool that many people will be familiar with from scripting languages are regular expressions2. Some people are quite opposed to them, apparently on principle. I must admit to a healthy scepticism in some cases, but I’m with Jeff Atwood in that if well used then they have their place.

Well now C++11 has introduced a standard regex interface. The first step is to build a std::regex object using a string specification as usual. This can be used with std::match to check if another string is full match, or std::search to check for a substring match. The result of a successful match is a std::match_results instance which allows access to the string that matches the pattern as well as any other bracketed capture groups within it.

As with std::string the regex related classes are templated on the underlying type used. Where std::string is templated on character types, however, regex classes are templated on string types. For convenience, however, there are some predefined types - for example std::match_results has four specialisations such as std::smatch for use with std::string.

As well as the two functions to check for a single match, there’s also std::regex_iterator (and its predefined specialisations) for iterating across multiple matches. With these facilities it’s really easy to write code that’s similar to something you might see in Perl or Python:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
#include <regex>
#include <string>
#include <iostream>

int main()
{
    std::regex needle("(([0-9]+\\.){3}[0-9]+)");
    std::string haystack("Router IP is 172.16.0.1 or 192.168.0.1");
    for (std::sregex_iterator it(haystack.begin(), haystack.end(), needle);
            it != std::sregex_iterator(); ++it) {
        std::cout << "Found match:" << std::endl;
        for (size_t i = 0; i < it->size(); ++i) {
            std::cout << "  Group " << i << ": " << (*it)[i] << std::endl;
        }
    }
    return 0;
}

The code above should generate the following output:

Found match:
  Group 0: 127.0.0.1
  Group 1: 127.0.0.1
  Group 2: 0.
Found match:
  Group 0: 192.168.0.1
  Group 1: 192.168.0.1
  Group 2: 0.

Something that’s probably not immediately obvious from my pattern above is that capture group zero is special — it always contains the entire matched substring. The remaining capture groups correspond to the bracketed expressions in the pattern in the order of occurrence of the opening brackets. Since I bracketed the entire pattern then capture group 1 also contains the entire matched string. Capture group 2 matches the inner bracketed expression, but the complexity here is that I’ve used the {3} specifier to indicate that group must be repeated three times. As you can see from the results, each match overwrites the previous one so the result of querying the capture group is whatever matched the final time.

To round off it’s important to know what dialect of regular expression is used, as there are significant differences in functionality — actually C++11 supports a surprising six dialects, which are selected when building the std::regex:

The last two are rather minor variations of basic and extended POSIX respectively, simply adding a newline (\n) as an alternative to the pipe character (|) for alternation.

One final note on regexes for gcc users — I had to install gcc 4.9 to get the regex example above to compile, they aren’t supported in any earlier versions.


  1. Typically a red-black-tree, to be specific. 

  2. I’m intentionally ignoring the fact that the “regular expressions” provided by most programming languages are in fact sufficiently expressive that they’re no longer regular languages in the formal definition. I’m as much of a fan of pedantry as the next software engineer, but one of the things Larry Wall and I appear to agree on is that this battle is lost

The next article in the “C++11 Features” series is C++11: Library Changes: Pointers, Randoms, Wrappers and more
Fri 24 Apr, 2015
13 Apr 2015 at 7:26AM in Software
 |  |