I’ve finally started to look into the new features in C++11 and I thought it would be useful to jot down the highlights, for myself or anyone else who’s curious. Since there’s a lot of ground to cover, I’m going to look at each item in its own post — this is the first of the final two that cover what I feel to be the most important changes to the standard template library.
This is the 7th of the 8 articles that currently make up the “C++11 Features” series.
C++11 introduces many changes to the STL as well as the core language itself. This post is the start of a summary of what I feel to be the most important ones, with a second post to follow to cover the remainder.
I don’t want to cover this in too much detail as I feel most of the points are fairly self-evident, but as the core library has had several features added then the STL has, unsurprisingly, been updated to take advantage of them. A handful of these changes for flavour:
char16_t
and char32_t
types for UTF-16 and UTF-32 respectively.explicit
allows some common gotchas to be addressed.C++11 contains a variety of core language improvements which support multithreaded code, such as a new memory model which allows thread-safe construction of static objects and support for thread-local storage. In tandem with these changes the STL also has some platform-independent support for threads as well.
There’s a new std::thread
class which can be passed a function
object and a series of arguments to execute that function in a separate thread
of execution. Generic waits for completion are possible with
std::thread::join()
and there’s also
access to the native underlying thread ID
where possible.
A handful of synchronisation primitives have also been added including
std::mutex
, std::recursive_mutex
,
std::condition_variable
and
std::condition_variable_any
. These have all
the usual straightforward semantics.
As well as these primitives there are some other very handy additions. The
std::lock_guard
class is a simple RAII wrapper for any
lockable and std::unique_lock
is a useful wrapper
which allows deferred locking with RAII semantics and also locking with
a timeout. There’s also std::lock
which allows a set of lockables
to be all acquired whilst avoiding deadlock.
For higher performance, and greater convenience, there’s also the
templated std::atomic
which allows safe, lock-free access to
fundamental data types - there are also a set of convenience specialisations
(e.g. std::atomic_int
, std::atomic_char
). It’s even possible to specify
the precise ordering constraints to use for a particular
operation with these types.
Finally some higher-level support for asynchronous operations has been
added with support for futures. A function can be made
asynchronous with std::async
,
std::packaged_task
or std::promise
,
all of which return a std::future
which allows the result
of the operation to be obtained once available. One annoyance, however,
is that there’s no way to compose futures and
hence wait on any of a whole set of asynchronous operations to complete.
All in all, then, a not quite perfect but nonetheless comprehensive set of platform independent tools for threading C++. Frankly, it’s about time.
Update: A couple of further notes on threading that may be helpful.
Firstly, when using GCC on POSIX platforms (and possibly others) you should
compile with the -pthread
option. This enables any platform-specific
requirements for pthreads — on my Linux box it seems to add (at least)
the following options:
-D_REENTRANT
to the pre-processor definitions, which changes the behaviour
of some glibc code. For example, I think it puts errno
into
thread-local storage.-lpthreads
to the list of linker flags to link in the required
library code.Secondly, remember that the behaviour of an exception propagating to the
outermost scope of a thread is unchanged — in other words, if it’s not
handled then it will invoke std::terminate()
just as
if main()
had thrown it:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 |
|
That code should run fine, but uncomment line 8 and you’ll see what I mean.
OK, that’s probably not too surprising. However, another more subtle cause
of std::terminate()
is if you destroy a joinable std::thread
object
without joining it first. For example, change lines 13-15 to this and see
whether you get as far as the Joined
message:
std::thread* t1 = new std::thread(hello);
std::this_thread::sleep_for(std::chrono::seconds(2));
delete t1;
The variadic macro support added in C++11 allowed the addition of quite clean
support for tuples — i.e. a heterogenous fixed-dimension list of members,
a little like a struct
whose members don’t have names. The types of members
are declared with template parameters, as you would expect, and then
std::get
can be used to obtain references to the members which
can be used to both query and update them.
std::tuple<int, std::string> myTuple(123, "hello, world");
std::cout << std::get<0>(myTuple) << " :: "
<< std::get<1>(myTuple) << std::endl;
std::get<0>(myTuple) = 456;
std::get<1>(myTuple) = "goodbye, world";
C++11’s newly repurposed auto
keyword and type inference can be used to
declare tuples even more conveniently:
auto myTuple = std::make_tuple(123, "hello, world");
One thing that I was really happy to see finally added was a standardisation
of hash tables. The venerable std::map
has been around for some
time now, and serves as a perfectly serviceable
associative array — it’s always been a little
irritating, however, that it’s implemented as a tree1. This is
great if you need to iterate over the elements in sorted order, but if that’s
not important then the \( \mathcal{O}(\log{}n) \) access time seems a harsh price
to pay when a well-implemented hash table can average constant time access.
There are four additional types which are hash equivalents of the existing associative containers:
I find this particularly useful as associative containers can often grow
quite large but if the ratio of lookups to inserts is large then the
logarithmic behaviour of std::map
is a bit of a turnoff. This is almost
certainly going to be become my default associative container, barring
the requirement for sorted order iteration, which I find is rare; and
the issue of the key type not being hashable, which is more likely to be the
reason to prefer the ordered versions in some cases.
Speaking of types not being hashable the STL in C++11 provides hashing for
integral, floating point and pointer types — anything else needs a little
help from the application. To make a type hashable, a specialisation of the
std::hash
templated functor must be provided for it, which
will typically just decompose the structure into constituent types which
are themselves already hashable. Note that to support the containers,
operator==()
must also be defined, since they use
chaining to resolve collisions.
One gotcha is that all pointers are treated the same, which is to hash on
the raw pointer value regardless of the pointer type. Without a little care
this could lead to some unfortunate bugs. For example, you might think the
example below would store an instance of the (somewhat awful) Person
class in a std::unordered_set
and then retrieve it. If you run it, however,
you’ll see that it finds no match on line 60. Can you spot the issue?
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 |
|
Did you spot the issue? The hash value is being calculated on the pointer
value returned by Person::getName()
and not the contents of the array.
A more correct way to calculate h_name
might be:
size_t h_name = 0;
for (const char* c = p.getName(); *c; ++c) {
h_name = std::hash<char>()(*c) ^ (h_name << 1);
}
When parsing text, one potentially powerful tool that many people will be familiar with from scripting languages are regular expressions2. Some people are quite opposed to them, apparently on principle. I must admit to a healthy scepticism in some cases, but I’m with Jeff Atwood in that if well used then they have their place.
Well now C++11 has introduced a standard regex interface. The first step
is to build a std::regex
object using a string specification
as usual. This can be used with std::match
to check if
another string is full match, or std::search
to check for
a substring match. The result of a successful match is a
std::match_results
instance which allows access to
the string that matches the pattern as well as any other bracketed capture
groups within it.
As with std::string
the regex related classes are templated on the
underlying type used. Where std::string
is templated on character types,
however, regex classes are templated on string types. For convenience,
however, there are some predefined types - for example
std::match_results
has four specialisations
such as std::smatch
for use with std::string
.
As well as the two functions to check for a single match, there’s also
std::regex_iterator
(and its predefined
specialisations) for iterating across
multiple matches. With these facilities it’s really easy to write code
that’s similar to something you might see in Perl or Python:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 |
|
The code above should generate the following output:
Found match:
Group 0: 127.0.0.1
Group 1: 127.0.0.1
Group 2: 0.
Found match:
Group 0: 192.168.0.1
Group 1: 192.168.0.1
Group 2: 0.
Something that’s probably not immediately obvious from my pattern above is
that capture group zero is special — it always contains the entire matched
substring. The remaining capture groups correspond to the bracketed expressions
in the pattern in the order of occurrence of the opening brackets. Since I
bracketed the entire pattern then capture group 1 also contains the
entire matched string. Capture group 2 matches the inner bracketed expression,
but the complexity here is that I’ve used the {3}
specifier to indicate that
group must be repeated three times. As you can see from the results, each
match overwrites the previous one so the result of querying the capture
group is whatever matched the final time.
To round off it’s important to know what dialect of regular expression is used,
as there are significant differences in functionality — actually C++11
supports a surprising six dialects, which are selected when building the
std::regex
:
The last two are rather minor variations of basic and extended POSIX
respectively, simply adding a newline (\n
) as an alternative to the
pipe character (|
) for alternation.
One final note on regexes for gcc users — I had to install gcc 4.9 to get the regex example above to compile, they aren’t supported in any earlier versions.
Typically a red-black-tree, to be specific. ↩
I’m intentionally ignoring the fact that the “regular expressions” provided by most programming languages are in fact sufficiently expressive that they’re no longer regular languages in the formal definition. I’m as much of a fan of pedantry as the next software engineer, but one of the things Larry Wall and I appear to agree on is that this battle is lost. ↩