Rust is a fairly new multi-paradigm systems programming language that claims to offer both high performance and strong safety guarantees, particularly around concurrency and memory allocation. As I play with the language a little, I’m using this series of blog posts to discuss some of its more unique features as I come across them. This one discusses Rust’s data types and powerful match operator.
This is the 2nd of the 7 articles that currently make up the “Uncovering Rust” series.
There are a few features you expect from any mainstream imperative programming language. One of them is some support for basic builtin types, such as integers and floats. Another is some sort of structured data type, where you can assign values to named fields. Yet another is some sort of vector, array or list for sequences of values.
We’re going to start this post by looking at how these standard features manifest in Rust. Some of this will be quite familiar to programmers from C++ and similar languages, but there are a few surprises along the way and my main aim is to discuss those.
Rust has builtin scalar types for integers, floats, booleans and characters. Due to Rust’s low-level nature, you generally have to be explicit about the sizes of these. There are integral types for 8-, 16-, 32-, 64- and 128-bit values, both signed and unsigned. For example, i32 is a signed 32-bit integer and u128 is an unsigned 128-bit integer. There are also architecture-dependent types isize and usize which use the native word size of the machine. These are typically used for array offsets. Floats can be f32 for single-precision and f64 for double-precision.
One point that’s worth noting here is that Rust is a strongly typed language and won’t generally perform implicit casts for you, even for numeric types. For example, you can’t assign or compare integers with floats, or even integers of different sizes, without doing an explicit conversion. This keeps costs explicit, although it does mean programmers need to consider their types carefully; that’s no bad thing in my humble opinion.
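As a minimal sketch of what those explicit conversions look like (the average() helper and the values are just assumptions for illustration), widening goes through From and anything potentially lossy needs an as cast:

```rust
// Hypothetical helper: both operands need an explicit conversion to f64.
fn average(total: i32, count: u8) -> f64 {
    f64::from(total) / f64::from(count)
}

fn main() {
    let small: u8 = 200;
    let wide: u32 = u32::from(small); // lossless widening via From
    let narrow = wide as u8; // `as` casts can truncate; 200 still fits in a u8
    // let mixed: f64 = wide; // rejected: mismatched types, no implicit cast
    println!("{} {} {}", wide, narrow, average(7, 2));
}
```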
Specifically on the topic of integers it’s also worth noting that Rust will panic (terminate the execution) if you overflow your integer size, but only in a debug build. If you compile a release build, the overflow is instead allowed to wrap around. However, the clear intention is that programmers shouldn’t be relying on such tricks to write safe and portable code.
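If you do want a particular overflow behaviour regardless of build type, the integer types have methods that spell it out. The sketch below wraps two of them in hypothetical helper functions; checked_add() returns an Option, a type discussed later in this post.

```rust
// Returns None instead of overflowing, in both debug and release builds.
fn add_checked(a: u8, b: u8) -> Option<u8> {
    a.checked_add(b)
}

// Always wraps around on overflow, in both debug and release builds.
fn add_wrapping(a: u8, b: u8) -> u8 {
    a.wrapping_add(b)
}

fn main() {
    println!("{:?}", add_checked(250, 5)); // fits in a u8
    println!("{:?}", add_checked(250, 10)); // would overflow
    println!("{}", add_wrapping(250, 10)); // wraps modulo 256
}
```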
Values of type bool can be true or false. Even Rust hasn’t managed to introduce anything surprising or unconventional about booleans! One point of interest is that the expression in an if statement has to be a bool. Once again there are no implicit conversions, and there is no assumed equivalence between, say, false and 0 as there is in C++.
The final type char has a slight surprise waiting for us, which is that it has a size of four bytes and can represent any Unicode code point. It’s great to see Unicode support front and centre in the language like this, hopefully making life very difficult for people who want to assume that the world is ASCII. Those of you familiar with Unicode may also know that the concept of what constitutes a “character” may surprise those who are used to working only with ASCII, so there could be puzzled programmers out there at times. But we live in a globalised world now and there’s no longer any excuse for any self-respecting programmer to write ASCII-first code.
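To make the distinction concrete, here is a small sketch (the helper name is my own invention) comparing the number of bytes in a UTF-8 encoded string with the number of char values it contains:

```rust
// Returns (bytes in the UTF-8 encoding, number of char values).
fn byte_and_char_counts(text: &str) -> (usize, usize) {
    (text.len(), text.chars().count())
}

fn main() {
    println!("size of char: {} bytes", std::mem::size_of::<char>());
    let word = "h\u{e9}llo"; // "hello" with a precomposed e-acute (U+00E9)
    let (bytes, chars) = byte_and_char_counts(word);
    println!("{} bytes, {} chars", bytes, chars);
}
```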
Rust arrays are homogeneous (each array contains values of only one type) and are of a fixed size, which must be known at compile time. An array declared as a local variable is stored on the stack. Rust does provide a more dynamic Vec type which uses the heap and allows resizing, but I’m not going to discuss that here.
In the interests of safety, Rust requires that every element of an array be initialised when constructed. Because of this, it’s usually not required to specify a type, but of course there is a syntax for doing so. It’s also possible to initialise every item to the same value using a shorthand. These are all illustrated in the example below.
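A minimal sketch of the three forms, with variable names and values chosen purely for illustration:

```rust
// Hypothetical helper returning one array built in each of the three ways.
fn build_arrays() -> ([i32; 4], [f64; 3], [u8; 8]) {
    let primes = [2, 3, 5, 7]; // element type and length inferred
    let ratios: [f64; 3] = [0.5, 1.0, 2.0]; // explicit type: [element; length]
    let zeros = [0u8; 8]; // shorthand: eight elements, all zero
    (primes, ratios, zeros)
}

fn main() {
    let (primes, ratios, zeros) = build_arrays();
    println!("{:?} {:?} {:?}", primes, ratios, zeros);
    println!("third prime: {}", primes[2]);
}
```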
Although the size of the array must be known at compile-time, of course the compiler can’t police all your accesses to the array. For example, you may access an item based on user input. Rust does do bounds-checking at runtime, however. Discussion of how to handle runtime errors like this is a topic for another time, but the default action is to panic, which terminates the executable immediately.
The basic mechanics of structs in Rust work quite analogously to those in C++, aside from some minor syntactic differences. Here’s a definition to illustrate:
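A sketch of such a definition, with the field names assumed from the way the struct is used in the rest of this section:

```rust
// Field names are assumptions based on the surrounding discussion.
#[allow(dead_code)]
struct Contact {
    first_name: String,
    last_name: String,
    email: String,
    business: bool,
}

fn main() {
    // Nothing to run yet; just show that the type exists and has a known size.
    println!("Contact occupies {} bytes", std::mem::size_of::<Contact>());
}
```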
To create an instance of a struct the syntax is similar except providing values instead of types after the colons. After creation the dot notation to read and assign struct fields will also be familiar to both C++ and Python programmers:
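Something along these lines, where the Contact fields and the example values are my own assumptions:

```rust
struct Contact {
    first_name: String,
    last_name: String,
    email: String,
    business: bool,
}

// Hypothetical helper that builds a contact and then corrects one field.
fn example_contact() -> Contact {
    // `mut` is needed because a field is assigned after creation.
    let mut contact1 = Contact {
        first_name: String::from("Jon"),
        last_name: String::from("Smith"),
        email: String::from("john.smith@example.com"),
        business: false,
    };
    contact1.first_name = String::from("John");
    contact1
}

fn main() {
    let contact = example_contact();
    println!(
        "{} {} <{}> business={}",
        contact.first_name, contact.last_name, contact.email, contact.business
    );
}
```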
Note that to assign to first_name we had to make contact1 mutable and that this mutability applies to the entire structure, not to each field. No surprises for C++ programmers there either.
Now there are a couple more unique features that are worth mentioning. The first of them comes when creating constructor methods. Let’s say we want to avoid having to set the business field, so we wrap it up in a function:
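A sketch of such a function, where the name new_personal_contact() and the Contact fields are assumptions of mine:

```rust
struct Contact {
    first_name: String,
    last_name: String,
    email: String,
    business: bool,
}

// Hypothetical constructor that always sets `business` to false.
fn new_personal_contact(first_name: String, last_name: String, email: String) -> Contact {
    Contact {
        first_name: first_name,
        last_name: last_name,
        email: email,
        business: false,
    }
}

fn main() {
    let c = new_personal_contact(
        String::from("John"),
        String::from("Smith"),
        String::from("john.smith@example.com"),
    );
    println!("{} {} <{}> business={}", c.first_name, c.last_name, c.email, c.business);
}
```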
However, it’s a bit tedious repeating all those field names in the body. Well, if the function parameters happen to match the field names you can use a shorthand for this:
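With the same assumed Contact struct and hypothetical function name, the shorthand version might look like this:

```rust
struct Contact {
    first_name: String,
    last_name: String,
    email: String,
    business: bool,
}

// Same hypothetical constructor, using the field init shorthand.
fn new_personal_contact(first_name: String, last_name: String, email: String) -> Contact {
    Contact {
        first_name, // short for first_name: first_name
        last_name,
        email,
        business: false, // no matching parameter, so still written in full
    }
}

fn main() {
    let c = new_personal_contact(
        String::from("John"),
        String::from("Smith"),
        String::from("john.smith@example.com"),
    );
    println!("{} {} <{}> business={}", c.first_name, c.last_name, c.email, c.business);
}
```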
Another convenient syntactic trick is the struct update syntax, which can be used to create a copy of another struct with some changes:
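A sketch of the update syntax, again assuming the Contact fields used so far; the sibling_of() helper is purely hypothetical:

```rust
struct Contact {
    first_name: String,
    last_name: String,
    email: String,
    business: bool,
}

// Hypothetical helper: same details as `base` but a different first name.
fn sibling_of(base: Contact, first_name: &str) -> Contact {
    Contact {
        first_name: String::from(first_name),
        ..base // every other field is taken from `base`
    }
}

fn main() {
    let contact1 = Contact {
        first_name: String::from("Alice"),
        last_name: String::from("Smith"),
        email: String::from("smiths@example.com"),
        business: false,
    };
    let contact2 = Contact {
        first_name: String::from("Bob"),
        ..contact1
    };
    println!(
        "{} {} <{}> business={}",
        contact2.first_name, contact2.last_name, contact2.email, contact2.business
    );
    // println!("{}", contact1.email); // would not compile: email was moved
    let contact3 = sibling_of(contact2, "Carol");
    println!("{} {}", contact3.first_name, contact3.last_name);
}
```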
This will duplicate all fields not explicitly changed. There can be a sting in this particular tail, though, due to the ownership rules. In this example, the String value from contact1.email will be moved into contact2.email, so contact1 can no longer be used as a whole after this point, although any fields that weren’t moved remain accessible.
Finally in this section I’ll briefly talk about tuples. I’m talking about them here rather than along with other compound types because I feel they work in a very similar way to structs, just without the field names. They have a fixed size defined when they are created and this cannot change, as with an array. Unlike an array, however, they are heterogeneous: they can contain multiple different types.
One thing that might surprise Python programmers in particular, however, is that the elements of a tuple are accessed using dot notation in the same way as a struct. In a way you can think of it as a struct where the names of the fields are just automatically chosen as zero-based integers.
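For instance, in this small sketch (the describe() helper and the values are assumptions), the three elements have different types and are read as .0, .1 and .2:

```rust
// Hypothetical helper that reads each element of a tuple by position.
fn describe(entry: (i32, f64, &str)) -> String {
    format!("{} {} {}", entry.0, entry.1, entry.2)
}

fn main() {
    let entry: (i32, f64, &str) = (3, 4.5, "origin");
    println!("first element: {}", entry.0);
    println!("{}", describe(entry));
}
```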
If you want to share the definition of a tuple around in the same way as for a struct but you don’t want to give the fields names, you can use a tuple struct to do that:
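A sketch of a tuple struct, using a hypothetical Point3 type of my own choosing:

```rust
// A named type whose fields are positional rather than named.
struct Point3(f64, f64, f64);

// Hypothetical helper: squared distance of the point from the origin.
fn length_squared(p: &Point3) -> f64 {
    p.0 * p.0 + p.1 * p.1 + p.2 * p.2
}

fn main() {
    let p = Point3(1.0, 2.0, 2.0);
    println!("x={} length^2={}", p.0, length_squared(&p));
}
```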
In all honesty I’m not entirely sure how useful that’ll be, but time will tell.
The final note here is that structs can also hold references, although none of the examples here utilised that. However, doing so means exercising a little more care because the original value must outlive any struct that holds a reference to it. This is a topic for a future discussion on lifetimes.
Continuing the theme of data types that C++ offers, Rust also has enumerations, hereafter referred to as enums. Beyond the name the similarity gets very loose, however. In C++ enums are essentially a way to add textual aliases to integral values; there’s a bit of syntactic sugar to treat them as regular values, but you don’t have to dip your toes too far under the water to get them bitten by an integer.
In Rust, however, they have features that are more like a union in C++, although unlike a union they don’t rely on the programmer to know which variant is in use at any given time.
You can use them very much like a regular enum. The values defined within the enum are scoped within the namespace of the enumeration name [1].
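A minimal sketch with a hypothetical Direction enum; the derive line just asks the compiler to generate printing and comparison support, which isn’t otherwise covered in this post:

```rust
#[allow(dead_code)]
#[derive(Debug, PartialEq)]
enum Direction {
    North,
    East,
    South,
    West,
}

fn main() {
    // Variants are always referred to through the enum's name.
    let heading = Direction::East;
    println!("{:?}", heading);
    println!("{}", heading == Direction::East);
    // Without explicit values, variants are numbered from zero.
    println!("{}", Direction::South as i32);
}
```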
Much more powerfully than this, however, these variants can also have data values associated with them, and each variant can be associated with its own data type.
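Here’s a sketch of what that can look like. ContactType is the name used later in this post, but the particular variants and their payloads are assumptions of mine:

```rust
#[allow(dead_code)]
enum ContactType {
    Email(String),                              // a single unnamed value
    Phone { country_code: u16, number: String }, // named fields, like a struct
    InPerson,                                   // no associated data at all
}

// Hypothetical helper building one value of each variant.
fn sample_contacts() -> Vec<ContactType> {
    vec![
        ContactType::Email(String::from("andy@example.com")),
        ContactType::Phone {
            country_code: 44,
            number: String::from("20 7946 0000"),
        },
        ContactType::InPerson,
    ]
}

fn main() {
    println!("{} contact methods", sample_contacts().len());
}
```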
This construct is great for implementing the sort of code where you need to branch differently based on the underlying type of something. I can just hear the voices of all the object-orientation purists declaring that polymorphism is the correct solution to this problem: that everything should be exposed as an abstract method in the base class that all the derived classes implement. I wouldn’t say I disagree necessarily, but I would also say that this isn’t a clean fit in every case and polymorphism isn’t the one-size-fits-all solution that it has on occasion been presented as.
Rust implements some types of polymorphism, and features such as traits are a useful alternative to inheritance for code reuse, as we’ll see in a later post. But since Rust doesn’t implement true inheritance, more properly called subtype polymorphism, I suspect this flexibility of enumerations is more important in Rust than it would be in C++.
A little further down we’ll see how to use the match operator to do this sort of switching in an elegant way, but first we’ll see one example of a pre-defined enum in Rust that’s particularly widely used.
It’s a very common case that a function needs to return a value in the happy case or raise some sort of error in the less happy case. Different languages have different mechanisms for this, one of the more common in modern languages being to raise exceptions. This is particularly common in Python, where exceptions are used for a large proportion of the functionality, but it’s also quite normal in C++ where the function of the destructors and the stack unwinding process are both heavily oriented around making this a fairly safe process.
Despite its extensive support for exceptions, however, C++ is still a bit of a hybrid and it has a number of cases where its APIs still use the other primary method of returning errors, via the return value. A good example of this is the std::string::find() method which searches for a substring within the parent string. This clearly has two different classes of result: either the string is found, in which case the offset within the parent string is returned; or the string is not found, in which case the method returns the magic std::string::npos value. In other cases functions can return either a pointer for the happy case or a NULL in case of error.
Rust does not support exceptions. This is for a number of reasons, partly related to the overhead of raising exceptions and also the fact that return values make it easier for the compiler to force the programmer to handle all error cases that a function can return.
Implementing these error returns in Rust is where the Option enum comes in useful. It’s defined something like this:
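The standard library’s Option<T> has just two variants, None and Some(T). To keep the sketch below runnable without shadowing the real thing, it defines a stand-in called MyOption with the same shape, plus a hypothetical function that returns it:

```rust
// Stand-in with the same shape as the standard library's Option<T>.
#[derive(Debug, PartialEq)]
enum MyOption<T> {
    None,
    Some(T),
}

// Hypothetical example: the first even number in a slice, if there is one.
fn first_even(values: &[i32]) -> MyOption<i32> {
    for &v in values {
        if v % 2 == 0 {
            return MyOption::Some(v);
        }
    }
    MyOption::None
}

fn main() {
    println!("{:?}", first_even(&[1, 3, 4, 6]));
    println!("{:?}", first_even(&[1, 3, 5]));
}
```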
This enum is capable of storing some type T, which is a generic type parameter much like a C++ template parameter (generics will be discussed properly in a later post), or the single value None. This allows a function to return any value type it wishes, but also leave open the possibility of returning None for an error.
That’s about all there is to say about Option, and we’ll see the idiomatic way to use it in the next section.
The final thing I’m going to talk about is the match flow control operator. This is conceptually similar to the switch statement in C++, but it’s got rather more cleverness up its sleeves.
The first thing to note about match is that unlike switch in C++ it is an expression instead of a statement. One aspect of Rust I haven’t talked about yet is that expressions may contain statements, however, so this isn’t a major obstacle. But it does mean that it’s fairly easy to use simple match expressions in assignments or as return values:
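As a sketch, using a hypothetical Suit enum, here is a match used first as a function’s return value and then on the right-hand side of an assignment:

```rust
#[allow(dead_code)]
enum Suit {
    Clubs,
    Diamonds,
    Hearts,
    Spades,
}

// The match is the last expression, so its value is the return value.
fn colour(suit: Suit) -> &'static str {
    match suit {
        Suit::Clubs => "black",
        Suit::Diamonds => "red",
        Suit::Hearts => "red",
        Suit::Spades => "black",
    }
}

fn main() {
    let suit = Suit::Hearts;
    // The value of the match is assigned directly to a variable.
    let rank = match suit {
        Suit::Clubs => 1,
        Suit::Diamonds => 2,
        Suit::Hearts => 3,
        Suit::Spades => 4,
    };
    println!("rank {} colour {}", rank, colour(Suit::Spades));
}
```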
The match expression has multiple “arms” which have a pattern and a result expression. To do more than just return a value from the expression, we can wrap it in braces:
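A sketch of arms with block bodies, again with a hypothetical Suit enum and entirely made-up scoring numbers. The last expression in each block becomes the value of that arm:

```rust
#[allow(dead_code)]
enum Suit {
    Clubs,
    Diamonds,
    Hearts,
    Spades,
}

// Hypothetical scoring function; the numbers are arbitrary.
fn score(suit: Suit, tricks: u32) -> u32 {
    match suit {
        Suit::Clubs => tricks * 20,
        Suit::Diamonds => tricks * 20,
        Suit::Hearts => {
            println!("hearts is a major suit");
            let bonus = 10;
            tricks * 30 + bonus
        }
        Suit::Spades => {
            println!("spades is a major suit");
            tricks * 30
        }
    }
}

fn main() {
    println!("{}", score(Suit::Hearts, 2));
    println!("{}", score(Suit::Clubs, 3));
}
```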
We can use the patterns to do more than just match specific values, though. Taking the Option type from earlier, we can use it to extract the return values from functions whilst still ensuring we handle all the error cases. For example, the String::find() method searches for a substring and returns an Option<usize> which is None if the value wasn’t found or the offset within the string if it was found. We can use this to, say, extract the domain part from an email address:
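A minimal sketch of such a function, following the description in this section; the exact body is my reconstruction rather than anything definitive:

```rust
// Returns the part of the address after the '@', or "" if there is no '@'.
fn get_domain(email: &String) -> &str {
    match email.find('@') {
        Some(offset) => &email[offset + 1..],
        None => "",
    }
}

fn main() {
    let email = String::from("andy@example.com");
    println!("{}", get_domain(&email));
}
```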
This function takes a String reference and returns a string slice representing the domain part of the email, unless the email address doesn’t contain an @ character in which case we return an empty string. I’m not going to say that the semantics of an empty string are ideal in this case, but it’s just an example.
As another example we could write a function to display the contact details for the ContactType defined earlier:
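A sketch of that idea follows. The variants of ContactType are assumptions of mine, and to make it easy to check, the hypothetical describe_contact() builds a String rather than printing directly:

```rust
// Variants and payloads are assumptions made for this sketch.
enum ContactType {
    Email(String),
    Phone { country_code: u16, number: String },
    InPerson,
}

// Each arm's pattern binds the data carried by that variant.
fn describe_contact(contact: &ContactType) -> String {
    match contact {
        ContactType::Email(address) => format!("Email: {}", address),
        ContactType::Phone { country_code, number } => {
            format!("Phone: +{} {}", country_code, number)
        }
        ContactType::InPerson => String::from("Meet in person"),
    }
}

fn main() {
    let contacts = [
        ContactType::Email(String::from("andy@example.com")),
        ContactType::Phone {
            country_code: 44,
            number: String::from("20 7946 0000"),
        },
        ContactType::InPerson,
    ];
    for contact in contacts.iter() {
        println!("{}", describe_contact(contact));
    }
}
```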
One aspect of match expressions that isn’t immediately obvious is that they are required to be exhaustive. So, if you don’t handle every enum value, for example, then you’ll get a compile error. This is what makes things like the Option example particularly safe as it forces handling of all errors, which is generally regarded as a good practice if you’re writing robust code. This also makes perfect sense if you consider that match is an expression: if you assign the result to a variable, say, then the compiler needs something to assign and if you hit a case that your match doesn’t handle then what’s the compiler going to do?
Of course if we’re using match for something other than an enum then handling every value would be pretty tedious. For these cases we can use the pattern _ as the default match. The example below also shows how we can match multiple patterns using | as a separator:
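As a sketch, here is a hypothetical is_perfect() function that recognises the perfect numbers that fit in a u32. The values are chosen to be consistent with the ranges in the compiler error quoted a little further down:

```rust
// True for the perfect numbers representable in a u32.
fn is_perfect(n: u32) -> bool {
    match n {
        6 | 28 | 496 | 8128 | 33550336 => true,
        _ => false, // default arm: everything not listed above
    }
}

fn main() {
    println!("{} {}", is_perfect(28), is_perfect(29));
}
```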
Here we’re meeting the needs of match by covering every single case. If we removed that final default arm, the compiler wouldn’t let us get away with it:
error[E0004]: non-exhaustive patterns: `0u32..=5u32`,
`7u32..=27u32`, `29u32..=495u32` and 3 more not covered
--> src/main.rs:10:11
|
10 | match n {
| ^ patterns `0u32..=5u32`, `7u32..=27u32`,
`29u32..=495u32` and 3 more not covered
|
= help: ensure that all possible cases are being handled,
possibly by adding wildcards or more match arms
But what if we really wanted to only handle a single case? It would be pretty dull if we had to have a default arm in a match then check for that value being returned and ignore it.
Let’s take the get_domain() example from earlier. Let’s say that if you find a domain, you want to use it; but if not, you have some more complicated logic to invoke to infer the domain by looking at the username. You could handle that by doing something like this:
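Here’s one way that might look, as a sketch only. Both domain_for() and infer_domain() are hypothetical names, and the latter is just a placeholder for the “more complicated logic”:

```rust
// Placeholder for the more complicated inference logic (an assumption).
fn infer_domain(username: &str) -> String {
    format!("{}.invalid", username.to_lowercase())
}

fn domain_for(email: &String) -> String {
    let mut domain = String::new();
    match email.find('@') {
        Some(offset) => domain.push_str(&email[offset + 1..]),
        None => (), // an arm we're forced to write but don't care about
    }
    if domain.is_empty() {
        domain = infer_domain(email);
    }
    domain
}

fn main() {
    println!("{}", domain_for(&String::from("andy@example.com")));
    println!("{}", domain_for(&String::from("Andy")));
}
```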
But that’s a little clunky. Rust has a special syntax called if let for handling just a single case like this:
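A sketch of the same idea with if let, where domain_for() and the placeholder infer_domain() are hypothetical names of my own:

```rust
// Placeholder for the more complicated inference logic (an assumption).
fn infer_domain(username: &str) -> String {
    format!("{}.invalid", username.to_lowercase())
}

fn domain_for(email: &String) -> String {
    // Only the Some case is named; anything else skips the block.
    if let Some(offset) = email.find('@') {
        return String::from(&email[offset + 1..]);
    }
    infer_domain(email)
}

fn main() {
    println!("{}", domain_for(&String::from("andy@example.com")));
    println!("{}", domain_for(&String::from("Andy")));
}
```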
I only recently came across this syntax and my opinions are honestly a little mixed. Whilst I find the match statements comprehensible and intuitive, this odd combination of if and let just seems unusual to me. Mind you, I suspect it’s a common enough case to be useful.
So that’s a whirlwind tour of match and Rust’s pattern-matching. It’s important to note that this is a much more powerful feature than I’ve managed to express here as we’ve only really discussed matching by literals and by enum type. In general patterns can be used in fairly creative ways to extract fields from values at the same time as matching literals, and they can even have conditional expressions added, which Rust calls match guards. These are illustrated in the (rather contrived!) example below:
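As a sketch of these facilities, the hypothetical Pixel struct and describe() function below combine literal matching, field extraction, the .. and _ placeholders and a match guard. The types and messages are all assumptions made for illustration:

```rust
struct Pixel {
    x: u32,
    y: u32,
    colour: (u8, u8, u8), // red, green, blue
}

fn describe(pixel: &Pixel) -> String {
    match pixel {
        // Literal values nested inside a struct pattern; `..` ignores the rest.
        Pixel { colour: (0, 0, 0), .. } => String::from("black"),
        Pixel { colour: (255, 255, 255), .. } => String::from("white"),
        // Bind the components, then apply a match guard to them.
        Pixel { colour: (r, g, b), .. } if r == g && g == b => {
            format!("grey level {}", r)
        }
        // Match literals in some fields whilst binding another.
        Pixel { x: 0, y: 0, colour } => {
            format!("origin pixel with colour {:?}", colour)
        }
        // Bind some values and ignore others with `_`.
        Pixel { x, y, colour: (r, _, _) } => {
            format!("pixel at ({}, {}) with red component {}", x, y, r)
        }
    }
}

fn main() {
    let pixels = [
        Pixel { x: 1, y: 1, colour: (0, 0, 0) },
        Pixel { x: 5, y: 2, colour: (128, 128, 128) },
        Pixel { x: 0, y: 0, colour: (10, 20, 30) },
        Pixel { x: 3, y: 4, colour: (200, 1, 2) },
    ];
    for pixel in pixels.iter() {
        println!("{}", describe(pixel));
    }
}
```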
Hopefully most things there are fairly self-explanatory and in any case it’s just intended as an illustration of the sorts of facilities that are available. It’s also worth mentioning that the compiler does give you some help to detect if you’re masking patterns with earlier ones, but it doesn’t appear to be perfect. For example, if I move the first two arms to the end of the list, they’re both correctly flagged as unreachable. However, if I move the pattern for white after the pattern for grey, no warning is generated; I’m guessing the job of determining reachability around match guards is just too difficult to do reliably.
Rust’s type system certainly offers some powerful flexibility, and the pattern matching looks like a fantastic feature for pulling apart structures and matching special cases within them. The specific Option enum also looks like quite a pleasant way to implement the “value or error” case given that Rust doesn’t offer exceptions for this purpose.
My main reservation around these features is that there’s an awful lot of syntax building up here, and it’s a fine line between a good amount of expressive power and edging into Perl’s “there’s too many ways to do it” philosophy. The if let syntax in particular seems possibly excessive to me. But I’m certainly reserving judgement on that for now until I’ve had some more experience with the language.
[1] For anyone familiar with C++11, this is what you get when you declare a C++ enum with enum class MyEnum { … }.