Uncovering Rust: Types and Matching

22 Jun 2019 at 8:00AM in Software

Rust is a fairly new multi-paradigm systems programming language that claims to offer both high performance and strong safety guarantees, particularly around concurrency and memory allocation. As I play with the language a little, I’m using this series of blog posts to discuss some of its more unique features as I come across them. This one discusses Rust’s data types and powerful match operator.

This is the 2nd of the 7 articles that currently make up the “Uncovering Rust” series.


There are a few features you expect from any mainstream imperative programming language. One of them is some support for basic builtin types, such as integers and floats. Another is some sort of structured data type, where you can assign values to named fields. Yet another is some sort of vector, array or list for sequences of values.

We’re going to start this post by looking at how these standard features manifest in Rust. Some of this will be quite familiar to programmers from C++ and similar languages, but there are a few surprises along the way and my main aim is to discuss those.

Scalar Types

Rust has builtin scalar types for integers, floats, booleans and characters.

Due to Rust’s low-level nature, you generally have to be explicit about the sizes of these. There are integral types for 8-, 16-, 32-, 64- and 128-bit values, both signed and unsigned. For example i32 is a signed 32-bit integer, u128 is an unsigned 128-bit integer. There are also architecture-dependent types isize and usize which use the native word size of the machine. These are typically used for array offsets. Floats can be f32 for single-precision and f64 for double.

One point that’s worth noting here is that Rust is a strongly typed language and won’t generally perform implicit casts for you, even for numeric types. For example, you can’t assign or compare integers with floats, or even integers of different sizes, without doing an explicit conversion. This keeps costs explicit, and although it does mean programmers need to consider their types carefully, that’s no bad thing in my humble opinion.
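As a minimal sketch of what those explicit conversions look like, the function below mixes an i32, an i64 and a float. The name mean is just something I made up for illustration; the conversions themselves (i64::from and the as keyword) are standard Rust.

```rust
/// Returns the mean of two integers of different widths as a float.
fn mean(a: i32, b: i64) -> f64 {
    // Writing `a + b` directly would not compile: i32 and i64 differ.
    // Widening i32 -> i64 is lossless, so a `From` conversion exists.
    let sum = i64::from(a) + b;
    // i64 -> f64 may lose precision, so it takes an explicit `as` cast.
    sum as f64 / 2.0
}

fn main() {
    println!("{}", mean(3, 4));
}
```

The lossless widening gets the friendlier From conversion, whereas the potentially lossy one has to be spelled out with as, which fits the theme of keeping costs visible.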

Specifically on the topic of integers it’s also worth noting that Rust will panic (terminate the execution) if you overflow your integer size, but only in a debug build. If you compile a release build, the overflow is instead allowed to wrap around. However, the clear intention is that programmers shouldn’t be relying on such tricks to write safe and portable code.
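If you do want a particular overflow behaviour regardless of build type, the integer types have methods that make the choice explicit. Here’s a small sketch using three of them on u8; the function name add_three_ways is my own invention, but checked_add, wrapping_add and saturating_add are all part of the standard library.

```rust
/// Adds two u8 values using three explicit overflow policies.
fn add_three_ways(a: u8, b: u8) -> (Option<u8>, u8, u8) {
    (
        a.checked_add(b),    // None if the result doesn't fit in a u8
        a.wrapping_add(b),   // wraps around modulo 256
        a.saturating_add(b), // clamps at the maximum value, 255
    )
}

fn main() {
    println!("{:?}", add_three_ways(250, 10));
    println!("{:?}", add_three_ways(1, 2));
}
```

These behave identically in debug and release builds, so they seem the safer option when overflow is a real possibility.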

Types of bool can be true or false. Even Rust hasn’t managed to introduce anything surprising or unconventional about booleans! One point of interest is that the expression in an if statement has to be a bool. Once again there are no implicit conversions, and there is no assumption of equivalence between, say, false and 0 as there is in C++.
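To illustrate, here’s a tiny sketch where an integer has to be compared explicitly before it can be used as a condition. The function name describe is purely illustrative.

```rust
/// Describes whether a count is zero, using an explicit comparison.
fn describe(count: u32) -> &'static str {
    // `if count { ... }` is rejected by the compiler, because the
    // condition must be a bool and there is no implicit conversion.
    if count != 0 {
        "non-empty"
    } else {
        "empty"
    }
}

fn main() {
    println!("{}", describe(0));
    println!("{}", describe(3));
}
```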

The final type char has a slight surprise waiting for us, which is that it has a size of four bytes and can represent any Unicode code point. It’s great to see Unicode support front and centre in the language like this, hopefully making it very difficult for people who want to assume that the world is ASCII. Those of you familiar with Unicode may also know that the concept of what constitutes a “character” may surprise those who are used to working only with ASCII, so there could be puzzled programmers out there at times. But we live in a globalised world now and there’s no longer any excuse for any self-respecting programmer to write ASCII-first code.
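A quick sketch shows both the size of char and the sort of surprise that awaits ASCII-minded programmers. It relies on the fact that Rust strings are stored as UTF-8, so the byte length and the number of char values can differ. The helper name char_and_byte_counts is my own.

```rust
/// Returns (number of chars, number of UTF-8 bytes) for a string.
fn char_and_byte_counts(s: &str) -> (usize, usize) {
    (s.chars().count(), s.len())
}

fn main() {
    println!("char is {} bytes", std::mem::size_of::<char>());
    // "\u{e9}" is e-acute, one char but two bytes in UTF-8.
    let (chars, bytes) = char_and_byte_counts("h\u{e9}llo");
    println!("{} chars, {} bytes", chars, bytes);
}
```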

Arrays

Rust arrays are homogeneous (each array contains values of only one type) and are of a fixed size, which must be known at compile time. A plain array variable is stored on the stack. Rust does provide a more dynamic Vec type which uses the heap and allows resizing, but I’m not going to discuss that here.

In the interests of safety, Rust requires that every element of an array be initialised when constructed. Because of this, it’s usually not required to specify a type, but of course there is a syntax for doing so. It’s also possible to initialise every item to the same value using a shorthand. These are all illustrated in the example below.

// These two are equivalent, due to type inference.
let numbers1 = [9, 9, 9, 9, 9];
let numbers2: [i32; 5] = [9, 9, 9, 9, 9];
let numbers3 = [9; 5];  // Repeated value shorthand.

Although the size of the array must be known at compile-time, of course the compiler can’t police all your accesses to the array. For example, you may access an item based on user input. Rust does do bounds-checking at runtime, however. Discussion of how to handle runtime errors like this is a topic for another time, but the default action is to terminate the executable immediately.
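If terminating isn’t what you want, one alternative is the get() method, which returns an Option (covered later in this post) instead of panicking. This is only a sketch, and the function name lookup is made up for the purpose.

```rust
/// Looks up an element, returning None if the index is out of range.
fn lookup(values: &[i32; 5], index: usize) -> Option<i32> {
    // `values[index]` would panic at runtime for index >= 5,
    // whereas `get` reports the failure through its return value.
    values.get(index).copied()
}

fn main() {
    let numbers = [9; 5];
    println!("{:?}", lookup(&numbers, 2));
    println!("{:?}", lookup(&numbers, 5));
}
```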

Structures and Tuples

The basic mechanics of structs in Rust work quite analogously to those in C++, aside from some minor syntactic differences. Here’s a definition to illustrate:

struct Contact {
    first_name: String,
    last_name: String,
    email: String,
    age: u8,
    business: bool,
}

To create an instance of a struct the syntax is similar except providing values instead of types after the colons. After creation the dot notation to read and assign struct fields will also be familiar to both C++ and Python programmers:

fn main() {
    let mut contact1 = Contact {
        first_name: String::from("John"),
        last_name: String::from("Doe"),
        email: String::from("jdoe@example.com"),
        age: 21,
        business: false,
    };
    println!("Contact name is {} {}",
             contact1.first_name, contact1.last_name);
    contact1.first_name = String::from("Jane");
    println!("Contact name is {} {}",
             contact1.first_name, contact1.last_name);
}

Note that to assign to first_name we had to make contact1 mutable and that this mutability applies to the entire structure, not to each field. No surprises for C++ programmers there either.

Now there are a couple more unique features that are worth mentioning. The first of them comes when creating constructor methods. Let’s say we want to avoid having to set the business field, so we wrap it up in a function:

fn new_business_contact(first_name: String,
                        last_name: String,
                        email: String,
                        age: u8)
                        -> Contact {
    Contact {
        first_name: first_name,
        last_name: last_name,
        email: email,
        age: age,
        business: true
    }
}

However, it’s a bit tedious repeating all those field names in the body. Well, if the function parameters happen to match the field names you can use a shorthand for this:

fn new_business_contact(first_name: String,
                        last_name: String,
                        email: String,
                        age: u8)
                        -> Contact {
    Contact {
        first_name,
        last_name,
        email,
        age,
        business: true
    }
}

Another convenient syntactic trick is the struct update syntax, which can be used to create a copy of another struct with some changes:

let contact1 = Contact {
    
};

let contact2 = Contact {
    first_name: String::from("John"),
    last_name: String::from("Smith"),
    ..contact1
};

This will duplicate all fields not explicitly changed. There can be a sting in this particular tail, though, due to the ownership rules. In this example, the String value from contact1.email will be moved into contact2.email and so the first instance will no longer be valid after this point.
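To make the ownership point concrete, here’s a self-contained sketch. The sample_contact() helper is hypothetical, and is only there so that the first contact is fully initialised. As far as I can tell only the fields actually taken by the update syntax are affected: the email is moved, the age and business fields are simply copied, and the name fields of the original are left alone.

```rust
struct Contact {
    first_name: String,
    last_name: String,
    email: String,
    age: u8,
    business: bool,
}

// Hypothetical helper that builds a fully initialised contact.
fn sample_contact() -> Contact {
    Contact {
        first_name: String::from("Jane"),
        last_name: String::from("Doe"),
        email: String::from("jdoe@example.com"),
        age: 21,
        business: false,
    }
}

fn main() {
    let contact1 = sample_contact();
    let contact2 = Contact {
        first_name: String::from("John"),
        last_name: String::from("Smith"),
        ..contact1
    };
    println!("{} {} <{}> {} {}",
             contact2.first_name, contact2.last_name,
             contact2.email, contact2.age, contact2.business);
    // The name fields were not moved and age was copied, so these
    // are still usable on the original.
    println!("{} {} is {}",
             contact1.first_name, contact1.last_name, contact1.age);
    // Uncommenting the next line fails to compile, because
    // contact1.email has been moved into contact2.
    // println!("{}", contact1.email);
}
```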

Finally in this section I’ll briefly talk about tuples. I’m talking about them here rather than along with other compound types because I feel they work in a very similar way to structs, just without the field names. They have a fixed size defined when they are created and this cannot change, as with an array. Unlike an array, however, they are heterogeneous: they can contain multiple different types.

One thing that might surprise Python programmers in particular, however, is that the elements of a tuple are accessed using dot notation in the same way as a struct. In a way you can think of it as a struct where the names of the fields are just automatically chosen as base-zero integers.

fn main() {
    let tup = (123, 4.56, "hello");
    println!("{} {} {}", tup.0, tup.1, tup.2);
    // Can also include explicit types for the tuple fields.
    // Note that a string literal is a &str, not a String.
    let tup_copy: (u32, f64, &str) = tup;
}

If you want to share the definition of a tuple around in the same way as for a struct but you don’t want to give the fields names, you can use a tuple struct to do that:

struct Colour(u8, u8, u8);

fn main() {
    let purple = Colour(255, 0, 255);
    println!("R={} G={}, B={}", purple.0, purple.1, purple.2);
}

In all honesty I’m not entirely sure how useful that’ll be, but time will tell.

The final note here is that structs can also hold references, although none of the examples here utilised that. However, doing so means exercising a little more care because the original value can’t go out of scope any time before any structs with references to it. This is a topic for a future discussion on lifetimes.

Enumerations

Continuing the theme of data types that C++ offers, Rust also has enumerations, hereafter referred to as enums. Beyond the name the similarity gets very loose, however. In C++ enums are essentially a way to add textual aliases to integral values; there’s a bit of syntactic sugar to treat them as regular values, but you don’t have to dip your toes too far under the water to get them bitten by an integer.

In Rust, however, they have features that are more like a union in C++, although unlike a union they don’t rely on the programmer to know which variant is in use at any given time.

You can use them very much like a regular enum. The values defined within the enum are scoped within the namespace of the enumeration name[1].

enum ContactType {
    Personal,
    Colleague,
    Vendor,
    Customer,
}

let contact1_type = ContactType::Personal;
let contact2_type = ContactType::Vendor;

However, much more powerfully, these variants can also have data values associated with them, and each variant can have its own data type.

// We reference contacts by their email address except for
// colleagues, where we use employee number; and vendors,
// where we use supplier ID, which consists of three numbers.
enum ContactType {
    Personal(String),
    Colleague(u64),
    Vendor(u32, u32, u32),
    Customer(String)
}

// The variant holds a String, so a &str literal must be converted.
let customer = ContactType::Customer(String::from("andy@example.com"));
let colleague = ContactType::Colleague(229382);
let supplier = ContactType::Vendor(23, 223, 4);

This construct is great for implementing the sort of code where you need to branch differently based on the underlying type of something. I can just hear the voices of all the object-orientation purists declaring that polymorphism is the correct solution to this problem: that everything should be exposed as an abstract method in the base class that all the derived classes implement. I wouldn’t say I disagree necessarily, but I would also say that this isn’t a clean fit in every case and polymorphism isn’t the one-size-fits-all solution that it has on occasion been presented as.

Rust implements some types of polymorphism and features such as traits are a useful alternative to inheritance for code reuse, as we’ll see in a later post. But since Rust doesn’t implement true inheritance, more properly called subtype polymorphism, I suspect this flexibility of enumerations is more important in Rust than it would be in C++.

A little further down we’ll see how to use the match operator to do this sort of switching in an elegant way, but first we’ll see one example of a pre-defined enum in Rust that’s particularly widely used.

Option

It’s a very common case that a function needs to return a value in the happy case or raise some sort of error in the less happy case. Different languages have different mechanisms for this, one of the more common in modern languages being to raise exceptions. This is particularly common in Python, where exceptions are used for a large proportion of the functionality, but it’s also quite normal in C++ where the function of the destructors and the stack unwinding process are both heavily oriented around making this a fairly safe process.

Despite its extensive support for exceptions, however, C++ is still a bit of a hybrid and it has a number of cases where its APIs still use the other primary method of returning errors, via the return value. A good example of this is the std::string::find() method which searches for a substring within the parent string. This clearly has two different classes of result: either the string is found, in which case the offset within the parent string is returned; or the string is not found, in which case the method returns the magic std::string::npos value. In other cases functions can return either a pointer for the happy case or a NULL in case of error.

Rust does not support exceptions. This is for a number of reasons, partly related to the overhead of raising exceptions and also the fact that return values make it easier for the compiler to force the programmer to handle all error cases that a function can return.

This is where the Option enum comes in useful for implementing these error returns in Rust. It’s defined something like this:

enum Option<T> {
    Some(T),
    None,
}

This enum is capable of storing a value of some type T, which is a generic type parameter much like a C++ template parameter (generics will be discussed properly in a later post), or the single value None. This allows a function to return any value type it wishes, but also leave open the possibility of returning None for an error.
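As a minimal sketch of a function written in this style, here’s an integer division which returns None rather than blowing up when the divisor is zero. The name safe_divide is made up for illustration, and it uses the real Option from the standard library rather than the cut-down definition above.

```rust
/// Divides n by d, returning None instead of panicking when d is zero.
fn safe_divide(n: u32, d: u32) -> Option<u32> {
    if d == 0 {
        None
    } else {
        Some(n / d)
    }
}

fn main() {
    println!("{:?}", safe_divide(10, 2));
    println!("{:?}", safe_divide(1, 0));
}
```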

That’s about all there is to say about Option, and we’ll see the idiomatic way to use it in the next section.

Matching

The final thing I’m going to talk about is the match flow control operator. This is conceptually similar to the switch statement in C++, but it’s got rather more cleverness up its sleeves.

The first thing to note about match is that unlike switch in C++ it is an expression instead of a statement. One aspect of Rust I haven’t talked about yet is that expressions may contain statements, however, so this isn’t a major obstacle. But it does mean that it’s fairly easy to use simple match expressions in assignments or as return values:

enum Direction {
    North,
    South,
    East,
    West,
}

fn get_bearing(d: Direction) -> u16 {
    match d {
        Direction::North => 0,
        Direction::East => 90,
        Direction::South => 180,
        Direction::West => 270,
    }
}
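That example uses match as the return value of a function. For completeness, here’s a rough sketch of the assignment case, where the result of a match is bound to a variable with let. The function parity_label is purely illustrative, and it matches on a bool so that both possible values are covered.

```rust
/// Labels a number as "even" or "odd" using match as an expression.
fn parity_label(n: u32) -> &'static str {
    // The whole match evaluates to a value, which is bound to `label`.
    let label = match n % 2 == 0 {
        true => "even",
        false => "odd",
    };
    label
}

fn main() {
    println!("4 is {}", parity_label(4));
    println!("7 is {}", parity_label(7));
}
```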

The match expression has multiple “arms” which have a pattern and a result expression. To do more than just return a value from the expression, we can wrap it in braces:

fn get_bearing(d: Direction) -> u16 {
    match d {
        Direction::North => 0,
        Direction::East => {
            println!("East is East");
            90
        },
        Direction::South => {
            println!("Due South");
            180
        },
        Direction::West => {
            println!("Go West");
            270
        },
    }
}

We can use the patterns to do more than just match specific values, though. Taking the Option type from earlier, we can use it to extract the return values from functions whilst still ensuring we handle all the error cases.

For example, the String::find() method searches for a substring and returns an Option<usize> which is None if the value wasn’t found or the offset within the string if it was found. We can use this to, say, extract the domain part from an email address:

fn get_domain(email: &String) -> &str {
    match email.find('@') {
        None => "",
        Some(x) => &email[x+1..],
    }
}

This function takes a String reference and returns a string slice representing the domain part of the email, unless the email address doesn’t contain an @ character in which case we return an empty string. I’m not going to say that the semantics of an empty string are ideal in this case, but it’s just an example.

As another example we could write a function to display the contact details for the ContactType defined earlier:

enum ContactType {
    Personal(String),
    Colleague(u64),
    Vendor(u32, u32, u32),
    Customer(String)
}

fn show_contact(contact: ContactType) {
    match contact {
        ContactType::Personal(email) => {
            println!("Personal: {}", email);
        },
        ContactType::Colleague(employee_number) => {
            println!("Colleague: {}", employee_number);
        },
        ContactType::Vendor(id1, id2, id3) => {
            println!("Vendor: {}-{}-{}", id1, id2, id3);
        },
        ContactType::Customer(email) => {
            println!("Customer: {}", email);
        },
    }
}

One aspect of match expressions that isn’t immediately obvious is that they are required to be exhaustive. So, if you don’t handle every enum value, for example, then you’ll get a compile error. This is what makes things like the Option example particularly safe as it forces handling of all errors, which is generally regarded as good practice if you’re writing robust code. This also makes perfect sense if you consider that match is an expression: if you assign the result to a variable, say, then the compiler needs something to assign and if you hit a case that your match doesn’t handle then what’s the compiler going to do?

Of course if we’re using match for something other than an enum then handling every value would be pretty tedious. For these cases we can use the pattern _ as the default match. The example below also shows how we can match multiple patterns using | as a separator:

fn is_perfect(n: u32) -> bool {
    match n {
        6 | 28 | 496 | 8128 | 33_550_336 => true,
        _ => false
    }
}

Here we’re meeting the needs of match by covering every single case. If we removed that final default arm, the compiler wouldn’t let us get away with it:

error[E0004]: non-exhaustive patterns: `0u32..=5u32`,
`7u32..=27u32`, `29u32..=495u32` and 3 more not covered
  --> src/main.rs:10:11
   |
10 |     match n {
   |           ^ patterns `0u32..=5u32`, `7u32..=27u32`,
`29u32..=495u32` and 3 more not covered
   |
   = help: ensure that all possible cases are being handled,
possibly by adding wildcards or more match arms

But what if we really wanted to only handle a single case? It would be pretty dull if we had to have a default arm in a match then check for that value being returned and ignore it.

Let’s take the get_domain() example from earlier. Let’s say that if you find a domain, you want to use it; but if not, you have some more complicated logic to invoke to infer the domain by looking at the username. You could handle that by doing something like this:

fn get_domain(email: &String) -> &str {
    let ret = match email.find('@') {
        None => "",
        Some(x) => &email[x+1..],
    };
    if ret != "" {
        // No trailing semicolon, so this is the value of the branch.
        ret
    } else {
        // More complex logic goes here...
        unimplemented!()
    }
}

But that’s a little clunky. Rust has a special syntax called if let for handling just a single case like this:

fn get_domain(email: &String) -> &str {
    if let Some(x) = email.find('@') {
        // No trailing semicolon, so this is the value of the branch.
        &email[x+1..]
    } else {
        // More complex logic goes here...
        unimplemented!()
    }
}

I only recently came across this syntax and my opinions are honestly a little mixed. Whilst I find the match statements comprehensible and intuitive, this odd combination of if and let just seems unusual to me. Mind you, I suspect it’s a common enough case to be useful.
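For what it’s worth, here’s a complete sketch of the if let form which compiles and runs. The function name domain_or_local and the "localhost" fallback are both invented stand-ins for whatever the more complex logic would really produce.

```rust
/// Returns the domain part of an email address, or a placeholder
/// value if there is no '@' in it.
fn domain_or_local(email: &str) -> &str {
    if let Some(x) = email.find('@') {
        // Only the Some case binds a value; x is the offset of '@'.
        &email[x + 1..]
    } else {
        // Hypothetical fallback standing in for the real logic.
        "localhost"
    }
}

fn main() {
    println!("{}", domain_or_local("jdoe@example.com"));
    println!("{}", domain_or_local("jdoe"));
}
```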

So that’s a whirlwind tour of match and Rust’s pattern-matching. It’s important to note that this is a much more powerful feature than I’ve managed to express here as we’ve only really discussed matching by literals and by enum type. In general patterns can be used in fairly creative ways to extract fields from values at the same time as matching literals, and they can even have conditional expressions added, which Rust calls match guards. These are illustrated in the (rather contrived!) example below:

struct Colour {
    red: u8,
    green: u8,
    blue: u8
}

fn classify_colour(c: Colour) {
    match c {
        Colour {red: 0, green: 0, blue: 0} => {
            println!("Black");
        },
        Colour {red: 255, green: 255, blue: 255} => {
            println!("White");
        },
        Colour {red: r, green: 0, blue: 0} => {
            println!("Red {}", r);
        },
        Colour {red: 0, green: g, blue: 0} => {
            println!("Green {}", g);
        },
        Colour {red: 0, green: 0, blue: b} => {
            println!("Blue {}", b);
        },
        Colour {red: r, green: g, blue: 0} => {
            println!("Brown {} {}", r, g);
        },
        Colour {red: r, green: 0, blue: b} => {
            println!("Purple {} {}", r, b);
        },
        Colour {red: r, green: g, blue: b} if r == b && r == g => {
            println!("Grey {}", r);
        }
        Colour {red: r, green: g, blue: b} => {
            println!("Mixed colour {}, {}, {}", r, g, b);
        }
    }
}

Hopefully most things there are fairly self-explanatory and in any case it’s just intended as an illustration of the sorts of facilities that are available. It’s also worth mentioning that the compiler does give you some help to detect if you’re masking patterns with earlier ones, but it doesn’t appear to be perfect. For example, when I moved the first two arms to the end of the list, they were both correctly flagged as unreachable. However, when I moved the pattern for white after the pattern for grey it didn’t generate a warning; I’m guessing the job of determining reachability around match guards is just too difficult to do reliably.

Conclusions

Rust’s type system certainly offers some powerful flexibility, and the pattern matching looks like a fantastic feature for pulling apart structures and matching special cases within them. The specific Option enum also looks like quite a pleasant way to implement the “value or error” case given that Rust doesn’t offer exceptions for this purpose.

My main reservation around these features is that there’s an awful lot of syntax building up here, and it’s a fine line between a good amount of expressive power and edging into Perl’s “there’s too many ways to do it” philosophy. The if let syntax in particular seems possibly excessive to me. But I’m certainly reserving judgement on that for now until I’ve had some more experience with the language.


  1. For anyone familiar with C++11, this is what you get when you declare a C++ enum with enum class MyEnum { … }

The next article in the “Uncovering Rust” series is Uncovering Rust: Loops and Collections
Mon 17 Apr, 2023