Rust is fairly new multi-paradigm system programming language that claims to offer both high performance and strong safety guarantees, particularly around concurrency and memory allocation. As I play with the language a little, I’m using this series of blog posts to discuss some of its more unique features as I come across them. This one discusses looping constructs and standard library collections.
This is the 3rd of the 7 articles that currently make up the “Uncovering Rust” series.
Having completed a couple of articles where I researched the QUIC and HTTP/3 protocols, I’m now getting to the stage of the Geta project which involves actually starting to write code. I’ve deliberately chosen a language with which I’m still only somewhat familiar, Rust, as an opportunity to become more familiar with it. In an effort to get back into the swing of it, I thought a good approach might be to resurrect this series of articles on Rust that I started a few years ago.
The two previous articles in this series, written in the heady pre-pandemic days of 2019, covered ownership, scalar types and the match
construct. In this article I’ll be extending the discussion of types to common collections within the standard library, which provide more facilities than builtin tuples and arrays; but first I want to look at the loop constructs that Rust provides.
Let’s look at the constructs that control program flow. I’m going to assume you’ve written some sort of code before so you’re familiar with the concept of if
…else
and while
loops, and just focus on the aspects which are specific to Rust.
if
…else
¶I don’t want to dwell on if
…else
, since it’s one of those things that’s very similar in many languages. The only thing I will mention is that it, along with the loop constructs, is counted as an expression and can therefore “return” values. This assignment is a simple example:
let odd_even = if x % 2 == 0 { "even" } else { "odd" };
Note that this is a fully flexible if
…else
construct where each of the blocks can contain an abitrary sequence of instructions, not just a single value as in many languages’ ternary operators. For example, this is also valid:
let odd_even = if x % 2 == 0 {
println!("EVEN");
"even"
} else {
println!("ODD");
"odd"
};
One thing that’s reassuring to know is that the compiler will confirm that the return types of the multiple arms of the conditional evaluate to consistent types — if they did not, the type of odd_even
in the example above would be ambiguous, and that would prevent the compiler doing further type checking on it further down.
while
¶Rust has three main loop constructs, which are while
, loop
and for
. We’ll look at each of these in the next few subsections, starting with while
.
The while
construct in Rust is quite similar to other languages, and anyone who’s written more or less any code in any imperative language can probably figure out what this code is going to do.
let mut i = 0;
while i < 10 {
println!("Value is {}", i);
i += 1;
}
Also, Rust supports break
and continue
in these loops with the usual semantics1. Also similar to some other languages, loops can be labelled to allow breaking out of multiple levels at once.
let mut i = 0;
let mut j = 0;
'outer: while i < 10 {
while j < 10 {
if i * j > 45 {
println!("{} and {}", i, j);
break 'outer;
}
j += 1;
}
i += 1;
j = 0;
}
The final thing to say about while
is that unlike some other constructs, it does not return a value — or rather, the value it returns is always empty. This is unlike the loop
construct that we’ll look at next.
loop
¶The next construct loop
may seem superficially similar to while true {...}
, and it does have the same semantics — it loops until the loop is explicitly broken out of.
However, because the semantics of the loop are intentionally simple, the compiler can be certain that the only exit points from the loop are break
statements, and this allows these statements to return a value in a similar way than we saw with if
…else
earlier. Of course, the language designers could have just made while true
a special case instead of introducing a new keyword, but if that’s the sort of decision that upsets you then you’re probably going to have real trouble finding a language with which you’re satisified2.
Let’s see an example of this in action.
let mut n = 2;
let mut m = 2;
let (found_m, found_n) = loop {
if m * n == 35 {
break (m, n);
}
n += 1;
m = if n > 34 { n = 2; m + 1 } else { m };
};
println!("{} x {} = 35", found_m, found_n);
Now I’m not saying this is an idiomatic, or even sensible, way to solve this particular problem — as with all the code excerpts in this article, it’s just meant to illustrate some syntactic details. But you can see the fact that the loop
block is returning a value — in this case, it’s a 2-tuple of integers. In this case, of course, m
and n
are still in scope and could have been used directly, but that would illustrate returning a value from the loop
.
Of course, you can use loop
just as you would while true
without returning a value, and the use of loop labels also applies equally to loop
.
for
and Iterators¶Now we come to iteration though collections, which is something that tends to differ a little more between languages than while
and if
…else
do. Rust’s for
construct is similar to Python’s in that it iterates over the contents of a collection, as opposed to being some syntactic sugar around a while
loop as it is in C.
let favourite_things = [
"raindrops on roses",
"whiskers on kittens",
"bright copper kettles",
"warm woolen mittens",
"brown paper packages tied up with strings"
];
for thing in favourite_things {
println!("Favourite thing: {}", thing);
}
A collection can make itself iterable in this way by implementing the std::iter::IntoIterator
trait — we haven’t looked at traits yet, but think of them as a defined interface for now. This requires providing an into_iter()
method which returns another object which implements the std::iter::Iterator
trait, and provides a next()
method. This is somewhat similar to the use of iterators in other languages, even if the precise semantics differ somewhat.
Following on from the same definition of favourite_things
above, we could instead rewrite the for
loop to use the iterator directly like this.
let mut my_iter = favourite_things.into_iter();
loop {
match my_iter.next() {
None => break,
Some(thing) => println!("Favourite thing: {}", thing),
}
}
The use of match
here is because next()
returns an Option
which becomes None
once the iterator is exhausted3.
The use of into_iter()
depends on the context in which you call it, but is typically expected to iterate by value. Given Rust’s ownership rules, this can lead to some surprising consequences. Consider the simple application shown below, which is similar to the example above except instead of immuatable string slices, we’ve got an array of heap-allocated String
objects. Also, we try to iterate through it twice.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 |
|
You might expect this doesn’t seem to be particularly contraversial, but Rust’s compiler is having none of it.
error[E0382]: use of moved value: `favourite_things`
--> src/main.rs:12:18
|
2 | let favourite_things = [
| ---------------- move occurs because `favourite_things` has type
`[String; 5]`, which does not implement the `Copy` trait
...
9 | for thing in favourite_things {
| ---------------- `favourite_things` moved due to this
implicit call to `.into_iter()`
...
12 | for thing in favourite_things {
| ^^^^^^^^^^^^^^^^ value used here after move
|
note: `into_iter` takes ownership of the receiver `self`,
which moves `favourite_things`
--> /rustc/[...]/library/core/src/iter/traits/collect.rs:262:18
help: consider iterating over a slice of the `[String; 5]`'s content to avoid
moving into the `for` loop
|
9 | for thing in &favourite_things {
| +
(I have manually edited and wrapped some lines for readability)
What’s happening here is that in the first iteration, we’re taking ownership of the String
values in the array, and hence consuming them. The easiest way to solve this is just to iterate over a slice of the array. Since a slice is a type of reference, this can refer to the original items without moving them. We could just iterate over a slice of the whole array by prefixing the name with &
and this borrows immutable references to the values.
9 10 |
|
We can also iterate over what I’m going to call proper slices (i.e. a proper subset of the elements in a collection), and we can make them mutable. To do so we’d also have to make favourite_things
mutable, so we can borrow a mutable reference to its members. The updated example below illustrates these features.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 |
|
As well as the into_iter()
used by for
loops, there are two other methods which are commonly (but not necessarily always) provided by collections to derive your own iterators:
iter()
returns an iterator yielding immutable references to items.iter_mut()
returns an iterator yielding mutable references to items.These can be used where you want to be quite certain of the type of value iterated, rather than relying on the context-sensitive behaviour of into_iter()
, but whether a given collection offers them is entirely up to its author.
In the second half of this article I’m going to run through some of the common collections in the Rust standard library, and see how they compare with similar containers in other languages. The ones we’re going to look at are:
Vec
String
HashMap
Vec
¶Vectors in Rust, as in most languages, are just dynamically-sized heap-allocated arrays. Similar to arrays, they are homogenous, so each one can only store a single type, although that type can be, say, an enumeration of multiple other types. As such, they’re implemented as generics, the principles of which will be familiar to those used to many strong-typed languages. Rust uses the same angle-bracket syntax as C++, C# and Java, as opposed to the square brackets seemingly preferred by Go, Scala and Python4.
Here’s a simple example of creating and manipulating a vector.
let mut v:Vec<i32> = Vec::new();
for i in 0..100 {
v.push(i * i);
}
let total: i32 = v.iter().sum();
println!("Total is {}", total);
You’ll note that due to the type specification of v
, the compiler can deduce the type of the other use of Vec
. It’s possible to specify that as well, but you need to use the slightly unusual Vec::<i32>::new()
syntax, where ::<>
is affectionately known as the turbofish operator. I’m sure we’ll look at that in more detail when I get around to writing an article on Rust generics.
You can reference a Vec
using the usual v[i]
syntax, which will panic the item doesn’t exist, or you can use the get(i)
method, which returns Option<T>
to check for existence more gracefully.
As in C++, vectors are always contiguous in memory and hence updating them potentially breaks code that holds references, as the values may need to be moved to a reallocated block. In C++ this is handled by invalidating iterators, but Rust handles this using its usual ownership and reference consistency rules — if you take an immutable reference to an element in a vector, and then you try to use a mutable reference to (say) call push()
whilst the immutable reference is still live, the compiler will flag the error.
Similarly the borrow checker will also make sure you’re not holding any references to the contents of the vector when the vector itself goes out of scope, because that will drop every item within it.
String
¶Some people may debate whether a String
is indeed a collection, but I would argue those people are fooling themselves. String types in any language have a decent amount of complexity around them, and I suspect those who disagree with that are simply unaware of the complexities involved, and that lack of understanding will increase their risk of creating bugs, or simply building systems with a lack of internationalisation, which should be regarded as unacceptable in today’s world.
Besides, if you look at Rust’s source you’ll see that fundamentally a String
is just a Vec<u8>
— and Vec
must be a collection, since we just talked about it.
The core language itself only provides str
which is generally referred to as a string slice. This is because it refers to a subset of a string that’s held somewhere else — in the case of string literals, that somewhere is the read-only data section of the process, and for variables it’s wherever on the heap the String
is storing its content. The str
itself is just a reference to the first character and a length. As with Vec
earlier, that means if you have a mutable String
and you’re holding a reference to a str
covering some subset of it, you’ll find you can’t actually change the string for the same reasons as outlined in the previous section for Vec
.
In other words, this code will fail to compile.
1 2 3 4 5 6 7 |
|
However, if you swapped lines 4 and 5 then everything would be fine, because sub
is not used after bring printed, so the mutable reference borrowed for the push()
doesn’t conflict with anything.
As well as the push_str()
method we saw above, the +
operator can only be used to concatenate strings, though it’s behaviour with regards to ownership is a little fiddly. The first argument has its value moved out, and so is no longer valid aferwards. The second argument, which must be passed by reference, is unchanged. The +
operator strictly takes &str
as its second parameter, but if you pass &String
then the compiler can coerce it.
If you’re going to be doing this a lot, however, the format!()
macro is probably more convenient — this uses the same syntax as println!()
but returns a String
instead of printing the text.
One thing that’ll be familiar to Python programmers is that both str
and String
are both unicode-aware. As it happens, this is achieved using UTF-8
encoding. As a result of this, Rust doesn’t allow you to index in to get a single character of the string — this is because the nth character doesn’t necessarily correspond to the nth byte, so indexing may be slower than the programmer expects for long strings. You can take a single-character slice if you like, of course, but keep reading for the caveats.
Another aspect that Rust exposes to programmers quite plainly are the different interpretations of Unicode data. There are three different ways of looking at a UTF-8 string:
u8
values.char
, which are Unicode scalars.When you take a slice then you’re indexing by byte — if this slices through the middle of a multibyte UTF-8 character you will get a panic at runtime. I think this definitely has the potential to catch some people out. For an example of this, try running the following variant of “hello world” in Hindi5.
This code will panic | |
---|---|
1 2 3 4 5 |
|
If you call the len()
method on s
it’ll tell you the string is 31 bytes long. If you call chars()
it’ll return an iterator over the char
values which make up the string — in this case there are 11 of those. Since some of them are diacritics, however, there are only 7 of those — to check this I had to go find a community library called unicode-segmentation
because the Rust standard library, quite sensibly, doesn’t get drawn this far into the depths of Unicode processing.
Anyway, that’s about it for String
— conceptually simple, but in practice there’s quite a lot to think about. I think it’s important to stress that Rust hasn’t really created this complexity, it’s simply exposed the complexity that a properly internationalised application would need to deal with anyway, potentially with a lot less language support in some cases. But for people used to those languages, it might be a bit of an unexpected learning curve.
HashMap
¶Now we come to our old friend the associative array, aka dictionary, aka hash table, aka HashMap
. This is pretty standard stuff in more or less any language, and anyone who’s been around the block will know that no matter how many fancy data structures you learn about, a simple hash table does a good enough job in a slightly dishearteningly huge number of cases. So knowing how to use it is important.
As with Vec
, HashMap
uses generics — HashMap<String, i32>
uses String
as a key and i32
as a value. Here’s a simple code example to show inserting and retrieving values from the map.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 |
|
I’d hope most of that was fairly self-explanatory, but there are a couple of things to note. Firstly, the insert()
method will silently overwrite any existing entry, so it can be used both to insert and update. Secondly, there are a few things worth observing going on in line 9. The get()
method returns an Option<&T>
, so either an immutable reference to the entry or None
. Since we don’t want a reference but the actual value, I use copied()
, which converts references to copies of the underlying value6. Then I use unwrap_or()
to return either the unwrapped value from the Option
, or 0
if the value is None
.
One thing you might also notice is that there’s nothing here declaring the type of counts
— this indicates the flexibility of the type deduction employed by the Rust compiler, which will have given it type HashMap<&str, i32>
. Of course, you’re entirely free to declare the type yourself to force the issue — for example, you might want to make sure that i64
is used for the counters.
Another thing that’s not obvious from the example above is the issue of ownership. Values like i32
implement the Copy
trait, and so values can be copied into the map. But for heap-allocated values like String
then using the actual values will cause them to be moved into the HashMap
, with the original variables becoming invalid. If you store them by reference, then of course references to the values will be stored, but in that case the original lifetimes of the values could cause you problems if you don’t know what you’re doing. We’ll look at the lifetimes of values in a future article.
If we want to read and update an existing entry, as opposed to simply replacing it with insert()
, then the entry()
functioTo n can be used instead of get()
. This returns an enum called Entry
which is a little like Option
, but tailored for the HashMap
. This provides an or_insert()
method, which returns a mutable reference to the entry’s value if it’s already in the HashMap
, or inserts an entry with a value provided as a parameter and returns a mutable reference to that. For the counts
structure above, a new article about a language that might or might not be new could be added like this.
fn count_article(counts: HashMap<&str, i32>, langauge: &str)
{
let count = counts.entry(language).or_insert(0);
*count += 1;
}
I think that about wraps it up for HashMap
. You might be interested to know that Rust uses SipHash as its hashing algorithm, which helps guard against some attacks I discussed in an old article. It’s not the fastest out there, buf if you decide that raw performance is more important to you than security then it’s possible to construct a HashMap
that uses other hashers.
It’s worth noting that the standard library provides a number of other collections that I haven’t gone through here. Just to give you a flavour, here are some examples.
LinkedList
VecDeque
make_contigous()
method which will essentially turn it into a vector and return that as a slice.BTreeMap
HashMap
, most operations on BTrees are $O(\log{n})$.HashSet
and BTreeSet
()
.BinaryHeap
That’s about it for this article — a fairly lightweight one this time, but even at this basic level I found a few surprises. Once again I underestimated the degree to which more or less any construct is an expression which can be evaluated for a value, and I hadn’t appreciated the full subtleties around the loop
vs the while
construct. Also, the reminders about the importance of ownership in Rust were useful.
In terms of the collections, I was pleased to see those offered were a sensible subset of data structures that cover all the common use-cases that I can think of. The String
type had some surprising subtleties, but I’m really glad to see that Rust puts Unicode support and internationalisation front and centre of the developer’s mind — this sort of thing is really difficult to add later if you’ve built a whole application around the assumption of ASCII. Mind you, the runtime panics of splitting a UTF-8 string midway through a code point were a bit of a surprise — that’s definitely something to be very careful about when taking substrings.
Hopefully that’s been enlightening, and it was certainly useful to me as an easy re-introduction to Rust after not using it for a long time. In the upcoming articles I’m going to drill in to the important concept of lifetimes, look at the trait system, take a look at error handling, and a close look at concurrency — these will all be important in the Geta implementation. As usual, if you spot any errors, or have any other feedback, do feel free to get in touch in the comments or by email.
In some langauges these terms differ, but the semantics are typically very similar. For example, Ruby and Perl have next
instead of continue
, and Perl also has last
instead of break
. ↩
In particular, if strictly unnecessary constructs upset you, I really recommend you stay away from Perl. ↩
You might recall we looked at Option
in the previous article in this series — but given that was published nearly four years ago, I think I’d have to forgive you if you didn’t. ↩
I have very little opinion on this, it seems like a choice that’s only really of interest to those stuck with writing the languager’s lexers and parsers. It appears that some people object to the use of angle brackets for this purpose, but I selfishly hope this view doesn’t gain too much traction or language maintainers might be tempted to change their choice. Whilst I may have no strong opinion of the original choice, I would hope most people would agree that changing it later, and braking huge amounts of code in the process, is almost invariably a terrible idea. ↩
Full disclosure: I’ve relied on Google Translate for this, I don’t speak, read or write Hindi, and I wouldn’t know Devanagari script if it knocked on my door and said, “hello, I’m Devanagari script.” So please accept my apologies if this translation is incorrect, and I do hope some naughty Google Engineer hasn’t made me publish something rude on my blog… ↩
There are some details and caveats about copied()
and the closely related cloned()
, but they’re not relevant to this example. ↩