Rust - Lifetimes

2023-06-01

Previously published

This article was previously published on len-learns-rust.com. A full index of these articles can be found here.

One of the jobs of the Rust compiler’s “borrow checker” is to track the life of each reference to a variable so that it can prevent dangling references. To do this, it annotates each variable and reference with details of the scope in which it is valid. This annotation is called a lifetime.

All variables and references have a lifetime, but not all lifetimes need to be written out in the source code; in certain common circumstances the compiler accepts code where the lifetime annotations have been elided and works them out for itself.
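As a small illustration of elision, here is a sketch of one function written twice: once with the lifetime left out and once with it spelled out. The function names are made up for this example; the two signatures should mean the same thing to the compiler.

```rust
// Elided form: the compiler works out that the returned reference
// borrows from `s`.
fn first_word(s: &str) -> &str {
    s.split_whitespace().next().unwrap_or("")
}

// The same function with the lifetime written out explicitly.
// (Hypothetical name, only here for comparison.)
fn first_word_explicit<'a>(s: &'a str) -> &'a str {
    s.split_whitespace().next().unwrap_or("")
}

fn main() {
    let text = String::from("hello world");
    println!("{}", first_word(&text));
    println!("{}", first_word_explicit(&text));
}
```

Both versions compile to the same thing; the explicit `'a` only states what the compiler had already inferred.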

This is sensible as it means that for most code, where the lifetimes are obvious, they’re not needed explicitly because the compiler can work them out. I think this is also part of what makes lifetimes a little confusing for Rust beginners. You can write plenty of code without needing to know about lifetimes at all, and then suddenly the compiler pops up with an error message that mentions lifetimes, and it can be confusing, even if you have read about them…

Also, whilst in many situations it’s appropriate to follow the compiler’s advice and annotate the code with explicit lifetimes, that’s often not the best solution. You can waste time trying to specify the required lifetime, only to realise that what you actually need is for the value to be reference counted using std::rc::Rc<>. That removes the problem entirely: the reference count keeps the object alive for as long as anything holds it, and each holder gets its own handle by cloning the Rc<>.

Due to the borrow checker and lifetimes, in the example below, the compiler knows that the reference r no longer refers to a valid variable at the point where you try to use it in the println!().

fn main() {
    let r;                // ---------+-- 'a
                          //          |
    {                     //          |
        let x = 5;        // -+-- 'b  |
        r = &x;           //  |       |
    }                     // -+       |
                          //          |
    println!("r: {}", r); //          |
}                         // ---------+

Compiler says "no!"

The error message is useful and the compiler prevents you from attempting to use a dangling reference.

error[E0597]: `x` does not live long enough
 --> src/main.rs:6:13
  |
5 |         let x = 5;        // -+-- 'b  |
  |             - binding `x` declared here
6 |         r = &x;           //  |       |
  |             ^^ borrowed value does not live long enough
7 |     }                     // -+       |
  |     - `x` dropped here while still borrowed
8 |                           //          |
9 |     println!("r: {}", r); //          |
  |                       - borrow later used here

For more information about this error, try `rustc --explain E0597`.
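One way to satisfy the borrow checker here, sketched below, is to declare x in the outer scope so that it lives at least as long as r. The helper function name is an assumption made purely so the example can return a value.

```rust
// Hypothetical helper: reads a value through a reference whose
// referent outlives it.
fn value_via_reference() -> i32 {
    let x = 5; // `x` now lives until the end of the function
    let r;
    {
        r = &x; // the borrow is still valid after this block ends
    }
    *r
}

fn main() {
    println!("r: {}", value_via_reference());
}
```

The inner block no longer owns anything that r refers to, so nothing is dropped while it is still borrowed.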

In need of a lifetime

With structs, it can be more complex. Suppose we had the following code that creates a log and a thing that uses it. See here for the supporting code that makes it work.

struct Log {
    log_lines: Vec<String>,
}

struct ThingThatLogs {
    log: &Log,
}

Compiler says "no!"

The compiler would helpfully explain that we need an explicit lifetime.

error[E0106]: missing lifetime specifier
  --> src\main.rs:14:10
   |
14 |     log: &Log,
   |          ^ expected named lifetime parameter
   |
help: consider introducing a named lifetime parameter
   |
13 ~ struct ThingThatLogs<'a> {
14 ~     log: &'a Log,
   |

Adding an explicit lifetime annotation as suggested by the compiler is one solution, and the code for that can be found here. However, perhaps there are other ways around this problem? We could, for example, consider cloning the log. Of course, whilst that removes the error, it gives a completely different result; rather than sharing a single log, each user of the log has its own separate copy.
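To make the difference concrete, here is a rough sketch of both options side by side. It is not the linked code: ThingWithOwnLog, add_line and the two demo functions are names invented for this illustration, and the borrowing version only reads the log because a shared reference doesn’t allow writing.

```rust
#[derive(Clone)]
struct Log {
    log_lines: Vec<String>,
}

// Borrowing version: `'a` says a ThingThatLogs cannot outlive its Log.
struct ThingThatLogs<'a> {
    log: &'a Log,
}

// Cloning version (hypothetical name): each instance owns a private copy.
struct ThingWithOwnLog {
    log: Log,
}

impl ThingWithOwnLog {
    fn new(log: &Log) -> Self {
        ThingWithOwnLog { log: log.clone() }
    }

    fn add_line(&mut self, line: &str) {
        self.log.log_lines.push(line.to_string());
    }
}

// Returns the line counts of (the original log, a's copy, b's copy).
fn clone_demo() -> (usize, usize, usize) {
    let log = Log { log_lines: vec![String::from("start")] };
    let mut a = ThingWithOwnLog::new(&log);
    let mut b = ThingWithOwnLog::new(&log);
    a.add_line("from a");
    b.add_line("from b");
    b.add_line("from b again");
    (log.log_lines.len(), a.log.log_lines.len(), b.log.log_lines.len())
}

// Returns true when both borrowers refer to the very same Log.
fn borrow_demo() -> bool {
    let log = Log { log_lines: vec![String::from("start")] };
    let a = ThingThatLogs { log: &log };
    let b = ThingThatLogs { log: &log };
    std::ptr::eq(a.log, b.log)
}

fn main() {
    println!("clone counts: {:?}", clone_demo());
    println!("borrowers share one log: {}", borrow_demo());
}
```

With cloning, the three logs drift apart as soon as anyone writes to one; with borrowing, there is only ever one log.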

Another alternative may be to decide that the ThingThatLogs should actually share the Log by using reference counts to keep the log alive whilst the ThingThatLogs needs it. Something like this:

struct ThingThatLogs {
    log: std::rc::Rc<Log>,
}

impl ThingThatLogs {
    fn new(log: &std::rc::Rc<Log>) -> Self {
        ThingThatLogs { log: log.clone() }
    }
}

The code for this can be found here. As you can see, in this code, we’re being very explicit about the fact that the ThingThatLogs has a reference counted Log.
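A quick way to see the sharing in action is to watch the reference count change as users of the log come and go. The sketch below reuses the structs above; the owners_while_shared function is an assumption added for the demonstration.

```rust
use std::rc::Rc;

struct Log {
    log_lines: Vec<String>,
}

struct ThingThatLogs {
    log: Rc<Log>,
}

impl ThingThatLogs {
    fn new(log: &Rc<Log>) -> Self {
        ThingThatLogs { log: Rc::clone(log) }
    }
}

// Hypothetical demo: returns (owners while `b` is alive,
// owners after `b` is dropped, lines visible through `a`).
fn owners_while_shared() -> (usize, usize, usize) {
    let log = Rc::new(Log { log_lines: vec![String::from("start")] });
    let a = ThingThatLogs::new(&log);
    let during = {
        let b = ThingThatLogs::new(&log);
        // Both things point at the same allocation, not at copies.
        assert!(Rc::ptr_eq(&a.log, &b.log));
        Rc::strong_count(&log)
    }; // `b` is dropped here, releasing its share of the log
    (during, Rc::strong_count(&log), a.log.log_lines.len())
}

fn main() {
    println!("{:?}", owners_while_shared());
}
```

Note that Rc<> on its own only gives shared read access; writing to the shared log would need some form of interior mutability, which is left out of this sketch.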

My initial gut feeling is that there are several more steps to take before this code is good. However, in retrospect, I think that this is where we should stop. This is the most flexible approach, and, from the Rust I’ve seen, it appears to be the preferred method.

However, so that my gut doesn’t lead me astray at a later date, I’ll detail the extra steps that I’ve considered and rejected…

Firstly, my gut suggests that we should encapsulate the reference count with the log, rather than proudly wearing our reference count on the outside. This would mean we end up with code like this:

use std::rc::Rc;

#[derive(Clone)]
struct Log {
    log_lines: Rc<Vec<String>>,
}

impl Log {
    fn new() -> Self {
        Log {
            log_lines: Rc::new(Vec::new()),
        }
    }
}

struct ThingThatLogs {
    log: Log,
}

impl ThingThatLogs {
    fn new(log: &Log) -> Self {
        ThingThatLogs { log: log.clone() }
    }
}

I think this is the wrong approach, but not for the first reason that my gut told me… After all, we can only move the reference count inside the log because the log is so simple. What if it were more complicated? My gut, who’s also a C++ guy, suggests the Pimpl idiom, and so we’d end up with something like this:

use std::rc::Rc;

struct LogImpl {
    log_lines: Vec<String>,
    counter: u32,
    // as much extra stuff as we need...
}

#[derive(Clone)]
struct Log {
    log_impl: Rc<LogImpl>,
}

This allows for the log to be big and complex and still be easily reference-counted, but it’s still wrong. Or at least the current version of me thinks it’s wrong.

The reason that I think it’s wrong is that it gives us a log with two responsibilities: the task of logging and the management of its own lifetime. This makes the log harder to use. We may wish to share it between threads, and to do that we’d have to use an atomic reference count, std::sync::Arc. That would mean either changing the insides of the log, which we might not be able to do, or wrapping the log in another reference count… These reasons point towards us breaking two¹ of the SOLID design principles, but even if you don’t care about that, the reasons speak for themselves. Let’s leave the reference count on the outside. The code for the Arc<> version is here; notice that the Log is unchanged across this version, the Rc<> version and the explicit annotation version.
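As a rough sketch of why the external count pays off, the version below swaps Rc<> for Arc<> and hands the same unchanged Log to two threads. It is not the linked code: lines_seen_by_threads is a name invented for the example, and the threads only read the log, since writing from several threads would also need something like a Mutex.

```rust
use std::sync::Arc;
use std::thread;

// The Log itself is exactly as before; only the wrapper has changed.
struct Log {
    log_lines: Vec<String>,
}

struct ThingThatLogs {
    log: Arc<Log>,
}

impl ThingThatLogs {
    fn new(log: &Arc<Log>) -> Self {
        ThingThatLogs { log: Arc::clone(log) }
    }
}

// Hypothetical demo: each thread owns a ThingThatLogs and reports
// how many lines it can see in the shared log.
fn lines_seen_by_threads() -> Vec<usize> {
    let log = Arc::new(Log { log_lines: vec![String::from("start")] });
    let handles: Vec<_> = (0..2)
        .map(|_| {
            let thing = ThingThatLogs::new(&log);
            // `thing` is moved into the thread along with its Arc.
            thread::spawn(move || thing.log.log_lines.len())
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}

fn main() {
    println!("{:?}", lines_seen_by_threads());
}
```

Replacing Arc with Rc in this sketch should be rejected by the compiler, because Rc<> cannot be sent to another thread.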

Summing up

Part of the Rust compiler tracks the lifetimes of variables to enable it to make sure that the code you write is safe. Sometimes the errors that it produces suggest that you can add explicit lifetime annotations to allow otherwise broken code to compile. Sometimes this is the approach to take, but sometimes other approaches may be better. Whatever you do, the management of the lifetime of a variable is probably best done explicitly, on the outside of the variable, rather than being hidden away.

Join in

The code can be found here on GitHub. Each step on the journey has one or more separate directories of code, which allows for easy comparison of the changes at each stage. This article’s code can be found here:

1. In need of a lifetime - we want to share a log between two things that log
2. Explicit lifetimes - fixing the compiler error with a lifetime annotation
3. Cloneable - making the log cloneable fixes the error but doesn’t do what we want
4. Reference counted - adding a std::rc::Rc<> as an alternative to lifetime annotation…
5. Encapsulated reference count - moving the std::rc::Rc<> inside the struct
6. Encapsulated reference count, with impl - chasing a bad design to its logical conclusion
7. Atomic reference counts - replacing the std::rc::Rc<> in (4) with a std::sync::Arc<>

Of course, there may be a better way; leave comments if you’d like to help me learn.

¹ The two principles of SOLID that we’re breaking here are the single responsibility principle and the open-closed principle.