Why are explicit lifetimes needed in Rust?

2019-01-03 01:42发布

I was reading the lifetimes chapter of the Rust book, and I came across this example for a named/explicit lifetime:

struct Foo<'a> {
    x: &'a i32,
}

fn main() {
    let x;                    // -+ x goes into scope
                              //  |
    {                         //  |
        let y = &5;           // ---+ y goes into scope
        let f = Foo { x: y }; // ---+ f goes into scope
        x = &f.x;             //  | | error here
    }                         // ---+ f and y go out of scope
                              //  |
    println!("{}", x);        //  |
}                             // -+ x goes out of scope

It's quite clear to me that the error being prevented by the compiler is the use-after-free of the reference assigned to x: after the inner scope is done, f and therefore &f.x become invalid, and should not have been assigned to x.

My issue is that the problem could have easily been analyzed away without using the explicit 'a lifetime, for instance by inferring an illegal assignment of a reference to a wider scope (x = &f.x;).

In which cases are explicit lifetimes actually needed to prevent use-after-free (or some other class?) errors?

9条回答
你好瞎i
2楼-- · 2019-01-03 01:55

The reason why your example does not work is simply because Rust only has local lifetime and type inference. What you are suggesting demands global inference. Whenever you have a reference whose lifetime cannot be elided, it must be annotated.

查看更多
虎瘦雄心在
3楼-- · 2019-01-03 01:57

The other answers all have salient points (fjh's concrete example where an explicit lifetime is needed), but are missing one key thing: why are explicit lifetimes needed when the compiler will tell you you've got them wrong?

This is actually the same question as "why are explicit types needed when the compiler can infer them". A hypothetical example:

fn foo() -> _ {  
    ""
}

Of course, the compiler can see that I'm returning a &'static str, so why does the programmer have to type it?

The main reason is that while the compiler can see what your code does, it doesn't know what your intent was.

Functions are a natural boundary to firewall the effects of changing code. If we were to allow lifetimes to be completely inspected from the code, then an innocent-looking change might affect the lifetimes, which could then cause errors in a function far away. This isn't a hypothetical example. As I understand it, Haskell has this problem when you rely on type inference for top-level functions. Rust nipped that particular problem in the bud.

There is also an efficiency benefit to the compiler — only function signatures need to be parsed in order to verify types and lifetimes. More importantly, it has an efficiency benefit for the programmer. If we didn't have explicit lifetimes, what does this function do:

fn foo(a: &u8, b: &u8) -> &u8

It's impossible to tell without inspecting the source, which would go against a huge number of coding best practices.

by inferring an illegal assignment of a reference to a wider scope

Scopes are lifetimes, essentially. A bit more clearly, a lifetime 'a is a generic lifetime parameter that can be specialized with a specific scope at compile time, based on the call site.

are explicit lifetimes actually needed to prevent [...] errors?

Not at all. Lifetimes are needed to prevent errors, but explicit lifetimes are needed to protect what little sanity programmers have.

查看更多
叛逆
4楼-- · 2019-01-03 01:58

The lifetime annotation in the following structure:

struct Foo<'a> {
    x: &'a i32,
}

specifies that a Foo instance shouldn't outlive the reference it contains (x field).

The example you came across in the Rust book doesn't illustrate this because f and y variables go out of scope at the same time.

A better example would be this:

fn main() {
    let f : Foo;
    {
        let n = 5;  // variable that is invalid outside this block
        let y = &n;
        f = Foo { x: y };
    };
    println!("{}", f.x);
}

Now, f really outlives the variable pointed to by f.x.

查看更多
smile是对你的礼貌
5楼-- · 2019-01-03 02:06

Let's have a look at the following example.

fn foo<'a, 'b>(x: &'a u32, y: &'b u32) -> &'a u32 {
    x
}

fn main() {
    let x = 12;
    let z: &u32 = {
        let y = 42;
        foo(&x, &y)
    };
}

Here, the explicit lifetimes are important. This compiles because the result of foo has the same lifetime as its first argument ('a), so it may outlive its second argument. This is expressed by the lifetime names in the signature of foo. If you switched the arguments in the call to foo the compiler would complain that y does not live long enough:

error[E0597]: `y` does not live long enough
  --> src/main.rs:10:5
   |
9  |         foo(&y, &x)
   |              - borrow occurs here
10 |     };
   |     ^ `y` dropped here while still borrowed
11 | }
   | - borrowed value needs to live until here
查看更多
迷人小祖宗
6楼-- · 2019-01-03 02:06

The case from the book is very simple by design. The topic of lifetimes is deemed complex.

The compiler cannot easily infer the lifetime in a function with multiple arguments.

Also, my own optional crate has an OptionBool type with an as_slice method whose signature actually is:

fn as_slice(&self) -> &'static [bool] { ... }

There is absolutely no way the compiler could have figured that one out.

查看更多
ゆ 、 Hurt°
7楼-- · 2019-01-03 02:07

As a newcomer to Rust, my understanding is that explicit lifetimes serve two purposes.

  1. Putting an explicit lifetime annotation on a function restricts the type of code that may appear inside that function. Explicit lifetimes allow the compiler to ensure that your program is doing what you intended.

  2. If you (the compiler) want(s) to check if a piece of code is valid, you (the compiler) will not have to iteratively look inside every function called. It suffices to have a look at the annotations of functions that are directly called by that piece of code. This makes your program much easier to reason about for you (the compiler), and makes compile times managable.

On point 1., Consider the following program written in Python:

import pandas as pd
import numpy as np

def second_row(ar):
    return ar[0]

def work(second):
    df = pd.DataFrame(data=second)
    df.loc[0, 0] = 1

def main():
    # .. load data ..
    ar = np.array([[0, 0], [0, 0]])

    # .. do some work on second row ..
    second = second_row(ar)
    work(second)

    # .. much later ..
    print(repr(ar))

if __name__=="__main__":
    main()

which will print

array([[1, 0],
       [0, 0]])

This type of behaviour always surprises me. What is happening is that df is sharing memory with ar, so when some of the content of df changes in work, that change infects ar as well. However, in some cases this may be exactly what you want, for memory efficiency reasons (no copy). The real problem in this code is that the function second_row is returning the first row instead of the second; good luck debugging that.

Consider instead a similar program written in Rust:

#[derive(Debug)]
struct Array<'a, 'b>(&'a mut [i32], &'b mut [i32]);

impl<'a, 'b> Array<'a, 'b> {
    fn second_row(&mut self) -> &mut &'b mut [i32] {
        &mut self.0
    }
}

fn work(second: &mut [i32]) {
    second[0] = 1;
}

fn main() {
    // .. load data ..
    let ar1 = &mut [0, 0][..];
    let ar2 = &mut [0, 0][..];
    let mut ar = Array(ar1, ar2);

    // .. do some work on second row ..
    {
        let second = ar.second_row();
        work(second);
    }

    // .. much later ..
    println!("{:?}", ar);
}

Compiling this, you get

error[E0308]: mismatched types
 --> src/main.rs:6:13
  |
6 |             &mut self.0
  |             ^^^^^^^^^^^ lifetime mismatch
  |
  = note: expected type `&mut &'b mut [i32]`
             found type `&mut &'a mut [i32]`
note: the lifetime 'b as defined on the impl at 4:5...
 --> src/main.rs:4:5
  |
4 |     impl<'a, 'b> Array<'a, 'b> {
  |     ^^^^^^^^^^^^^^^^^^^^^^^^^^
note: ...does not necessarily outlive the lifetime 'a as defined on the impl at 4:5
 --> src/main.rs:4:5
  |
4 |     impl<'a, 'b> Array<'a, 'b> {
  |     ^^^^^^^^^^^^^^^^^^^^^^^^^^

In fact you get two errors, there is also one with the roles of 'a and 'b interchanged. Looking at the annotation of second_row, we find that the output should be &mut &'b mut [i32], i.e., the output is supposed to be a reference to a reference with lifetime 'b (the lifetime of the second row of Array). However, because we are returning the first row (which has lifetime 'a), the compiler complains about lifetime mismatch. At the right place. At the right time. Debugging is a breeze.

查看更多
登录 后发表回答