Why does the Rust borrow checker reject this code?

Posted 2020-05-27 04:53

Question:

I'm getting a Rust compile error from the borrow checker, and I don't understand why. There's probably something about lifetimes I don't fully understand.

I've boiled it down to a short code sample. In main, I want to do this:

fn main() {
    let codeToScan = "40 + 2";
    let mut scanner = Scanner::new(codeToScan);
    let first_token = scanner.consume_till(|c| { ! c.is_digit ()});
    println!("first token is: {}", first_token);
    // scanner.consume_till(|c| { c.is_whitespace ()}); // WHY DOES THIS LINE FAIL?
}

Trying to call scanner.consume_till a second time gives me this error:

example.rs:64:5: 64:12 error: cannot borrow `scanner` as mutable more than once at a time
example.rs:64     scanner.consume_till(|c| { c.is_whitespace ()}); // WHY DOES THIS LINE FAIL?
                  ^~~~~~~
example.rs:62:23: 62:30 note: previous borrow of `scanner` occurs here; the mutable borrow prevents subsequent moves, borrows, or modification of `scanner` until the borrow ends
example.rs:62     let first_token = scanner.consume_till(|c| { ! c.is_digit ()});
                                    ^~~~~~~
example.rs:65:2: 65:2 note: previous borrow ends here
example.rs:59 fn main() {
...
example.rs:65 }

Basically, I've made something like my own iterator, and the equivalent to the "next" method takes &mut self. Because of that, I can't use the method more than once in the same scope.

However, the Rust std library has an iterator which can be used more than once in the same scope, and it also takes a &mut self parameter.

let test = "this is a string";
let mut iterator = test.chars();
iterator.next();
iterator.next(); // This is PERFECTLY LEGAL

So why does the Rust std library code compile, but mine doesn't? (I'm sure the lifetime annotations are at the root of it, but my understanding of lifetimes doesn't lead me to expect a problem.)

Here's my full code (only 60 lines, shortened for this question):

use std::str::{Chars};
use std::iter::{Enumerate};

#[deriving(Show)]
struct ConsumeResult<'lt> {
    value: &'lt str,
    startIndex: uint,
    endIndex: uint,
}

struct Scanner<'lt> {
    code: &'lt str,
    char_iterator: Enumerate<Chars<'lt>>,
    isEof: bool,
}

impl<'lt> Scanner<'lt> {
    fn new<'lt>(code: &'lt str) -> Scanner<'lt> {
        Scanner{code: code, char_iterator: code.chars().enumerate(), isEof: false}
    }

    fn assert_not_eof<'lt>(&'lt self) {
        if self.isEof {fail!("Scanner is at EOF."); }
    }

    fn next(&mut self) -> Option<(uint, char)> {
        self.assert_not_eof();
        let result = self.char_iterator.next();
        if result == None { self.isEof = true; }
        return result;
    }

    fn consume_till<'lt>(&'lt mut self, quit: |char| -> bool) -> ConsumeResult<'lt> {
        self.assert_not_eof();
        let mut startIndex: Option<uint> = None;
        let mut endIndex: Option<uint> = None;

        loop {
            let should_quit = match self.next() {
                None => {
                    endIndex = Some(endIndex.unwrap() + 1);
                    true
                },
                Some((i, ch)) => {
                    if startIndex == None { startIndex = Some(i);}
                    endIndex = Some(i);
                    quit (ch)
                }
            };

            if should_quit {
                return ConsumeResult{ value: self.code.slice(startIndex.unwrap(), endIndex.unwrap()),
                                      startIndex:startIndex.unwrap(), endIndex: endIndex.unwrap() };
            }
        }
    }
}

fn main() {
    let codeToScan = "40 + 2";
    let mut scanner = Scanner::new(codeToScan);
    let first_token = scanner.consume_till(|c| { ! c.is_digit ()});
    println!("first token is: {}", first_token);
    // scanner.consume_till(|c| { c.is_whitespace ()}); // WHY DOES THIS LINE FAIL?
}

Answer 1:

Here's a simpler example of the same thing:

struct Scanner<'a> {
    s: &'a str
}

impl<'a> Scanner<'a> {
    fn step_by_3_bytes<'a>(&'a mut self) -> &'a str {
        let return_value = self.s.slice_to(3);
        self.s = self.s.slice_from(3);
        return_value
    }
}

fn main() {
    let mut scan = Scanner { s: "123456" };

    let a = scan.step_by_3_bytes();
    println!("{}", a);

    let b = scan.step_by_3_bytes();
    println!("{}", b);
}

If you compile that, you get errors like the code in the question:

<anon>:19:13: 19:17 error: cannot borrow `scan` as mutable more than once at a time
<anon>:19     let b = scan.step_by_3_bytes();
                      ^~~~
<anon>:16:13: 16:17 note: previous borrow of `scan` occurs here; the mutable borrow prevents subsequent moves, borrows, or modification of `scan` until the borrow ends
<anon>:16     let a = scan.step_by_3_bytes();
                      ^~~~
<anon>:21:2: 21:2 note: previous borrow ends here
<anon>:13 fn main() {
...
<anon>:21 }
          ^

Now, the first thing to do is to avoid shadowing lifetimes: this code has two lifetimes called 'a, and all the 'as in step_by_3_bytes refer to the 'a declared there; none of them actually refers to the 'a in Scanner<'a>. I'll rename the inner one to make it crystal clear what is going on:

impl<'a> Scanner<'a> {
    fn step_by_3_bytes<'b>(&'b mut self) -> &'b str {

The problem here is that 'b connects the borrow of self with the str return value. When looking at step_by_3_bytes from the outside, the compiler has to assume that calling it can make arbitrary modifications, including invalidating previous return values. (This is how the compiler works: type checking is based purely on the type signatures of the things being called, with no inspection of their bodies.) That is, it could be defined like this:

struct Scanner<'a> {
    s: &'a str,
    other: String,
    count: uint
}

impl<'a> Scanner<'a> {
    fn step_by_3_bytes<'b>(&'b mut self) -> &'b str {
        self.other.push_str(self.s);
        // return a reference into data we own
        self.other.as_slice()
    }
}

Now, each call to step_by_3_bytes starts modifying the object that previous return values came from. E.g. it could cause the String to reallocate and thus move in memory, leaving any other &str return values as dangling pointers. Rust protects against this by tracking these references and disallowing mutation if it could cause such catastrophic events. Going back to our actual code: the compiler is type checking main just by looking at the type signature of step_by_3_bytes/consume_till and so it can only assume the worst case scenario (i.e. the example I just gave).
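
As a rough sketch in current Rust syntax (the code in this thread predates Rust 1.0, so String::as_str stands in for as_slice here, and demo_owned_buffer is a hypothetical helper added only for illustration), the worst-case definition above might look like the following. In this variant the 'b link is genuinely required, because the returned &str points into a buffer owned by self:

```rust
struct Scanner<'a> {
    s: &'a str,
    other: String,
}

impl<'a> Scanner<'a> {
    // The result points into `self.other`, so it has to be tied to the
    // borrow of `self` ('b) rather than to the scanned text ('a).
    fn step_by_3_bytes<'b>(&'b mut self) -> &'b str {
        self.other.push_str(self.s); // may reallocate the buffer
        self.other.as_str()
    }
}

// Hypothetical helper, not part of the original code.
fn demo_owned_buffer() -> String {
    let mut scan = Scanner { s: "123", other: String::new() };
    // Copy the first result into an owned String so the first mutable
    // borrow of `scan` ends before the second call starts.
    let first = scan.step_by_3_bytes().to_owned();
    let second = scan.step_by_3_bytes();
    format!("{}|{}", first, second)
}
```

Keeping first as a borrowed &str and reading it after the second call should be rejected with a "cannot borrow as mutable more than once at a time" error, which is exactly the protection described above.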


How do we solve this?

Let's take a step back: as if we're just starting out and don't know which lifetimes we want for the return values, so we'll just leave them anonymous (not actually valid Rust):

impl<'a> Scanner<'a> {
    fn step_by_3_bytes<'b>(&'_ mut self) -> &'_ str {

Now, we get to ask the fun question: which lifetimes do we want where?

It's almost always best to annotate the longest valid lifetimes, and we know our return value lives for 'a (since it comes straight out of the s field, and that &str is valid for 'a). That is,

impl<'a> Scanner<'a> {
    fn step_by_3_bytes<'b>(&'_ mut self) -> &'a str {

For the other '_, we don't actually care: as API designers, we don't have any particular desire or need to connect the self borrow with any other references (unlike the return value, where we wanted/needed to express which memory it came from). So, we might as well leave it off:

impl<'a> Scanner<'a> {
    fn step_by_3_bytes<'b>(&mut self) -> &'a str {

The 'b is unused, so it can be killed, leaving us with

impl<'a> Scanner<'a> {
    fn step_by_3_bytes(&mut self) -> &'a str {

This expresses that Scanner is referring to some memory that is valid for at least 'a, and then returning references into just that memory. The self object is essentially just a proxy for manipulating those views: once you have the reference it returns, you can discard the Scanner (or call more methods).

In summary, the full, working code is

struct Scanner<'a> {
    s: &'a str
}

impl<'a> Scanner<'a> {
    fn step_by_3_bytes(&mut self) -> &'a str {
        let return_value = self.s.slice_to(3);
        self.s = self.s.slice_from(3);
        return_value
    }
}

fn main() {
    let mut scan = Scanner { s: "123456" };

    let a = scan.step_by_3_bytes();
    println!("{}", a);

    let b = scan.step_by_3_bytes();
    println!("{}", b);
}
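
For readers on a current toolchain, here is a minimal sketch of the same fix in today's syntax. It assumes str::split_at as a replacement for the old slice_to/slice_from pair, and demo_two_tokens is a hypothetical helper added for illustration. Both results are kept alive at the same time and even outlive the Scanner, which works only because the return type is tied to 'a and not to the borrow of self:

```rust
struct Scanner<'a> {
    s: &'a str,
}

impl<'a> Scanner<'a> {
    // `&mut self` gets its own short, anonymous lifetime; the result
    // borrows only from the underlying text, which is valid for 'a.
    fn step_by_3_bytes(&mut self) -> &'a str {
        // Panics if fewer than 3 bytes remain or if byte 3 is not a
        // character boundary, much like the original slicing calls.
        let (head, tail) = self.s.split_at(3);
        self.s = tail;
        head
    }
}

// Hypothetical helper, not part of the original code.
fn demo_two_tokens() -> (&'static str, &'static str) {
    let mut scan = Scanner { s: "123456" };
    let a = scan.step_by_3_bytes();
    let b = scan.step_by_3_bytes(); // `a` is still alive here
    (a, b) // both results outlive `scan`
}
```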

Applying this change to your code simply means adjusting the definition of consume_till:

fn consume_till(&mut self, quit: |char| -> bool) -> ConsumeResult<'lt> {
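
A sketch of how the adjusted consume_till could look in current Rust follows. Several details are assumptions made for illustration and are not taken from the question: char_indices replaces chars().enumerate() so that the indices are byte offsets suitable for slicing, the closure is a generic FnMut parameter, end of input is reported as None instead of failing, and demo_scan is a hypothetical helper:

```rust
use std::str::CharIndices;

#[derive(Debug)]
struct ConsumeResult<'lt> {
    value: &'lt str,
    start_index: usize,
    end_index: usize,
}

struct Scanner<'lt> {
    code: &'lt str,
    char_iterator: CharIndices<'lt>,
}

impl<'lt> Scanner<'lt> {
    fn new(code: &'lt str) -> Scanner<'lt> {
        Scanner { code, char_iterator: code.char_indices() }
    }

    // The result is tied to 'lt (the scanned text), not to `&mut self`.
    fn consume_till<F>(&mut self, mut quit: F) -> Option<ConsumeResult<'lt>>
    where
        F: FnMut(char) -> bool,
    {
        // Returns None once the input is exhausted.
        let (start_index, mut ch) = self.char_iterator.next()?;
        let mut end_index = start_index;
        // As in the question, the character that triggers `quit` is
        // consumed but excluded from the returned slice.
        while !quit(ch) {
            match self.char_iterator.next() {
                Some((i, c)) => {
                    end_index = i;
                    ch = c;
                }
                None => {
                    end_index = self.code.len();
                    break;
                }
            }
        }
        Some(ConsumeResult {
            value: &self.code[start_index..end_index],
            start_index,
            end_index,
        })
    }
}

// Hypothetical helper, not part of the original code.
fn demo_scan() -> (ConsumeResult<'static>, ConsumeResult<'static>) {
    let mut scanner = Scanner::new("40 + 2");
    let first = scanner
        .consume_till(|c| !c.is_ascii_digit())
        .expect("input is not empty");
    // `first` is still alive during this second mutable borrow.
    let second = scanner
        .consume_till(|c| c.is_whitespace())
        .expect("more input remains");
    (first, second)
}
```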

You asked: "So why does the Rust std library code compile, but mine doesn't? (I'm sure the lifetime annotations are at the root of it, but my understanding of lifetimes doesn't lead me to expect a problem.)"

There's a slight difference here: Chars just returns a char, i.e. there are no lifetimes in the return value. The next method (essentially) has this signature:

impl<'a> Chars<'a> {
    fn next(&mut self) -> Option<char> {

(It's actually in an Iterator trait impl, but that's not important.)

The situation you have here is similar to writing

impl<'a> Chars<'a> {
    fn next(&'a mut self) -> Option<char> {

(Similar in terms of "incorrect linking of lifetimes", the details differ.)
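
To make the comparison concrete, here is a small sketch in current Rust. Letters is a hypothetical stand-in for Chars, not the standard library's implementation, and demo_letters is likewise an illustrative helper. Because the receiver is a plain &mut self and the return value carries no lifetime, the method can be called repeatedly while earlier results are still in use:

```rust
struct Letters<'a> {
    rest: &'a str,
}

impl<'a> Letters<'a> {
    // Writing the receiver as `&'a mut self` instead would borrow the
    // value for all of 'a, so a second call would be rejected.
    fn next(&mut self) -> Option<char> {
        let mut chars = self.rest.chars();
        let c = chars.next()?;
        self.rest = chars.as_str(); // remaining, not yet consumed text
        Some(c)
    }
}

// Hypothetical helper, not part of the original code.
fn demo_letters() -> (Option<char>, Option<char>) {
    let mut letters = Letters { rest: "this is a string" };
    let first = letters.next();
    let second = letters.next(); // perfectly legal
    (first, second)
}
```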



Answer 2:

Let’s look at consume_till.

It takes &'lt mut self and returns ConsumeResult<'lt>. This means that the lifetime 'lt, the duration of the borrow of the input parameter self, is also the lifetime of the return value.

Expressed another way, after calling consume_till, you cannot use self again until its result is out of scope.

That result is placed into first_token, and first_token is still in scope in your last line.

In order to get around this, you must cause first_token to pass out of scope; the insertion of a new block around it will do this:

fn main() {
    let code_to_scan = "40 + 2";
    let mut scanner = Scanner::new(code_to_scan);
    {
        let first_token = scanner.consume_till(|c| !c.is_digit());
        println!("first token is: {}", first_token);
    }
    scanner.consume_till(|c| c.is_whitespace());
}
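
The same scoping workaround can be sketched in current Rust on the smaller Scanner from the first answer, deliberately keeping the over-restrictive signature. The names demo_scoped and first_copy are hypothetical. Recent compilers usually end a borrow at its last use rather than at the end of the enclosing block, so the extra braces are often no longer required there, but they show the idea:

```rust
struct Scanner<'a> {
    s: &'a str,
}

impl<'a> Scanner<'a> {
    // Deliberately keeps the linked lifetimes from the question: the
    // result is tied to the mutable borrow of `self`.
    fn step_by_3_bytes<'b>(&'b mut self) -> &'b str {
        let (head, tail) = self.s.split_at(3);
        self.s = tail;
        head
    }
}

// Hypothetical helper, not part of the original code.
fn demo_scoped() -> (String, String) {
    let mut scan = Scanner { s: "123456" };
    let first_copy;
    {
        let a = scan.step_by_3_bytes();
        first_copy = a.to_owned();
    } // `a` goes out of scope here, ending the first mutable borrow
    let b = scan.step_by_3_bytes();
    (first_copy, b.to_owned())
}
```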

All this does stand to reason: while you have a reference to something inside the Scanner, it is not safe to let you modify it, lest that reference be invalidated. This is the memory safety that Rust provides.



Tags: rust