I'm getting a Rust compile error from the borrow checker, and I don't understand why. There's probably something about lifetimes I don't fully understand.
I've boiled it down to a short code sample. In main, I want to do this:
fn main() {
    let codeToScan = "40 + 2";
    let mut scanner = Scanner::new(codeToScan);
    let first_token = scanner.consume_till(|c| { ! c.is_digit ()});
    println!("first token is: {}", first_token);
    // scanner.consume_till(|c| { c.is_whitespace ()}); // WHY DOES THIS LINE FAIL?
}
Trying to call scanner.consume_till a second time gives me this error:
example.rs:64:5: 64:12 error: cannot borrow `scanner` as mutable more than once at a time
example.rs:64 scanner.consume_till(|c| { c.is_whitespace ()}); // WHY DOES THIS LINE FAIL?
^~~~~~~
example.rs:62:23: 62:30 note: previous borrow of `scanner` occurs here; the mutable borrow prevents subsequent moves, borrows, or modification of `scanner` until the borrow ends
example.rs:62 let first_token = scanner.consume_till(|c| { ! c.is_digit ()});
^~~~~~~
example.rs:65:2: 65:2 note: previous borrow ends here
example.rs:59 fn main() {
...
example.rs:65 }
Basically, I've made something like my own iterator, and the equivalent to the "next" method takes &mut self. Because of that, I can't use the method more than once in the same scope.
However, the Rust std library has an iterator which can be used more than once in the same scope, and it also takes a &mut self parameter.
let test = "this is a string";
let mut iterator = test.chars();
iterator.next();
iterator.next(); // This is PERFECTLY LEGAL
So why does the Rust std library code compile, but mine doesn't? (I'm sure the lifetime annotations are at the root of it, but my understanding of lifetimes doesn't lead me to expect a problem.)
Here's my full code (only 60 lines, shortened for this question):
use std::str::{Chars};
use std::iter::{Enumerate};
#[deriving(Show)]
struct ConsumeResult<'lt> {
    value: &'lt str,
    startIndex: uint,
    endIndex: uint,
}
struct Scanner<'lt> {
    code: &'lt str,
    char_iterator: Enumerate<Chars<'lt>>,
    isEof: bool,
}
impl<'lt> Scanner<'lt> {
    fn new<'lt>(code: &'lt str) -> Scanner<'lt> {
        Scanner{code: code, char_iterator: code.chars().enumerate(), isEof: false}
    }
    fn assert_not_eof<'lt>(&'lt self) {
        if self.isEof {fail!("Scanner is at EOF."); }
    }
    fn next(&mut self) -> Option<(uint, char)> {
        self.assert_not_eof();
        let result = self.char_iterator.next();
        if result == None { self.isEof = true; }
        return result;
    }
    fn consume_till<'lt>(&'lt mut self, quit: |char| -> bool) -> ConsumeResult<'lt> {
        self.assert_not_eof();
        let mut startIndex: Option<uint> = None;
        let mut endIndex: Option<uint> = None;
        loop {
            let should_quit = match self.next() {
                None => {
                    endIndex = Some(endIndex.unwrap() + 1);
                    true
                },
                Some((i, ch)) => {
                    if startIndex == None { startIndex = Some(i);}
                    endIndex = Some(i);
                    quit (ch)
                }
            };
            if should_quit {
                return ConsumeResult{ value: self.code.slice(startIndex.unwrap(), endIndex.unwrap()),
                                      startIndex:startIndex.unwrap(), endIndex: endIndex.unwrap() };
            }
        }
    }
}
fn main() {
    let codeToScan = "40 + 2";
    let mut scanner = Scanner::new(codeToScan);
    let first_token = scanner.consume_till(|c| { ! c.is_digit ()});
    println!("first token is: {}", first_token);
    // scanner.consume_till(|c| { c.is_whitespace ()}); // WHY DOES THIS LINE FAIL?
}
Let’s look at consume_till. It takes &'lt mut self and returns ConsumeResult<'lt>. This means that the lifetime 'lt, the duration of the borrow of the input parameter self, will be that of the return value.
Expressed another way, after calling consume_till, you cannot use self again until its result is out of scope.
That result is placed into first_token, and first_token is still in scope in your last line.
In order to get around this, you must cause first_token to pass out of scope; wrapping the first call and the println! that uses first_token in a new block (a pair of braces) will do this.
All this does stand to reason: while you have a reference to something inside the Scanner, it is not safe to let you modify it, lest that reference be invalidated. This is the memory safety that Rust provides.
Here's a simpler example of the same thing:
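A sketch of what such an example can look like in current Rust syntax. Only Scanner, its s field and step_by_3_bytes are named in this answer; the index field, the method body and the helper first_step_only are assumptions made for illustration. The second call is commented out so that the snippet compiles as written; restoring those two lines should bring back the "cannot borrow `scanner` as mutable more than once at a time" error.

```rust
// Hypothetical reconstruction: `index` and the method body are assumptions.
struct Scanner<'a> {
    s: &'a str,
    index: usize,
}

impl<'a> Scanner<'a> {
    fn new(s: &'a str) -> Scanner<'a> {
        Scanner { s, index: 0 }
    }

    // Problematic signature: the returned slice is tied to the mutable
    // borrow of `self`, so `self` stays borrowed while the slice is alive.
    fn step_by_3_bytes<'b>(&'b mut self) -> &'b str {
        let start = self.index;
        let end = (start + 3).min(self.s.len());
        self.index = end;
        &self.s[start..end]
    }
}

// Hypothetical helper that exercises the scanner.
fn first_step_only() -> String {
    let mut scanner = Scanner::new("abcdefg");
    let a = scanner.step_by_3_bytes();
    // let b = scanner.step_by_3_bytes(); // second mutable borrow while `a` is live
    // println!("{} {}", a, b);           // using `a` here keeps the first borrow alive
    a.to_owned()
}
```

Note that a current compiler only complains if the first result is still used after the second call, which is why the commented-out println! matters.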
If you compile that, you get errors like the code in the question:
Now, the first thing to do is to avoid shadowing lifetimes: that is, this code has two lifetimes called 'a, and all the 'as in step_by_3_bytes refer to the 'a declared there; none of them actually refer to the 'a in Scanner<'a>. I'll rename the inner one to 'b to make it crystal clear what is going on, which makes the signature:
fn step_by_3_bytes<'b>(&'b mut self) -> &'b str
The problem here is that the 'b is connecting the self object with the str return value. The compiler has to assume that calling step_by_3_bytes can make arbitrary modifications, including invalidating previous return values, when looking at the definition of step_by_3_bytes from the outside (which is how the compiler works: type checking is purely based on the type signatures of things that are called, with no introspection). That is, for all the compiler knows, it could be defined so that each call to step_by_3_bytes modifies the object that previous return values came from. E.g. if the scanner owned a String, a call could cause that String to reallocate and thus move in memory, leaving any other &str return values as dangling pointers. Rust protects against this by tracking these references and disallowing mutation if it could cause such catastrophic events. Going back to our actual code: the compiler is type checking main just by looking at the type signature of step_by_3_bytes/consume_till, and so it can only assume the worst case scenario (i.e. the example I just gave).
How do we solve this?
Let's take a step back: as if we're just starting out and don't know which lifetimes we want for the return values, so we'll just leave them anonymous (not actually valid Rust):
fn step_by_3_bytes<'b>(&'_ mut self) -> &'_ str
Now, we get to ask the fun question: which lifetimes do we want where?
It's almost always best to annotate the longest valid lifetimes, and we know our return value lives for 'a (since it comes straight out of the s field, and that &str is valid for 'a). That is,
fn step_by_3_bytes<'b>(&'_ mut self) -> &'a str
For the other '_, we don't actually care: as API designers, we don't have any particular desire or need to connect the self borrow with any other references (unlike the return value, where we wanted/needed to express which memory it came from). So, we might as well leave it off:
fn step_by_3_bytes<'b>(&mut self) -> &'a str
The 'b is unused, so it can be killed, leaving us with:
fn step_by_3_bytes(&mut self) -> &'a str
This expresses that Scanner is referring to some memory that is valid for at least 'a, and then returning references into just that memory. The self object is essentially just a proxy for manipulating those views: once you have the reference it returns, you can discard the Scanner (or call more methods).
In summary, the full, working code is
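A sketch of that working code in current Rust syntax. As before, only Scanner, its s field and step_by_3_bytes come from this answer; the index field, the method body and the helper two_steps are assumptions added for illustration.

```rust
// Hypothetical reconstruction: `index` and the method body are assumptions.
struct Scanner<'a> {
    s: &'a str,
    index: usize,
}

impl<'a> Scanner<'a> {
    fn new(s: &'a str) -> Scanner<'a> {
        Scanner { s, index: 0 }
    }

    // The return value borrows from the underlying string ('a),
    // not from the short-lived mutable borrow of `self`.
    fn step_by_3_bytes(&mut self) -> &'a str {
        let s: &'a str = self.s; // copy the shared reference out of `self`
        let start = self.index;
        let end = (start + 3).min(s.len());
        self.index = end;
        &s[start..end]
    }
}

// Hypothetical helper: calls the method twice and keeps both results,
// which even outlive the scanner itself.
fn two_steps(text: &str) -> (&str, &str) {
    let mut scanner = Scanner::new(text);
    let a = scanner.step_by_3_bytes();
    let b = scanner.step_by_3_bytes(); // fine: `a` does not borrow `scanner`
    (a, b)
}
```

Calling two_steps("abcdefg") yields ("abc", "def"): both slices point into the original string, so neither one keeps the scanner borrowed.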
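Applying this change to your code is simply adjusting the definition of consume_till so that the borrow of self is no longer tied to 'lt:
fn consume_till(&mut self, quit: |char| -> bool) -> ConsumeResult<'lt>
There's a slight (but not huge) difference here: Chars is just returning a char, i.e. no lifetimes in the return value. The next method (essentially) has signature:
fn next(&mut self) -> Option<char>
(It's actually in an Iterator trait impl, but that's not important.)
The situation you have here is similar to writing next so that the borrow of self is tied to the iterator's own lifetime parameter, for example:
fn next(&'a mut self) -> Option<char>
(Similar in terms of "incorrect linking of lifetimes", the details differ.)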
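For readers on a current compiler, here is a sketch of the question's scanner with the corrected consume_till signature. It is an adaptation, not the asker's exact code: it assumes modern syntax (usize, assert!, a generic closure parameter), swaps chars().enumerate() for char_indices() so that the indices are valid byte offsets for slicing, and treats end of input as the end of the string. The helper first_two_tokens is hypothetical.

```rust
use std::str::CharIndices;

#[derive(Debug, PartialEq)]
struct ConsumeResult<'lt> {
    value: &'lt str,
    start_index: usize,
    end_index: usize,
}

struct Scanner<'lt> {
    code: &'lt str,
    char_iterator: CharIndices<'lt>,
    is_eof: bool,
}

impl<'lt> Scanner<'lt> {
    fn new(code: &'lt str) -> Scanner<'lt> {
        Scanner { code, char_iterator: code.char_indices(), is_eof: false }
    }

    fn next(&mut self) -> Option<(usize, char)> {
        assert!(!self.is_eof, "Scanner is at EOF.");
        let result = self.char_iterator.next();
        if result.is_none() {
            self.is_eof = true;
        }
        result
    }

    // `&mut self` carries no named lifetime; only the result is tied to 'lt.
    fn consume_till(&mut self, quit: impl Fn(char) -> bool) -> ConsumeResult<'lt> {
        let code: &'lt str = self.code;
        let mut start: Option<usize> = None;
        loop {
            // At end of input, the token runs to the end of the string.
            let (end, should_quit) = match self.next() {
                None => (code.len(), true),
                Some((i, ch)) => (i, quit(ch)),
            };
            // Remember where the token began the first time through.
            let begin = *start.get_or_insert(end);
            if should_quit {
                return ConsumeResult {
                    value: &code[begin..end],
                    start_index: begin,
                    end_index: end,
                };
            }
        }
    }
}

// Hypothetical helper mirroring the question's main: two calls in one scope.
fn first_two_tokens(code: &str) -> (ConsumeResult<'_>, ConsumeResult<'_>) {
    let mut scanner = Scanner::new(code);
    let first = scanner.consume_till(|c| !c.is_ascii_digit());
    let second = scanner.consume_till(|c| c.is_whitespace());
    (first, second)
}
```

With the input "40 + 2", the first token is "40" and the second is "+". As in the question's code, the character that triggers quit is consumed but left out of the returned slice.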