How does the Rust compiler know `Cell` has internal mutability?

Consider the following code (Playground version):

use std::cell::Cell;

struct Foo(u32);

#[derive(Clone, Copy)]
struct FooRef<'a>(&'a Foo);

// the bodies of these functions don't matter
fn testa<'a>(x: &FooRef<'a>, y: &'a Foo) { x; }
fn testa_mut<'a>(x: &mut FooRef<'a>, y: &'a Foo) { *x = FooRef(y); }
fn testb<'a>(x: &Cell<FooRef<'a>>, y: &'a Foo) { x.set(FooRef(y)); }

fn main() {
    let u1 = Foo(3);
    let u2 = Foo(5);
    let mut a = FooRef(&u1);
    let b = Cell::new(FooRef(&u1));

    // try one of the following 3 statements
    testa(&a, &u2);         // allow move at (1)
    testa_mut(&mut a, &u2); // deny move -- fine!
    testb(&b, &u2);         // deny move -- but how does rustc know?

    u2;                     // (1) move out
    // ... do something with a or b
}

I'm curious how rustc knows that Cell has interior mutability and may hold on to a reference of the other argument.

If I create another data structure from scratch, similar to Cell which also has interior mutability, how do I tell rustc that?

Tags: rust mutability

3 answers

ゆ、 Hurt°

#2 · 2020-04-05 09:22

The relevant part from the Rust source code is this:

#[lang = "unsafe_cell"]
pub struct UnsafeCell<T: ?Sized> {
    value: T,
}

Specifically, the #[lang = "unsafe_cell"] is what tells the compiler that this particular type maps to its internal notion of "the interior mutability type". This sort of thing is called a "lang item".

You cannot define your own type for this purpose, as you can't have multiple instances of a single lang item. The only way to do so would be to completely replace the standard library with your own code.


聊天终结者

#3 · 2020-04-05 09:28

In testb, you bind the lifetime 'a of your Foo reference to the FooRef argument. This tells the borrow checker that the &u2 must live at least as long as b's reference to it. Note that this reasoning requires no knowledge of the function body.

~~Within the function, the borrow checker can prove that the second argument lives at least as long as the first, due to the lifetime annotation, otherwise the function would fail to compile.~~

Edit: Disregard this; read huon-dbaupp's answer. I'm leaving this so you can read the comments.


啃猪蹄的小仙女

#4 · 2020-04-05 09:41

The reason the code with Cell compiles (ignoring the u2) and mutates is that Cell's whole API takes & pointers:

impl<T> Cell<T> where T: Copy {
    fn new(value: T) -> Cell<T> { ... }

    fn get(&self) -> T { ... }

    fn set(&self, value: T) { ... }
}

It is carefully written to allow mutation while shared, i.e. interior mutability. This allows it to expose these mutating methods behind a & pointer. Conventional mutation requires a &mut pointer (with its associated non-aliasing restrictions) because having unique access to a value is the only way to ensure that mutating it will be safe, in general.

So, the way to create types that allow mutation while shared is to ensure that their API for mutation uses & pointers instead of &mut. Generally speaking this should be done by having the type contain pre-written types like Cell, i.e. use them as building blocks.
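As a rough sketch of the building-block approach, here is a type whose mutating method takes &self because the mutated part lives in a Cell. The Tracked type and its methods are hypothetical names made up for this illustration, not anything from the standard library.

```rust
use std::cell::Cell;

// Hypothetical example type: remembers how many times it has been read.
struct Tracked {
    value: u32,
    reads: Cell<u32>, // the only field that is mutated while shared
}

impl Tracked {
    fn new(value: u32) -> Self {
        Tracked { value, reads: Cell::new(0) }
    }

    // Takes &self, yet updates `reads` through the Cell.
    fn get(&self) -> u32 {
        self.reads.set(self.reads.get() + 1);
        self.value
    }

    fn reads(&self) -> u32 {
        self.reads.get()
    }
}

fn main() {
    let t = Tracked::new(7);
    let (r1, r2) = (&t, &t); // two shared references alive at once
    assert_eq!(r1.get(), 7);
    assert_eq!(r2.get(), 7);
    assert_eq!(t.reads(), 2);
}
```

No unsafe code is needed here: all of the aliasing reasoning is delegated to Cell.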

The reason later use of u2 fails is a longer story...

`UnsafeCell`

At a lower level, mutating a value while it is shared (e.g. has multiple & pointers to it) is undefined behaviour, except for when the value is contained in an UnsafeCell. This is the very lowest level of interior mutability, designed to be used as a building block for building other abstractions.

Types that allow safe interior mutability, like Cell, RefCell (for sequential code), the Atomic*s, Mutex and RwLock (for concurrent code) all use UnsafeCell internally and impose some restrictions around it to ensure that it is safe. For example, the definition of Cell is:

pub struct Cell<T> {
    value: UnsafeCell<T>,
}

Cell ensures that mutations are safe by carefully restricting the API it offers: the T: Copy in the code above is key.

(If you wish to write your own low-level type with interior mutability, you just need to ensure that the things that are mutated while being shared are contained in an UnsafeCell. However, I recommend not doing this: Rust has several existing tools (the ones I mentioned above) for interior mutability that are carefully vetted to be safe and correct within Rust's aliasing and mutation rules; breaking the rules is undefined behaviour and can easily result in miscompiled programs.)
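For illustration only, a minimal Cell-like type built directly on UnsafeCell might look like the sketch below. MyCell is a hypothetical name, and the safety argument in the comments is my own reasoning under the stated restrictions (T: Copy, no references to the contents handed out), not a vetted proof.

```rust
use std::cell::UnsafeCell;

// Hypothetical minimal Cell-like type, for illustration only.
struct MyCell<T> {
    value: UnsafeCell<T>,
}

impl<T: Copy> MyCell<T> {
    fn new(value: T) -> Self {
        MyCell { value: UnsafeCell::new(value) }
    }

    fn get(&self) -> T {
        // SAFETY (assumed argument): MyCell is !Sync because UnsafeCell is,
        // and no reference into the contents is ever returned, so no other
        // access can overlap with this read.
        unsafe { *self.value.get() }
    }

    fn set(&self, value: T) {
        // SAFETY (assumed argument): same reasoning as in `get`; T: Copy
        // means overwriting the old value runs no destructor.
        unsafe { *self.value.get() = value; }
    }
}

fn main() {
    let c = MyCell::new(1);
    let (a, b) = (&c, &c);
    a.set(5); // mutation through a shared reference
    assert_eq!(b.get(), 5);
}
```

Because the field is an UnsafeCell<T>, the compiler treats MyCell<T> the same way it treats Cell<T> for borrow checking, without any extra annotation.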

Lifetime Variance

Anyway, the key that makes the compiler understand that the &u2 is borrowed for the cell case is variance of lifetimes. Typically, the compiler will shorten lifetimes when you pass things to functions, which makes things work great, e.g. you can pass a string literal (&'static str) to a function expecting &'a str, because the long 'static lifetime is shortened to 'a. This is happening for testa: the testa(&a, &u2) call is shortening the lifetimes of the references from the longest they could possibly be (the whole of the body of main) to just that function call. The compiler is free to do this because normal references are variant¹ in their lifetimes, i.e. it can vary them.
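A small sketch of that shortening, using a hypothetical helper called longer that forces both arguments to share one lifetime:

```rust
// Hypothetical helper: both arguments must share a single lifetime 'a.
fn longer<'a>(x: &'a str, y: &'a str) -> &'a str {
    if x.len() >= y.len() { x } else { y }
}

fn main() {
    let literal: &'static str = "static text";
    let owned = String::from("local");
    // `literal` is 'static, but its lifetime is shortened to match the
    // borrow of `owned`, so the call type-checks.
    let picked = longer(literal, &owned);
    assert_eq!(picked, "static text");
}
```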

However, for testa_mut, the &mut FooRef<'a> stops the compiler from shortening that lifetime (in technical terms &mut T is "invariant in T"), exactly because something like testa_mut can happen. In this case, the compiler sees the &mut FooRef<'a> and understands that the 'a lifetime can't be shortened at all, so in the call testa_mut(&mut a, &u2) it has to take the true lifetime of the u2 value (the whole function), which causes u2 to be borrowed for that region.

So, coming back to interior mutability: UnsafeCell<T> not only tells the compiler that a thing may be mutated while aliased (and hence inhibits some optimisations that would otherwise be unsound), it is also invariant in T, i.e. it acts like a &mut T for the purposes of this lifetime/borrowing analysis, exactly because it allows code like testb.

The compiler infers this variance automatically; it becomes invariant when some type parameter/lifetime is contained in UnsafeCell or &mut somewhere in the type (like FooRef in Cell<FooRef<'a>>).
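One way to see the inferred variance is to try to shorten a lifetime by hand. In this sketch, shorten and the value helper are hypothetical names; the commented-out function is the version I would expect the compiler to reject with a lifetime error.

```rust
use std::cell::Cell;

struct Foo(u32);

#[derive(Clone, Copy)]
struct FooRef<'a>(&'a Foo);

impl<'a> FooRef<'a> {
    // Hypothetical helper to read the wrapped number.
    fn value(self) -> u32 {
        let FooRef(foo) = self;
        foo.0
    }
}

// Compiles: FooRef<'a> is covariant in 'a, so 'long can shrink to 'short.
fn shorten<'long: 'short, 'short>(x: FooRef<'long>) -> FooRef<'short> {
    x
}

// Expected to be rejected if uncommented: Cell<FooRef<'a>> is invariant in 'a.
// fn shorten_cell<'long: 'short, 'short>(
//     x: Cell<FooRef<'long>>,
// ) -> Cell<FooRef<'short>> {
//     x
// }

fn main() {
    static LONG: Foo = Foo(3);
    let r: FooRef<'static> = FooRef(&LONG);
    let s = shorten(r);
    let c = Cell::new(s); // the Cell keeps exactly the lifetime it was given
    assert_eq!(c.get().value(), 3);
}
```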

The Rustonomicon talks about this and other detailed considerations like it.

¹ Strictly speaking, there are four levels of variance in type system jargon: bivariance, covariance, contravariance and invariance. In practice Rust mostly has invariance and covariance; contravariance survives only in function argument types (fn(T) is contravariant in T). When I say "variant" it really means "covariant". See the Rustonomicon for more detail.
