For shared references and mutable references the semantics are clear: as long as you have a shared reference to a value, nothing else must have mutable access, and a mutable reference can't be shared.
So this code:
#[no_mangle]
pub extern fn run_ref(a: &i32, b: &mut i32) -> (i32, i32) {
    let x = *a;
    *b = 1;
    let y = *a;
    (x, y)
}
compiles (on x86_64) to:
run_ref:
movl (%rdi), %ecx
movl $1, (%rsi)
movq %rcx, %rax
shlq $32, %rax
orq %rcx, %rax
retq
Note that the memory `a` points to is only read once, because the compiler knows the write to `b` must not have modified the memory at `a`.
Raw pointers are more complicated. Raw pointer arithmetic and casts are "safe", but dereferencing them is not.
We can convert raw pointers back to shared and mutable references, and then use them; this will certainly imply the usual reference semantics, and the compiler can optimize accordingly.
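As a minimal sketch of that conversion (the function name `run_ptr_as_ref` is made up for illustration, and the caller is assumed to pass valid, aligned, non-overlapping pointers), the references formed inside carry the usual guarantees:

```rust
/// Hypothetical helper: turns both raw pointers into references first.
///
/// # Safety
/// `a` and `b` must be valid, aligned, and must not overlap; forming a
/// `&i32` and a `&mut i32` to the same memory would be undefined behavior.
pub unsafe fn run_ptr_as_ref(a: *const i32, b: *mut i32) -> (i32, i32) {
    let a: &i32 = unsafe { &*a };         // shared reference: nothing may write behind it
    let b: &mut i32 = unsafe { &mut *b }; // unique reference: must not alias `a`
    let x = *a;
    *b = 1;
    let y = *a; // the compiler is free to reuse `x` here
    (x, y)
}
```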
But what are the semantics if we use raw pointers directly?
#[no_mangle]
pub unsafe extern fn run_ptr_direct(a: *const i32, b: *mut f32) -> (i32, i32) {
    let x = *a;
    *b = 1.0;
    let y = *a;
    (x, y)
}
compiles to:
run_ptr_direct:
movl (%rdi), %ecx
movl $1065353216, (%rsi)
movl (%rdi), %eax
shlq $32, %rax
orq %rcx, %rax
retq
Although we write a value of a different type, the second read still goes to memory, so it seems to be allowed to call this function with the same (or overlapping) memory location for both arguments. In other words, a `const` raw pointer does not forbid a coexisting `mut` raw pointer; and it's probably fine to have two `mut` raw pointers (of possibly different types) to the same (or overlapping) memory location too.
Note that a normal optimizing C/C++ compiler would eliminate the second read (due to the "strict aliasing" rule: modifying/reading the same memory location through pointers of different ("incompatible") types is UB in most cases):
struct tuple { int x; int y; };
extern "C" tuple run_ptr(int const* a, float* b) {
    int const x = *a;
    *b = 1.0;
    int const y = *a;
    return tuple{x, y};
}
compiles to:
run_ptr:
movl (%rdi), %eax
movl $0x3f800000, (%rsi)
movq %rax, %rdx
salq $32, %rdx
orq %rdx, %rax
ret
Playground with Rust code examples
godbolt Compiler Explorer with C example
So: What are the semantics if we use raw pointers directly: is it ok for referenced data to overlap?
This should have direct implications on whether the compiler is allowed to reorder memory access through raw pointers.
No awkward strict-aliasing here
C++ strict-aliasing is a patch on a wooden leg. C++ does not have any aliasing information, and the absence of aliasing information prevents a number of optimizations (as you noted here), therefore to regain some performance strict-aliasing was patched on...
Unfortunately, strict-aliasing is awkward in a systems language, because reinterpreting raw memory is the essence of what systems languages are designed to do.
And doubly unfortunately it does not enable that many optimizations. For example, copying from one array to another must assume that the arrays may overlap.
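The same distinction shows up in Rust's own raw-pointer API, which may make the point more concrete. In this small sketch (the helper names are invented for the example), `std::ptr::copy` has to tolerate overlap, much like `memmove`, while a copy from a `&[i32]` into a `&mut [i32]` can rely on the two never overlapping:

```rust
use std::ptr;

/// Hypothetical helper: shifts the contents of `buf` right by one element.
/// Source and destination overlap, so the overlap-tolerant `ptr::copy` is needed.
pub fn shift_right_by_one(buf: &mut [i32]) {
    if buf.len() < 2 {
        return;
    }
    let n = buf.len() - 1;
    let p = buf.as_mut_ptr();
    // SAFETY: both ranges lie inside `buf`, and `ptr::copy` permits overlap.
    unsafe { ptr::copy(p, p.add(1), n) };
}

/// Hypothetical helper: `dst` is a `&mut` slice, so it cannot overlap `src`.
pub fn copy_disjoint(src: &[i32], dst: &mut [i32]) {
    dst.copy_from_slice(src); // panics if the lengths differ
}
```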
`restrict` (from C) is a bit more helpful, although it only applies to one level at a time.
Instead, we have scope-based aliasing analysis
The essence of the aliasing analysis in Rust is based on lexical scopes (barring threads).
The beginner level explanation that you probably know is:
- if you have a `&T`, then there is no `&mut T` to the same instance,
- if you have a `&mut T`, then there is no `&T` or `&mut T` to the same instance.
As suited to a beginner, it is a slightly abbreviated version. For example:
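A minimal sketch of what such a snippet might look like (the names `mut_ref` and `x` come from the surrounding text; the function name and the values are assumptions):

```rust
/// Hypothetical example: a `&i32` derived from a live `&mut i32`.
pub fn reborrow_shared() -> i32 {
    let mut i = 32;
    let mut_ref: &mut i32 = &mut i;
    let x: &i32 = mut_ref; // shared reborrow: points to the same `i`

    // *mut_ref = 2; // uncommenting this is rejected, because `x` is still used below

    *x
}
```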
is perfectly fine, even though both a `&mut i32` (`mut_ref`) and a `&i32` (`x`) point to the same instance!
If you try to mutate through `mut_ref` while `x` is still in use, however, the truth is unveiled: the compiler rejects the write.
So, it is fine to have both `&mut T` and `&T` pointing to the same memory location at the same time; however, mutating through the `&mut T` will be disabled for as long as the `&T` exists.
In a sense, the `&mut T` is temporarily downgraded to a `&T`.
So, what of pointers?
First of all, let's review the reference:
Conspicuously absent is any rule forbidding casting a `*const T` to a `*mut T`. That's normal, it's allowed, and therefore the last point is really more of a lint, since it can be so easily worked around.
Nomicon
A discussion of unsafe Rust would not be complete without pointing to the Nomicon.
Essentially, the rules of unsafe Rust are rather simple: uphold whatever guarantee the compiler would have if it was safe Rust.
This is not as helpful as it could be, since those rules are not set in stone yet; sorry.
Then, what are the semantics for dereferencing raw pointers?
As far as I know [1]:
- if you form a reference from the raw pointer (`&T` or `&mut T`), then you must ensure that the aliasing rules these references obey are upheld.
That is, providing that the caller had mutable access to the location:
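Presumably the snippet meant here is the function from the question. A sketch of it follows, with the `extern` ABI dropped so that it stands alone; the safety comment reflects the reading given in this answer, not a settled rule:

```rust
/// # Safety
/// `a` and `b` must be valid and aligned, and the caller must have mutable
/// access to the memory behind `b`. Under the reading given here, the two
/// pointers are allowed to overlap.
pub unsafe fn run_ptr_direct(a: *const i32, b: *mut f32) -> (i32, i32) {
    let x = unsafe { *a }; // plain read: the value is copied, no reference is kept
    unsafe { *b = 1.0 };
    let y = unsafe { *a }; // read again: `b` may point at the same memory
    (x, y)
}
```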
should be valid, because `*a` has type `i32`, so there is no overlap of lifetime in references.
However, I would expect:
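A sketch of the kind of variant meant (the name `run_ptr_modified` is hypothetical; the only change from `run_ptr_direct` is that `x` now holds a reference instead of a copied value):

```rust
/// # Safety
/// `a` and `b` must be valid and aligned. Calling this with overlapping
/// pointers is expected to be undefined behavior, for the reason in the comments.
pub unsafe fn run_ptr_modified(a: *const i32, b: *mut f32) -> (i32, i32) {
    let x: &i32 = unsafe { &*a }; // shared reference, live until the last line
    unsafe { *b = 1.0 };          // if `b` overlaps `a`, this writes behind a live `&i32`
    let y = unsafe { *a };
    (*x, y)
}
```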
to be undefined behavior, because `x` would be live while `*b` is used to modify its memory.
Note how subtle the change is. It's easy to break invariants in `unsafe` code.
[1] And I might be wrong right now, or I may become wrong in the future.