Compile-time generic type size check

I'm attempting to write Rust bindings for a C collection library (Judy Arrays [1]) which only gives each entry room to store a pointer-width value. My company has a fair amount of existing code which uses this space to directly store non-pointer values such as pointer-width integers and small structs. I'd like my Rust bindings to allow type-safe access to such collections using generics, but am having trouble getting the pointer-stashing semantics working correctly.

I have a basic interface working using std::mem::transmute_copy() to store the value, but that function explicitly does nothing to ensure the source and destination types are the same size. I'm able to verify at run-time, via an assertion, that the collection's type parameter is of a compatible size, but I'd really like the check to happen at compile-time.

Example code:

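use std::marker::PhantomData;
use std::mem;

pub struct Example<T> {
    v: usize,
    t: PhantomData<T>,
}

impl<T> Example<T> {
    pub fn new() -> Example<T> {
        // Run-time check that T fills the pointer-width slot exactly.
        assert!(mem::size_of::<usize>() == mem::size_of::<T>());
        Example { v: 0, t: PhantomData }
    }

    pub fn insert(&mut self, val: T) {
        unsafe {
            // Copy the bits of `val` into the slot...
            self.v = mem::transmute_copy(&val);
            // ...and skip its destructor, since the slot now owns the value.
            mem::forget(val);
        }
    }
}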

Is there a better way to do this, or is this run-time check the best Rust 1.0 supports?

(Related question, explaining why I'm not using mem::transmute().)

[1] I'm aware of the existing rust-judy project, but it doesn't support the pointer-stashing I want, and I'm writing these new bindings largely as a learning exercise anyway.

Tags: pointers rust ffi

2 Answers

【Aperson】

#2 · 2019-02-11 21:12

Contrary to the accepted answer, you can check at compile-time!

The trick is to insert, when compiling with optimizations, a call to an undefined C function in the dead-code path. You will get a linker error if your assertion would fail.


倾城　Initia

#3 · 2019-02-11 21:18

Compile-time check?

Is there a better way to do this, or is this run-time check the best Rust 1.0 supports?

In general, there are some hacky solutions to do some kind of compile time testing of arbitrary conditions. For example, there is the static_assertions crate which offers some useful macros (including one macro similar to C++'s static_assert). However, this is hacky and very limited.
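As a rough illustration of the kind of trick such macros build on, here is a minimal sketch using only the standard library. It is not the static_assertions source; the condition being checked and the helper function pointer_bytes are made up for the example. Note that it only works because every type involved is concrete.

```rust
use std::mem::size_of;

// Compiles only when the condition is true. A false condition turns the
// array length into `0 - 1`, which underflows during constant evaluation
// and is reported as a compile error.
const _: [(); 0 - !(size_of::<*const u8>() == size_of::<usize>()) as usize] = [];

// Inside `fn f<T>()`, the same length expression written with
// `size_of::<T>()` is rejected on stable Rust, because generic parameters
// may not be used in array-length expressions.

/// Hypothetical helper, present only so the example has something to call.
pub fn pointer_bytes() -> usize {
    size_of::<*const u8>()
}
```

Changing the condition to something false, for example comparing size_of::<u8>() with size_of::<usize>(), should make the build fail at the const item.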

In your particular situation, I haven't found a way to perform the check at compile time. The root problem here is that you can't use mem::size_of on a generic type in a constant expression such as an array length, and mem::transmute can't be used on generic types either. Related issues: #43408 and #47966. For this reason, the static_assertions crate doesn't work either.

If you think about it, this would also allow a kind of error very unfamiliar to Rust programmers: an error when instantiating a generic function with a specific type. This is well known to C++ programmers; Rust's trait bounds exist to avoid those often very bad and unhelpful error messages. In the Rust world, one would need to express the requirement as a trait bound: something like where size_of::<T> == size_of::<usize>().

However, this is currently not possible. There once was a fairly famous "const-dependent type system" RFC which would have allowed these kinds of bounds, but it got rejected for now. Support for these kinds of features is slowly but steadily progressing. "Miri" was merged into the compiler some time ago, allowing much more powerful constant evaluation. This is an enabler for many things, including the "Const Generics" RFC, which was actually merged. It is not yet implemented, but it is expected to land in 2018 or 2019.

Unfortunately, it still doesn't enable the kind of bound you need. Comparing two const expressions for equality was purposefully left out of the main RFC, to be resolved in a future RFC.

So it is to be expected that a bound similar to where size_of::<T> == size_of::<usize>() will eventually be possible. But this shouldn't be expected in the near future!

Workaround

In your situation, I would probably introduce an unsafe trait AsBigAsUsize. To implement it, you could write a macro impl_as_big_as_usize which performs a size check and implements the trait. Maybe something like this:

unsafe trait AsBigAsUsize: Sized {
    const _DUMMY: [(); 0];
}

macro_rules! impl_as_big_as_usize {
    ($type:ty) => {
        unsafe impl AsBigAsUsize for $type {
            // The length is 0 when the sizes match; otherwise `0 - 1`
            // underflows and constant evaluation fails.
            const _DUMMY: [(); 0] =
                [(); (::std::mem::size_of::<$type>() == ::std::mem::size_of::<usize>()) as usize - 1];
            // We should probably also check the alignment!
        }
    }
}

This uses basically the same trick as static_assertions. It works because we never use size_of on a generic type, only on the concrete types passed to the macro.

So... this is obviously far from perfect. The user of your library has to invoke impl_as_big_as_usize once for every type they want to use in your data structure. But at least it's safe: as long as programmers only use the macro to impl the trait, the trait is in fact only implemented for types that have the same size as usize. Also, the error "trait bound AsBigAsUsize is not satisfied" is very understandable.
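To show how such a trait could plug into the collection from the question, here is a minimal sketch. The Slot type and its insert and take methods are hypothetical stand-ins for the real bindings, and the size check is written as a separate unnamed const rather than an associated one. It also assumes the stored types have no padding bytes, which holds for the integer types used here but would need care for small structs.

```rust
use std::marker::PhantomData;
use std::mem;

/// Marker for types exactly as large as `usize`. Unsafe to implement by
/// hand, because `Slot` relies on the size guarantee.
pub unsafe trait AsBigAsUsize: Sized {}

macro_rules! impl_as_big_as_usize {
    ($type:ty) => {
        // Length is `1 - 1 = 0` when the sizes match; otherwise `0 - 1`
        // underflows and the build fails.
        const _: [(); 0] =
            [(); (::std::mem::size_of::<$type>() == ::std::mem::size_of::<usize>()) as usize - 1];
        unsafe impl AsBigAsUsize for $type {}
    };
}

impl_as_big_as_usize!(usize);
impl_as_big_as_usize!(isize);
// impl_as_big_as_usize!(u8); // rejected at compile time

/// Hypothetical single-value stand-in for a pointer-width collection entry.
/// A value still stored when the slot is dropped is leaked, not dropped.
pub struct Slot<T: AsBigAsUsize> {
    raw: Option<usize>,
    marker: PhantomData<T>,
}

impl<T: AsBigAsUsize> Slot<T> {
    pub fn new() -> Self {
        Slot { raw: None, marker: PhantomData }
    }

    /// Stores `val` and returns the previously stored value, if any.
    pub fn insert(&mut self, val: T) -> Option<T> {
        let old = self.take();
        // SAFETY: `AsBigAsUsize` guarantees `T` and `usize` have equal size.
        self.raw = Some(unsafe { mem::transmute_copy(&val) });
        // The slot now owns the bits, so the destructor must not run here.
        mem::forget(val);
        old
    }

    /// Removes and returns the stored value, if any.
    pub fn take(&mut self) -> Option<T> {
        // SAFETY: the bits were produced from a valid `T` in `insert`.
        self.raw.take().map(|raw| unsafe { mem::transmute_copy(&raw) })
    }
}
```

With this in place, Slot<u8> fails with the "trait bound is not satisfied" error mentioned above, while Slot<isize> works as expected.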

What about the run-time check?

As bluss said in the comments, in your assert! code, there is no run-time check, because the optimizer constant-folds the check. Let's test that statement with this code:

#![feature(asm)]

fn main() {
    foo(3u64);
    foo(true);
}

#[inline(never)]
fn foo<T>(t: T) {
    use std::mem::size_of;

    unsafe { asm!("" : : "r"(&t)) }; // black box
    assert!(size_of::<usize>() == size_of::<T>());
    unsafe { asm!("" : : "r"(&t)) }; // black box
}

The crazy asm!() expressions serve two purposes:

“hiding” t from LLVM, such that LLVM can't perform optimizations we don't want (like removing the whole function)
marking specific spots in the resulting ASM code we'll be looking at

Compile it with a nightly compiler (in a 64 bit environment!):

rustc -O --emit=asm test.rs

As usual, the resulting assembly code is hard to read; here are the important spots (with some cleanup):

_ZN4test4main17he67e990f1745b02cE:  # main()
    subq    $40, %rsp
    callq   _ZN4test3foo17hc593d7aa7187abe3E
    callq   _ZN4test3foo17h40b6a7d0419c9482E
    ud2

_ZN4test3foo17h40b6a7d0419c9482E: # foo<bool>()
    subq    $40, %rsp
    movb    $1, 39(%rsp)
    leaq    39(%rsp), %rax
    #APP
    #NO_APP
    callq   _ZN3std9panicking11begin_panic17h0914615a412ba184E
    ud2

_ZN4test3foo17hc593d7aa7187abe3E: # foo<u64>()
    pushq   %rax
    movq    $3, (%rsp)
    leaq    (%rsp), %rax
    #APP
    #NO_APP
    #APP
    #NO_APP
    popq    %rax
    retq

The #APP-#NO_APP pair is our asm!() expression.

The foo<bool> case: you can see that our first asm!() instruction is compiled, then an unconditional call to panic!() is made, and nothing comes afterwards (ud2 just says “the program can never reach this spot, panic!() diverges”).
The foo<u64> case: you can see both #APP-#NO_APP pairs (both asm!() expressions) without anything in between.

So yes: the compiler removes the check completely.
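The old asm!() syntax used in this experiment was later removed from the compiler, so the test program will not build on a current toolchain. A possible stand-in, assuming Rust 1.66 or newer, is std::hint::black_box. It is only a best-effort optimization barrier and does not promise visible markers in the assembly, so treat this as a sketch of the same experiment rather than an exact reproduction.

```rust
use std::hint::black_box;
use std::mem::size_of;

/// Same shape as `foo` in the experiment above; `black_box` stands in for
/// the empty `asm!` blocks.
#[inline(never)]
pub fn foo<T>(t: T) {
    // Keep `t` observable so the optimizer cannot discard the function body.
    black_box(&t);
    // For any concrete `T`, both sides are compile-time constants.
    assert!(size_of::<usize>() == size_of::<T>());
    black_box(&t);
}
```

Calling foo(3usize) returns normally, while foo(true) panics. Compiling with rustc -O --emit=asm as before lets you check whether the comparison still shows up in the output.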

It would be way better if the compiler simply refused to compile the code. But this way we at least know that there's no run-time overhead.
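For readers on newer toolchains than the ones this answer was written for: since Rust 1.57, assert! can run during constant evaluation, and an associated constant may mention the generic parameter. The following is a sketch under that assumption, applied to the Example type from the question; the SIZE_OK constant and the raw accessor are names made up here. The failure is reported when a concrete instantiation is code-generated, so a plain cargo check may not surface it.

```rust
use std::marker::PhantomData;
use std::mem;

pub struct Example<T> {
    v: usize,
    t: PhantomData<T>,
}

impl<T> Example<T> {
    // Evaluated at compile time, once per concrete `T` that references it.
    const SIZE_OK: () = assert!(
        mem::size_of::<T>() == mem::size_of::<usize>(),
        "T must be exactly as large as usize"
    );

    pub fn new() -> Example<T> {
        // Referencing the constant forces its evaluation for this `T`.
        let () = Self::SIZE_OK;
        Example { v: 0, t: PhantomData }
    }

    pub fn insert(&mut self, val: T) {
        // SAFETY: `new` is the only constructor, and it cannot be compiled
        // for a `T` whose size differs from `usize`.
        self.v = unsafe { mem::transmute_copy(&val) };
        // The slot now owns the bits, so skip the destructor.
        mem::forget(val);
    }

    /// Hypothetical accessor, present only to make the stored bits visible.
    pub fn raw(&self) -> usize {
        self.v
    }
}
```

With this version, Example::<u8>::new() should fail to build with the assertion message, while Example::<usize>::new() compiles and behaves as before.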
