What is the correct type for returning a C99 `bool` to Rust via the FFI?

Posted 2019-04-21 06:15

Question:

A colleague and I have been scratching our heads over how to return a bool from <stdbool.h> (a.k.a. _Bool) back to Rust via the FFI.

We have our C99 code we want to use from Rust:

bool
myfunc(void) {
   ...
}

We let Rust know about myfunc using an extern C block:

extern "C" {
    fn myfunc() -> T;
}

What concrete type should T be?

Rust doesn't have a c_bool in the libc crate, and if you search the internet, you will find various GitHub issues and RFCs where people discuss this, but don't really come to any consensus as to what is both correct and portable:

  • https://github.com/rust-lang/rfcs/issues/1982#issuecomment-297534238
  • https://github.com/rust-lang/rust/issues/14608
  • https://github.com/rust-lang/rfcs/issues/992
  • https://github.com/rust-lang/rust/pull/46156

As far as I can gather:

  • The size of a bool in C99 is left unspecified, other than that it must be large enough to store true (1) and false (0); in other words, at least one bit.
  • It could even be exactly one bit wide.
  • Its size might be defined by the ABI.

This comment suggests that if a C99 bool is passed into a function as a parameter or returned from a function, and the bool is smaller than a C int, then it is promoted to the size of an int. In that scenario, we can tell Rust that T is u32.

All right, but what if (for some reason) a C99 bool is 64 bits wide? Is u32 still safe? Perhaps under this scenario we truncate the 4 most significant bytes, which would be fine, since the 4 least significant bytes are more than enough to represent true and false.
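
The arithmetic half of that truncation argument can be sketched in Rust. This only shows what an integer narrowing does to the values 0 and 1; it makes no claim about where a particular calling convention actually places a returned bool. The helper name truncate_to_u32 is made up for this illustration.

```rust
/// Hypothetical helper: narrow a 64-bit value to 32 bits.
/// An `as` cast between unsigned integers keeps the low bits and
/// discards the high ones.
fn truncate_to_u32(wide: u64) -> u32 {
    wide as u32
}

/// The two values a well-formed C99 bool can hold survive the narrowing.
fn canonical_bools_survive() -> bool {
    truncate_to_u32(0) == 0 && truncate_to_u32(1) == 1
}
```

Note that a value with only high bits set, such as 0x1_0000_0000, narrows to 0, so the argument depends on the C side only ever producing 0 or 1.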

Is my reasoning correct? Until Rust gets a libc::c_bool, what would you use for T and why is it safe and portable for all possible sizes of a C99 bool (>=1 bit)?

Answer 1:

As of 2018-02-01, the size of Rust's bool is officially the same as C's _Bool.

This means that bool is the correct type to use in FFI.
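
A minimal sketch of what that looks like on the Rust side. To keep the example linkable without a C compiler, myfunc is defined here in Rust with the C ABI as a stand-in for the C99 function from the question; in a real project it would be declared in an extern "C" block with bool in place of T and implemented in C. The helper bool_layout is made up for this illustration.

```rust
use std::mem::{align_of, size_of};

/// Stand-in for the C function `bool myfunc(void)` (assumption: the real
/// one lives in C and is declared as `fn myfunc() -> bool;` inside an
/// `extern "C"` block).
extern "C" fn myfunc() -> bool {
    true
}

/// Hypothetical helper: report the size and alignment of Rust's `bool`,
/// both of which are 1 byte.
fn bool_layout() -> (usize, usize) {
    (size_of::<bool>(), align_of::<bool>())
}
```

The return value can then be used like any other Rust bool, with no conversion step.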


The rest of this answer applies to versions of Rust from before the official decision was made.

Until Rust gets a libc::c_bool, what would you use for T and why is it safe and portable for all possible sizes of a C99 bool (>=1 bit)?

As the issues you linked to show, the official answer is still "to be determined". That means that the only possibility that is guaranteed to be correct is: nothing.

That's right, as sad as it may be. The only truly safe thing would be to convert your bool to a known, fixed-size integral type, such as u8, for the purpose of FFI. That means you need to marshal it on both sides.
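
The Rust half of that marshalling could look roughly like the sketch below. It assumes the C side has been changed to return a uint8_t holding 0 or 1; myfunc_u8 is a hypothetical name, and it is defined here in Rust with the C ABI only so the example stands alone. The conversion helpers are also made up for this illustration.

```rust
/// Stand-in for a C function declared as `uint8_t myfunc_u8(void)`
/// (hypothetical: the C side converts its bool to a fixed-size integer).
extern "C" fn myfunc_u8() -> u8 {
    1
}

/// Convert the fixed-size integer received over the FFI into a Rust bool.
/// Any nonzero byte is treated as true, matching C's truthiness rule.
fn bool_from_ffi(raw: u8) -> bool {
    raw != 0
}

/// Convert a Rust bool into the fixed-size integer sent over the FFI.
fn bool_to_ffi(value: bool) -> u8 {
    u8::from(value)
}

/// Safe wrapper that hides the marshalling from the rest of the program.
fn call_myfunc() -> bool {
    bool_from_ffi(myfunc_u8())
}
```

Keeping the conversion in one wrapper means only that wrapper has to change if the FFI signature changes later.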


Practically, I'd keep using bool in my FFI code. As people have pointed out, it magically lines up on all the platforms that are in wide use at the moment. If the language decides to make bool FFI compatible, you are good to go. If they decide something else, I'd be highly surprised if they didn't introduce a lint to allow us to catch the errors quickly.

See also:

  • Is bool guaranteed to be 1 byte?


Answer 2:

After a lot of thought, I'm going to try answering my own question. Please comment if you can find a hole in the following reasoning.

This is not the correct answer; see the comments below.

I think a Rust u8 is always safe for T.

We know that a C99 bool is an integer large enough to store 0 or 1, which means it's free to be an unsigned integer of at least 1 bit, or (if you are feeling weird) a signed integer of at least 2 bits.

Let's break it down by case:

  1. If the C99 bool is 8 bits wide, then a Rust u8 is perfect. Even in the signed case, the top bit will be zero, since representing 0 and 1 never requires the sign bit.

  2. If the C99 bool is larger than a Rust u8, then by "casting it down" to an 8-bit size, we only ever discard leading zeros. Thus this is safe too.

  3. Now consider the case where the C99 bool is smaller than the Rust u8. When returning a value from a C function, it's not possible to return a value smaller than one byte, because of the underlying calling convention. The calling convention requires the return value to be loaded into a register or into a location on the stack. Since the smallest register or memory location is one byte, the return value must be zero-extended to at least a one-byte value (and I believe the same is true of function arguments, which must also adhere to the calling convention). If the value is extended to one byte, then it's the same as case 1. If the value is extended to a larger size, then it's the same as case 2.