Bitfield and union for low-level data structures

Published 2020-04-07 18:41

Question:

I need to manage bitfield data and unions. Here is how I would write it in C:

typedef struct __attribute__((__packed__)){
    union {
        struct __attribute__((__packed__)){
            unsigned short protocol : 4;
            unsigned short target : 12;
            unsigned short target_mode : 4;
            unsigned short source : 12;
            unsigned char cmd;
            unsigned char size;
        };
        unsigned char unmap[6]; // Unmapped form.
    };
}header_t;

I use this union to switch easily between a mapped and an unmapped form. I can write to header_t.protocol or header_t.source and read the result back as a u8 array through header_t.unmap. The switch costs no time, because both forms share the same memory block.

I tried to do the same thing in Rust but didn't find a clean way to do it. I managed to get it working with two structs and a dedicated impl to convert between them:

#[allow(dead_code)]
pub struct Header {
    protocol:    u8,  // 4 bits used
    target:      u16, // 12 bits used
    target_mode: u8,  // 4 bits used
    source:      u16, // 12 bits used
    cmd:         u8,  // 8 bits used
    size:        u8,  // 8 bits used
}

#[allow(dead_code)]
pub struct UnmapHeader{
    tab:[u8; 6],
}

impl Header {
    #[allow(dead_code)]
    pub fn unmap(&self) -> UnmapHeader {
        let mut unmap_header = UnmapHeader { tab: [0; 6],};
        unmap_header.tab[0] = (self.protocol & 0b0000_1111) | (self.target << 4) as u8;
        unmap_header.tab[1] = (self.target >> 4) as u8;
        unmap_header.tab[2] = ((self.target_mode as u8) & 0b0000_1111) | (self.source << 4) as u8;
        unmap_header.tab[3] = (self.source >> 4) as u8;
        unmap_header.tab[4] = self.cmd;
        unmap_header.tab[5] = self.size;
        unmap_header
    }
}

impl UnmapHeader {
    #[allow(dead_code)]
    pub fn map(&self) -> Header {
        Header {
            protocol: self.tab[0] & 0b0000_1111,
            // Widen to u16 before shifting left, and combine the halves with `|` (not `&`).
            target: ((self.tab[0] >> 4) as u16) | ((self.tab[1] as u16) << 4),
            target_mode: self.tab[2] & 0b0000_1111,
            source: ((self.tab[2] >> 4) as u16) | ((self.tab[3] as u16) << 4),
            cmd: self.tab[4],
            size: self.tab[5],
        }
    }
}

#[test]
fn switch() {
    let header = Header {
        protocol: 0b0000_1000,
        target: 0b0000_0100_0000_0001,
        target_mode: 0b0000_0100,
        source: 0b0000_0100_0000_0001,
        cmd: 0xAA,
        size: 10,
    };
    let unmap_header = header.unmap();
    assert_eq!(unmap_header.tab[0], 0b0001_1000);
    assert_eq!(unmap_header.tab[1], 0b0100_0000);
    assert_eq!(unmap_header.tab[2], 0b0001_0100);
    assert_eq!(unmap_header.tab[3], 0b0100_0000);
    assert_eq!(unmap_header.tab[4], 0xAA);
    assert_eq!(unmap_header.tab[5], 10);
}

Is there a more idiomatic Rust solution?

Answer 1:

Rust supports C-style unions (stable since Rust 1.19). However, reading a union field requires an unsafe block, and unions are not idiomatic in pure Rust code unless you have to interoperate with C unions.
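For reference, here is a minimal sketch of what a Rust union looks like. The `Word` type and the `raw_bytes` helper are hypothetical names chosen for illustration, not part of the question. Note that Rust has no built-in bitfield members, so a union alone would not reproduce the 4-bit and 12-bit fields of the C struct.

```rust
// Hypothetical example type: two views of the same two bytes.
#[repr(C)]
pub union Word {
    value: u16,
    bytes: [u8; 2],
}

/// Returns the bytes of `value` in native byte order by reading
/// the other field of the union.
pub fn raw_bytes(value: u16) -> [u8; 2] {
    let word = Word { value };
    // Reading a union field is unsafe: the compiler cannot know which
    // field was written last. Both fields here are plain integers of
    // the same size, so every bit pattern is valid for either view.
    unsafe { word.bytes }
}
```

The result depends on the endianness of the target, which is one more reason the accessor-based approach is usually easier to reason about.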

One approach is to model your underlying data just as a [u8; 6] and then provide more friendly accessor functions:

pub struct Header {
    tab: [u8; 6],
}

impl Header {
    pub fn get_protocol(&self) -> u8 {
        self.tab[0] & 0b0000_1111
    }

    pub fn set_protocol(&mut self, value: u8) {
        self.tab[0] = (self.tab[0] & 0b1111_0000) | (value & 0b0000_1111);
    }

    // etc..
}
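As a sketch of how the "etc." might continue, the accessors below handle the 12-bit target field, which straddles two bytes. The bit layout is assumed from the question's unmap() function (low 4 bits of target in the high nibble of tab[0], high 8 bits in tab[1]); `from_bytes` and `as_bytes` are hypothetical helpers added so the example stands on its own, and the struct is repeated for the same reason.

```rust
pub struct Header {
    tab: [u8; 6],
}

impl Header {
    /// Hypothetical constructor, added so the sketch can be exercised.
    pub fn from_bytes(tab: [u8; 6]) -> Self {
        Header { tab }
    }

    /// The "unmapped" form is simply a borrow of the underlying array.
    pub fn as_bytes(&self) -> &[u8; 6] {
        &self.tab
    }

    /// 12-bit target: low 4 bits live in the high nibble of tab[0],
    /// high 8 bits live in tab[1].
    pub fn get_target(&self) -> u16 {
        u16::from(self.tab[0] >> 4) | (u16::from(self.tab[1]) << 4)
    }

    pub fn set_target(&mut self, value: u16) {
        let value = value & 0x0FFF; // keep only the 12 usable bits
        // Preserve the protocol nibble, replace the high nibble.
        self.tab[0] = (self.tab[0] & 0b0000_1111) | (((value & 0x000F) as u8) << 4);
        self.tab[1] = (value >> 4) as u8;
    }
}
```

With this layout, getting the byte view costs nothing (it is just `as_bytes()`), and the shifting and masking happens only when a field is read or written.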

As you mentioned in one of the question comments, you can keep the code simpler by using the bitfield crate.

Another approach could be to define your struct with the individual fields and convert it to [u8; 6] when needed. As you have presented it, though, the fields take up more space than the [u8; 6], so there is no convenient conversion (e.g. unsafe std::mem::transmute); you would have to shift the bits around for each individual field anyway. So the solution above is probably better.
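If you do want the field-based struct, one way to express the conversion is through the standard From trait rather than a second wrapper type. This is only a sketch under the same bit layout as the question's unmap(); the derives and the choice of From are my assumptions, not something the question requires.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Header {
    pub protocol: u8,    // 4 bits used
    pub target: u16,     // 12 bits used
    pub target_mode: u8, // 4 bits used
    pub source: u16,     // 12 bits used
    pub cmd: u8,
    pub size: u8,
}

impl From<Header> for [u8; 6] {
    /// Packs the fields into the 6-byte wire form.
    fn from(h: Header) -> Self {
        [
            (h.protocol & 0x0F) | (((h.target & 0x0F) as u8) << 4),
            (h.target >> 4) as u8,
            (h.target_mode & 0x0F) | (((h.source & 0x0F) as u8) << 4),
            (h.source >> 4) as u8,
            h.cmd,
            h.size,
        ]
    }
}

impl From<[u8; 6]> for Header {
    /// Unpacks the 6-byte wire form into individual fields.
    fn from(b: [u8; 6]) -> Self {
        Header {
            protocol: b[0] & 0x0F,
            // Widen to u16 before shifting so no bits are lost.
            target: u16::from(b[0] >> 4) | (u16::from(b[1]) << 4),
            target_mode: b[2] & 0x0F,
            source: u16::from(b[2] >> 4) | (u16::from(b[3]) << 4),
            cmd: b[4],
            size: b[5],
        }
    }
}
```

Unlike the C union, each conversion here does real work, so this form suits code that reads fields far more often than it serializes.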

Regardless of the underlying representation, defining friendly accessors is probably a good idea in this situation. It's a zero-cost abstraction, and it lets you change your mind about the representation later without having to change the code that uses it.