How to type-pun in C

Posted 2019-08-23 21:27

Question:

Follow-up to extended discussion in Casting behavior in C

I'm trying to emulate a Z80 in C, where several 8-bit registers can be combined to create 16-bit registers.

This is the logic I'm trying to use:

struct {
    uint8_t b;
    uint8_t c;
    uint16_t *bc;
} regs[1];
...
regs->bc = (uint16_t *)&(regs->b);

Why is this incorrect, and how can I do it correctly (using type-punning if needed)?

I need to do this multiple times, preferably within the same structure.

For those of you that I haven't mentioned this to: I understand that this assumes a little-endian architecture. I have this handled completely.

Answer 1:

It's incorrect because b has type uint8_t, and an object of that type must not be accessed through a pointer to uint16_t. The address might not be suitably aligned for uint16_t, and the access is a strict aliasing violation.

You are, however, free to do (uint8_t *)&regs or (struct reg_t*)&regs->b, since (C11 6.7.2.1/15)

A pointer to a structure object, suitably converted, points to its initial member and vice versa.
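
As a minimal sketch of that rule (the struct tag pair and the function first_member are hypothetical names, not from the question), converting a pointer to the structure yields a pointer to its first member:

```c
#include <stdint.h>

/* Hypothetical two-byte structure, for illustration only. */
struct pair
{
    uint8_t b;   /* initial member */
    uint8_t c;
};

/* A pointer to the structure, suitably converted, points to its
   initial member, so the result is the address of p->b. */
static uint8_t *first_member(struct pair *p)
{
    return (uint8_t *)p;
}
```

With struct pair p = {1, 2}, first_member(&p) compares equal to &p.b. The rule only covers the initial member, so it does not make the uint16_t access from the question valid.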


When doing hardware-related programming, make sure to never use signed types. That means changing intn_t to uintn_t.
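
A minimal sketch of what can go wrong with signed bytes, assuming the register pair is combined with shifts (the function names are made up for illustration): a signed byte with the top bit set is sign-extended when widened, and the extra one bits leak into the high byte.

```c
#include <stdint.h>

/* Hypothetical helpers, for illustration only. */

/* Unsigned halves: widening adds only zero bits, so the result is hi:lo. */
static uint16_t combine_unsigned(uint8_t hi, uint8_t lo)
{
    return (uint16_t)(((unsigned)hi << 8) | lo);
}

/* Signed halves: a negative lo turns into a value with all upper bits set
   when converted to unsigned, which overwrites the contribution of hi. */
static uint16_t combine_signed(int8_t hi, int8_t lo)
{
    return (uint16_t)(((unsigned)(uint8_t)hi << 8) | (unsigned)lo);
}
```

combine_unsigned(0x12, 0x80) gives 0x1280, while combine_signed(0x12, -128), which is the same low-byte bit pattern on two's complement machines, gives 0xFF80.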

As for how to type pun properly, use a union:

typedef union
{
  struct                 /* anonymous struct, standard since C11 */
  {
    uint8_t b;
    uint8_t c;
  };
  uint16_t bc;
} reg_t;

You can then assign this to point at a 16 bit hardware register like this:

volatile reg_t* reg = (volatile reg_t*)0x1234;

where 0x1234 is the hardware register address.

NOTE: this union is endianness-dependent. b will access the MS byte of bc on big-endian systems, but the LS byte of bc on little-endian systems.
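
To make the endianness note concrete, here is a small sketch built on the same union. The helpers b_is_low_byte and read_bc are hypothetical names added for illustration; read_bc assumes the Z80 convention that B is the high byte of BC.

```c
#include <stdint.h>

typedef union
{
    struct              /* anonymous struct, C11 */
    {
        uint8_t b;
        uint8_t c;
    };
    uint16_t bc;
} reg_t;

/* The punning only works if the two bytes exactly overlay the 16-bit member. */
_Static_assert(sizeof(reg_t) == sizeof(uint16_t), "unexpected padding in reg_t");

/* Returns 1 if b aliases the least significant byte of bc (little endian),
   0 if it aliases the most significant byte (big endian). */
static int b_is_low_byte(void)
{
    reg_t r;
    r.bc = 0x1234;
    return r.b == 0x34;
}

/* Reads the pair with b as the high byte, independent of host byte order,
   by combining the 8-bit members instead of reading bc. */
static uint16_t read_bc(const reg_t *r)
{
    return (uint16_t)(((unsigned)r->b << 8) | r->c);
}
```

On a little-endian host b_is_low_byte() returns 1, so bc only matches the Z80 value of BC if the struct members are declared in the order c, b. read_bc sidesteps the issue by never touching bc.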



Answer 2:

To emulate a hardware register that can be accessed as two eight-bit registers or one 16-bit register, you can use:

union
{
    struct { int8_t b, c; };
    int16_t bc;
} regs[1];

Then regs->bc will be the 16-bit register, and regs->b and regs->c will be 8-bit registers.

Note: This uses an anonymous struct so that b and c appear as if they were members of the union. If the struct had a name, like this:

union
{
    struct { int8_t b, c; } s;
    int16_t bc;
} regs[1];

then you would have to include its name when accessing b or c, as with regs->s.b. However, C (since C11) lets you declare the struct without a name for exactly this purpose.

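Also note this requires a C compiler. C allows reading a union member other than the one last written in order to reinterpret data. C++ has different rules: there, reading an inactive member is generally undefined behavior.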
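
Since the question asks for several pairs within the same structure, here is one possible sketch that repeats the union once per pair. The struct tag z80_regs, the de and hl pairs, and the helper function are illustrative names, and the unsigned types are a choice of this sketch rather than part of the answer.

```c
#include <stdint.h>

/* One anonymous union per register pair (anonymous members need C11). */
struct z80_regs
{
    union { struct { uint8_t b, c; }; uint16_t bc; };
    union { struct { uint8_t d, e; }; uint16_t de; };
    union { struct { uint8_t h, l; }; uint16_t hl; };
};

/* Hypothetical helper: writes a 16-bit pair, then returns the sum of
   its two 8-bit halves (the sum does not depend on host byte order). */
static unsigned write_hl_and_sum_halves(struct z80_regs *r, uint16_t value)
{
    r->hl = value;                  /* store through the 16-bit member     */
    return (unsigned)r->h + r->l;   /* read back through the 8-bit members */
}
```

Every member is reachable directly, as in r->hl, r->h and r->l. Which of h and l holds the high byte still depends on the host byte order.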



Answer 3:

The correct way in C is a union with an anonymous struct, as already shown in other answers. But as you want to process bytes, you may use the special handling of character types in the strict aliasing rule: whatever the type of an object, it is always legal to use a char pointer to access the bytes of its representation (in practice uint8_t is a typedef for unsigned char). So this is conformant C:

struct {
    uint16_t bc;
    uint8_t *b;
    uint8_t *c;
} regs[1];

regs->b = (uint8_t *) &(regs->bc);
regs->c = regs->b + 1;

Interestingly enough, it is still valid for a C++ compiler...
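
A minimal sketch of the same idea without storing pointers in the structure, using unsigned char explicitly. The helper names get_byte and set_byte are hypothetical, and which index corresponds to the high byte depends on the host byte order.

```c
#include <stdint.h>

/* Reads byte `index` (0 or 1) of a 16-bit register.
   Character types may be used to inspect the bytes of any object. */
static unsigned char get_byte(const uint16_t *reg, unsigned index)
{
    const unsigned char *bytes = (const unsigned char *)reg;
    return bytes[index];
}

/* Writes byte `index` (0 or 1) of a 16-bit register through a
   character pointer; the other byte is left untouched. */
static void set_byte(uint16_t *reg, unsigned index, unsigned char value)
{
    unsigned char *bytes = (unsigned char *)reg;
    bytes[index] = value;
}
```

For example, starting from uint16_t bc = 0, calling set_byte(&bc, 0, 0x11) and set_byte(&bc, 1, 0x11) leaves bc equal to 0x1111 on any host.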