Follow-up to extended discussion in Casting behavior in C
I'm trying to emulate a Z80 in C, where several 8-bit registers can be combined to create 16-bit registers.
This is the logic I'm trying to use:
struct {
    uint8_t b;
    uint8_t c;
    uint16_t *bc;
} regs[1];
...
regs->bc = (uint16_t *)&(regs->b);
Why is this incorrect, and how can I do it correctly (using type-punning if needed)?
I need to do this multiple times, preferably within the same structure.
For those of you that I haven't mentioned this to: I understand that this assumes a little-endian architecture. I have this handled completely.
It's incorrect because b is of type uint8_t, and a pointer to uint16_t cannot be used to access such a variable. It might not be correctly aligned, and it is a strict aliasing violation.
You are, however, free to do (uint8_t *)&regs or (struct reg_t *)&regs->b, since (6.7.2.1/15):
A pointer to a structure object, suitably converted, points to its initial member and vice versa.
When doing hardware-related programming, make sure to never use signed types. That means changing intn_t to uintn_t.
As for how to type pun properly, use a union:
typedef union
{
    struct /* standard C anonymous struct */
    {
        uint8_t b;
        uint8_t c;
    };
    uint16_t bc;
} reg_t;
You can then assign this to point at a 16 bit hardware register like this:
volatile reg_t* reg = (volatile reg_t*)0x1234;
where 0x1234 is the hardware register address.
NOTE: this union is endianness-dependent. b will access the MS byte of bc on big-endian systems, but the LS byte of bc on little-endian systems.
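For an emulator, the union is an ordinary variable rather than something mapped at a hardware address, so the byte order in question is that of the host machine. The following is only a minimal sketch of how that dependence can be observed at run time. The names reg_pair and host_is_little_endian are hypothetical and not part of the answer above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical name; same layout as the union above. */
typedef union
{
    struct              /* anonymous struct, standard since C11 */
    {
        uint8_t b;      /* byte at the lowest address */
        uint8_t c;      /* byte at the next address */
    };
    uint16_t bc;
} reg_pair;

/* The punning only makes sense if both views cover the same two bytes. */
static_assert(sizeof(reg_pair) == sizeof(uint16_t),
              "unexpected padding in reg_pair");

/* Hypothetical helper: returns 1 on a little-endian host, 0 otherwise. */
static int host_is_little_endian(void)
{
    reg_pair probe;
    probe.bc = 0x0001;
    return probe.b == 0x01;   /* b is the first byte in memory */
}
```

On a little-endian host, writing 0x1234 to bc leaves 0x34 in b and 0x12 in c. If b is meant to be the high byte of the pair, as on the Z80, the two members would have to be declared in the opposite order on such a host.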
To emulate a hardware register that can be accessed as two eight-bit registers or one 16-bit register, you can use:
union
{
    struct { int8_t b, c; };
    int16_t bc;
} regs[1];
Then regs->bc will be the 16-bit register, and regs->b and regs->c will be 8-bit registers.
Note: This uses an anonymous struct so that b and c appear as if they were members of the union. If the struct had a name, like this:
union
{
    struct { int8_t b, c; } s;
    int16_t bc;
} regs[1];
then you would have to include its name when accessing b or c, as with regs->s.b. However, C (since C11) has a feature that allows you to use a declaration without a name for this purpose.
Also note this requires a C compiler. C allows using unions to reinterpret data. C++ has different rules.
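Since the question asks for several register pairs inside one structure, the same idea can be repeated with anonymous unions as struct members. This is only a sketch under stated assumptions: the name z80_regs and the choice of pairs are hypothetical, unsigned types are used instead of int8_t, and the low byte is declared first, which makes b the high byte of bc only on a little-endian host.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register file. Each pair is an anonymous union (C11),
   so all members are reached directly, e.g. r.b, r.c and r.bc.
   ASSUMPTION: little-endian host, hence the low byte comes first. */
struct z80_regs
{
    union { struct { uint8_t c, b; }; uint16_t bc; };
    union { struct { uint8_t e, d; }; uint16_t de; };
    union { struct { uint8_t l, h; }; uint16_t hl; };
};

/* Three 16-bit pairs and nothing else; fails to compile if padding appears. */
static_assert(sizeof(struct z80_regs) == 3 * sizeof(uint16_t),
              "unexpected padding in struct z80_regs");
```

With this layout, given struct z80_regs r, writing r.h and r.l updates r.hl and leaves the other pairs untouched.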
The correct way is through a union with an anonymous struct in C, as already shown in other answers. But as you want to process bytes, you may use the special handling of character types in the strict aliasing rule: whatever the type, it is always legal to use a char pointer to access the bytes of its representation (uint8_t is a typedef for unsigned char on mainstream implementations). So this is conformant C:
struct {
    uint16_t bc;
    uint8_t *b;
    uint8_t *c;
} regs[1];
regs->b = (uint8_t *) &(regs->bc);
regs->c = regs->b + 1;
Interestingly enough, it is still valid for a C++ compiler...
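A minimal sketch of the pointer-based variant with its set-up wrapped in a function might look like the following. The names z80_bc and z80_bc_init are hypothetical, and the sketch assumes uint8_t is a character type, as it is on mainstream implementations.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical name; same members as the struct above. */
struct z80_bc
{
    uint16_t bc;
    uint8_t *b;   /* points at the first byte of bc in memory */
    uint8_t *c;   /* points at the second byte of bc in memory */
};

/* Hypothetical initialiser: clears bc and aims both byte pointers at it. */
static void z80_bc_init(struct z80_bc *r)
{
    assert(r != NULL);
    r->bc = 0;
    r->b = (uint8_t *)&r->bc;
    r->c = r->b + 1;
}
```

The 8-bit registers are then read and written as *r.b and *r.c. Which of the two is the high byte of bc still depends on the host's byte order, and copying the struct by value leaves the pointers aimed at the original object, so a copy needs to be initialised again.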