How to make two otherwise identical pointer types incompatible

Posted 2020-07-08 06:57

Question:

On certain architectures it may be necessary to have different pointer types for otherwise identical objects. Particularly for a Harvard architecture CPU, you may need something like:

uint8_t const ram* data1;
uint8_t const rom* data2;

This is how pointers to ROM / RAM were declared in MPLAB C18 (now discontinued) for PICs. It could even define things like:

char const rom* ram* ram strdptr;

This declares a pointer in RAM to pointers in RAM pointing to strings in ROM. (Writing ram is not necessary, since this compiler places things in RAM by default; it is spelled out here only for clarity.)

The good thing about this syntax is that the compiler is able to alert you when you try to assign in an incompatible manner, such as assigning the address of a ROM location to a pointer to RAM (so something like data1 = data2;, or passing a ROM pointer to a function expecting a RAM pointer, would generate an error).

By contrast, avr-gcc for the AVR-8 offers no such type safety; instead it provides functions to access ROM data. There is no way to distinguish a pointer to RAM from a pointer to ROM.

There are situations where this kind of type safety would be very beneficial to catch programming errors.

Is there some way to add similar modifiers to pointers in some manner (such as by preprocessor, expanding to something which could mimic this behavior) to serve this purpose? Or even something which warns on improper access? (in case of avr-gcc, trying to fetch values without using the ROM access functions)

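Answer 1:

One trick is to wrap the pointed-to data in a struct. Pointers to distinct struct types have better type safety than pointers to the primitive data types, because the struct types are incompatible with each other even when their members are identical.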

typedef struct
{
  uint8_t ptr;
} a_t;

typedef struct
{
  uint8_t ptr;
} b_t;

const volatile a_t* a = (const volatile a_t*)0x1234;
const volatile b_t* b = (const volatile b_t*)0x5678;

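a = b; // incompatible pointer types: the compiler must issue a diagnostic
b = a; // same here (older GCC versions only warn unless -Werror=incompatible-pointer-types is given)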
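
Applied to the RAM / ROM question, the same trick could look roughly like the sketch below. The names ram_byte, rom_byte, ram_sum and rom_sum are hypothetical and only serve as illustration; the plain reads stand in for whatever ROM access mechanism the target actually needs.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names: one wrapper type per memory region. */
typedef struct { uint8_t val; } ram_byte;
typedef struct { uint8_t val; } rom_byte;

/* Only accepts pointers to RAM data. */
uint8_t ram_sum(const ram_byte *buf, size_t len)
{
    uint8_t sum = 0;
    for (size_t i = 0; i < len; i++) {
        sum = (uint8_t)(sum + buf[i].val);
    }
    return sum;
}

/* Only accepts pointers to ROM data. On a real Harvard target the read
 * below would be replaced by the platform's ROM access mechanism. */
uint8_t rom_sum(const rom_byte *buf, size_t len)
{
    uint8_t sum = 0;
    for (size_t i = 0; i < len; i++) {
        sum = (uint8_t)(sum + buf[i].val);
    }
    return sum;
}
```

Passing a rom_byte array to ram_sum (or the other way round) is then diagnosed as an incompatible pointer type.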


Answer 2:

You could encapsulate the pointer in different structs for RAM and ROM, making the types incompatible while they contain the same type of value.

struct romPtr {
    void *addr;
};

struct ramPtr {
    void *addr;
};

int main(int argc, char **argv) {
    struct romPtr data1 = {NULL};
    struct romPtr data3 = data1;
    struct ramPtr data2 = data1; // <-- gcc would throw a compilation error here
}

Compiling this gives:

$ cc struct_test.c
struct_test.c: In function ‘main’:
struct_test.c:12:24: error: invalid initializer
  struct ramPtr data2 = data1;
                    ^~~~~

You could of course typedef the structs for brevity.
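
A minimal sketch of the typedef'd variant, with small helper functions added so that all accesses go through the wrapper types. The names rom_ptr, ram_ptr, rom_read and rom_to_ram are hypothetical. The plain dereference in rom_read is a host-side stand-in; on avr-gcc without named address spaces, this is presumably where pgm_read_byte() from avr/pgmspace.h would be used.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical wrapper types: same layout, but incompatible for the compiler. */
typedef struct { const uint8_t *addr; } rom_ptr;
typedef struct { uint8_t *addr; } ram_ptr;

/* Read one byte through a ROM pointer. The plain dereference is a
 * stand-in for the target's ROM access function or macro. */
uint8_t rom_read(rom_ptr p, size_t index)
{
    return p.addr[index];
}

/* Copy len bytes from ROM to RAM. Swapping the two arguments by mistake
 * does not compile, because rom_ptr and ram_ptr are distinct types. */
void rom_to_ram(ram_ptr dst, rom_ptr src, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        dst.addr[i] = rom_read(src, i);
    }
}
```

Keeping the special access code inside one helper like rom_read means the rest of the program never dereferences a ROM pointer directly.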



Answer 3:

Since I received several answers offering different compromises, I decided to merge them into one, outlining the benefits and drawbacks of each, so you can choose the most appropriate for your particular situation.

Named Address Spaces

If the problem is only that of distinguishing ROM and RAM pointers on an AVR-8 micro, this is the most appropriate solution.

Named address spaces were specified in a draft of the Embedded C technical report (N1275, ISO/IEC TR 18037) and never became part of the C standard itself; however, there are C compilers which support them, including avr-gcc used for 8-bit AVRs.

The related documentation is part of the online GCC manual (section "Named Address Spaces", which also covers other architectures using this extension). It is preferable to other solutions (such as the function-like macros in pgmspace.h for the AVR-8) because the compiler can make the appropriate checks, while accessing the pointed-to data remains clear and simple.

In particular, if you have a similar problem of porting something from a compiler which offered some sort of named address spaces, like MPLAB C18, this is likely the fastest and cleanest way to do it.

The pointers from the question, ported this way, would look as follows:

uint8_t const* data1;
uint8_t const __flash* data2;
char const __flash** strdptr;

(If possible, one could simplify the process using appropriate preprocessor definitions)
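
One way such preprocessor definitions might look is sketched below. ROM is a hypothetical macro name; the GCC manual describes __FLASH as a built-in macro that avr-gcc defines when the __flash address space is available. The fallback branch is an assumption made so that the same source also builds on a host compiler.

```c
#include <stdint.h>

/* ROM is a hypothetical macro name. __FLASH is defined by avr-gcc when
 * the __flash named address space is available. */
#ifdef __FLASH
#define ROM __flash
#else
#define ROM /* no address-space qualifier on this target */
#endif

/* A lookup table placed in ROM where the target supports it. */
static const ROM uint8_t squares[4] = {0, 1, 4, 9};

/* The parameter type records that the data lives in ROM; with __flash the
 * compiler selects the matching load instruction for the dereference. */
uint8_t rom_at(const ROM uint8_t *table, uint8_t index)
{
    return table[index];
}

uint8_t square_of(uint8_t n)
{
    return rom_at(squares, n);
}
```

Note that where ROM expands to nothing, the RAM / ROM distinction is no longer checked; the checks only apply when building with a compiler that supports the address space.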

(Original answer by Olaf)

Struct encapsulation, pointer inside

This method aims to strengthen the typing of pointers by wrapping them in structures. The intended usage is that you pass the structures themselves across interfaces, so that the compiler can perform type checks on them.

A "pointer" type to byte data could look like this:

typedef struct{
    uint8_t* ptr;
}bytebuffer_ptr;

The pointed-to data can be accessed as follows:

bytebuffer_ptr bbuf;
(...)
bbuf.ptr = allocate_bbuf();
(...)
bbuf.ptr[index] = value;

A function prototype accepting such a type and returning one could look as follows:

bytebuffer_ptr encode_buffer(bytebuffer_ptr inbuf, size_t len);

(Original answer by dvhh)

Struct encapsulation, pointer outside

Similar to the method above, this aims to strengthen the typing of pointers by means of structures, but in a different manner that provides a more robust constraint: here it is the pointed-to data type that is encapsulated.

A "pointer" type to byte data could look like this:

typedef struct{
    uint8_t val;
}byte_data;

The pointed-to data can be accessed as follows:

byte_data* bbuf;
(...)
bbuf = allocate_bbuf();
(...)
bbuf[index].val = value;

A function prototype accepting such a type and returning one could look as follows:

byte_data* encode_buffer(byte_data* inbuf, size_t len);

(Original answer by Lundin)

Which should I use?

Named Address Spaces don't need much discussion in this regard: they are the most appropriate solution if you only want to deal with a peculiarity of your target's handling of address spaces. The compiler will provide the compile-time checks you need, and you don't have to invent anything further.

If, however, you are interested in structure wrapping for other reasons, these are matters you may want to consider:

  • Both methods can be optimized just fine: at least GCC will generate the same code for either as it does for plain pointers. So you don't really have to consider performance.

  • Pointer inside is useful if you have third-party interfaces to serve which demand plain pointers, or if you are refactoring something too large to do in one pass.

  • Pointer outside provides more robust type safety, as you reinforce the pointed-to type itself with it: you have a truly distinct type which you can't easily (accidentally) convert to another (no implicit conversion).

  • Pointer outside allows you to use modifiers on the pointer, such as adding const, which is important for creating robust interfaces (you can make data intended to be read only by a function const).

  • Keep in mind that some people might not like either of these, so if you are working in a group, or are creating code which might be reused by known parties, discuss the matter with them first.

  • It should be obvious, but keep in mind that encapsulating doesn't solve the problem of requiring special access code (such as the pgmspace.h macros on an AVR-8), assuming no Named Address Spaces are used alongside the method. It only provides a way to produce a compile error if you try to pass a pointer to functions operating on a different address space than the one it is intended to point into.
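
The point about const can be illustrated with the two wrapper types from above. This is only a sketch; clear_inside and count_nonzero are hypothetical function names.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct { uint8_t *ptr; } bytebuffer_ptr; /* pointer inside */
typedef struct { uint8_t val; } byte_data;       /* pointer outside */

/* const applies to the wrapper only: in.ptr cannot be reseated, but the
 * bytes it points to are still writable. */
void clear_inside(const bytebuffer_ptr in, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        in.ptr[i] = 0; /* compiles despite the const parameter */
    }
}

/* const applies to the data: a write such as in[i].val = 0 would be
 * rejected by the compiler, so this function can only read. */
size_t count_nonzero(const byte_data *in, size_t len)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        if (in[i].val != 0) {
            n++;
        }
    }
    return n;
}
```

To get read-only data with the pointer-inside method, a second wrapper type holding a pointer to const would be needed.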

Thank you for all the answers!



Answer 4:

True Harvard architectures use different instructions to access different types of memory, such as code (Flash on AVR), data (RAM), hardware peripheral registers (IO) and possibly others. The address ranges typically overlap, i.e. the same value accesses different internal devices, depending on the instruction.

Coming to C, if you want to use a unified pointer, this means you not only have to encode the address (value), but also the access type ("address space" in the following) in the pointer value. This can be done using additional bits in the pointer's value, but the code then also has to select the appropriate instruction at run-time for every access. This constitutes a significant overhead in the generated code. Additionally, there are often no spare bits in the "natural" value for at least some address spaces (e.g. all 16 bits of the pointer are already used for the address). So additional bits are required, at least a byte's worth. This blows up memory usage (mostly RAM), too.
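
The cost of such a unified pointer can be sketched on a host machine with two simulated memories. All names here (addr_space, unified_ptr, sim_ram, sim_rom, unified_read) are hypothetical, and the array lookups only stand in for the different load instructions of a real Harvard CPU.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical address-space tags for a simulated Harvard machine. */
enum addr_space { SPACE_RAM, SPACE_ROM };

/* A unified pointer: 16-bit address plus one extra byte for the tag. */
struct unified_ptr {
    uint16_t addr;
    uint8_t space;
};

/* Simulated memories; address 0 exists in both and holds different data. */
uint8_t sim_ram[4] = {0x11, 0x12, 0x13, 0x14};
const uint8_t sim_rom[4] = {0xA1, 0xA2, 0xA3, 0xA4};

/* Every read has to branch on the tag at run-time; on real hardware each
 * case would use a different load instruction. No bounds checking here. */
uint8_t unified_read(struct unified_ptr p)
{
    switch (p.space) {
    case SPACE_ROM:
        return sim_rom[p.addr];
    default:
        return sim_ram[p.addr];
    }
}
```

Both costs described above are visible: the pointer is larger than a bare 16-bit address, and each access goes through a run-time branch.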

Both are usually unacceptable on typical MCUs using this architecture, because they are already quite limited. Fortunately, for most applications it is absolutely unnecessary (or at least easily avoidable) to determine the address space at run-time.

To solve this problem, all compilers for such a platform support some way to tell the compiler in which address space an object resides. Draft N1275 of the Embedded C technical report (ISO/IEC TR 18037) proposed a standard way using "named address spaces". Unfortunately this never became part of the C standard itself, so we are left with compiler extensions.

For gcc (see the documentation for other compilers), the developers implemented the original proposal. As the address spaces are target-specific, the code is not portable between different architectures, but that is normally true for bare-metal embedded code anyway, so nothing is really lost.

According to the documentation for AVR, an address space is simply used like a standard qualifier. The compiler will automatically emit the correct instructions to access the correct space. There is also a unified address space which determines the area at run-time as explained above.

Address spaces work similarly to qualifiers, but there are stronger constraints to determine compatibility, i.e. when assigning pointers of different address spaces to each other. For a detailed description, see the proposal, chapter 5.

Conclusion:

Named address spaces are what you want. They solve two problems:

  • Ensure pointers to incompatible address spaces can't be assigned to each other unnoticed.
  • Tell the compiler how to access the object, i.e. which instructions to use.

With regard to the other answers proposing structs, you have to specify the address space (and the type for void *) anyway once you access the data. Using the address space in the declaration keeps the rest of the code clean and even allows you to change it later on at a single location in the source code.

If you are after portability between tool-chains, read their documentation and use macros. Most likely you will just have to adapt the actual names of the address spaces.

Sidenote: The PIC18 example you cite actually uses the syntax for named address spaces. Only the names are problematic, because an implementation should leave all non-reserved names free for the application code. Hence the underscore-prefixed names in gcc.

Disclaimer: I did not test the features, but relied on the documentation. Helpful feedback in comments appreciated.