What portability issues are associated with byte-l

2019-04-14 15:36发布

问题:

Purpose

I am writing a small library for a larger project which supplies malloc/realloc/free wrapper-functions as well as a function which can tell you whether or not its parameter (of type void *) corresponds to live (not yet freed) memory allocated and managed by the library's wrapper-functions. Let's refer to this function as isgood_memory.

Internally, the library maintains a hash-table to ensure that the search performed by isgood_memory is reasonably fast. The hash-table maintains pointer values (elements of type void *) to make the search possible. Clearly, values are added and removed from the hash-table to keep it up-to-date with what has been allocated and what has been freed, respectively.

The portability of the library is my biggest concern. It has been designed to assume only a mostly-compliant C90 (ISO/IEC 9899:1990) environment... nothing more.

Question

Since portability is my biggest concern, I couldn't assume that sizeof(void *) == sizeof(X) for the hash-function. Therefore, I have resorted to treating the value byte-by-byte as if it were a string. To accomplish this, the hash function looks a little like:

static size_t hashit(void *ptrval)
{
    size_t i = 0, h = 0;
    union {
        void *ptrval;
        unsigned char string[sizeof(void *)];
    } ptrstr;

    ptrstr.ptrval = ptrval;

    for (; i < sizeof(void *); ++i) {
        size_t byte = ptrstr.string[i];

        /* Crazy operations here... */
    }

    return (h);
}

What portability concerns do any of you have with this particular fragment? Will I encounter any funky alignment issues by accessing ptrval byte-by-byte?

回答1:

You are allowed to access a data type as an array of unsigned char, as you do here. The major portability issue that I see could occur on platforms where the bit-pattern identifying a particular location is not unique - in that case, you might get pointers that compare equal hashing to different locations because the bit patterns were different.

Why could they be different? Well, for one thing, most C data types are allowed to contain padding bits that don't participate in the value. A platform where pointers contained such padding bits could have two pointers that differed only in the padding bits point to the same location. (For example, the OS might use some pointer bits to indicate capabilities of the pointer, not just physical address.) Another example is the far memory model from the early days of DOS, where far pointers consisted of segment:offset, and the adjacent segments overlapped, so that segment:offset could point to the same location as segment+1:offset-x.

All that said, on most platforms in common use today, the bit pattern pointing to a given location is indeed unique. So your code will be widely portable, even though it is unlikely to be strictly conforming.



回答2:

Looks pretty clean. If you can rely on the <inttypes.h> header from C99 (it is often available elsewhere), then consider using uintptr_t - but if you want to hash the value byte-wise, you end up breaking things down to bytes and there is no real advantage to it.



回答3:

Mostly correct. There's one potential problem, though. you assign

size_t byte = ptrstr.string[i];

*string is defined as char, not unsigned char. On the platform that has signed chars and unsigned size_t, it will give you result that you may or may not expect. Just change your char to unsigned char, that will be cleaner.



回答4:

If you don't need the pointer values for some other reason beside keeping track of allocated memory, why not get rid of the hash table altogether and just store a magic number along with the memory allocated as in the example below. The magic number being present alongside the memory allocated indicates that it is still "alive". When freeing the memory you clear the stored magic number before freeing the memory.

#pragma pack(1)
struct sMemHdl
{
   int magic;
   byte firstByte;
};
#pragma pack()

#define MAGIC 0xDEADDEAD
#define MAGIC_SIZE sizeof(((struct sMemHdl *)0)->magic)

void *get_memory( size_t request )
{
   struct sMemHdl *pMemHdl = (struct sMemHdl *)malloc(MAGIC_SIZE + request);
   pMemHdl->magic = MAGIC;
   return (void *)&pMemHdl->firstByte;
}

void free_memory ( void *mem )
{
   if ( isgood_memory(mem) != 0 )
   {
      struct sMemHdl *pMemHdl = (struct sMemHdl *)((byte *)mem - MAGIC_SIZE);
      pMemHdl->magic = 0;
      free(pMemHdl);
   }
}

int isgood_memory ( void *Mem )
{
   struct sMemHdl *pMemHdl = (struct sMemHdl *)((byte *)Mem - MAGIC_SIZE);
   if ( pMemHdl->magic == MAGIC )
   {
      return 1; /* mem is good */
   }
   else
   {
      return 0; /* mem already freed */
   }
}

This may be a bit hackish, but I guess I'm in a hackish mood...



回答5:

Accessing variables such integers or pointers as chars or unsigned chars in not a problem from a portability view. But the reverse is not true, because it is hardware dependent. I have one question, why are you hashing a pointer as a string instead of using the pointer itself as a hash value ( using uintptr_t) ?