Declaring Pascal-style strings in C

Posted 2019-02-03 02:27

Question:

In C, is there a good way to define length-first, Pascal-style strings as constants, so they can be placed in ROM? (I'm working with a small embedded system with a non-GCC ANSI C compiler.)

A C-string is 0 terminated, eg. {'f','o','o',0}.

A Pascal-string has the length in the first byte, eg. {3,'f','o','o'}.

I can declare a C-string to be placed in ROM with:

const char *s = "foo";

For a Pascal-string, I could manually specify the length:

const char s[] = {3, 'f', 'o', 'o'};

But, this is awkward. Is there a better way? Perhaps in the preprocessor?

Answer 1:

I think the following is a good solution, but don't forget to enable packed structs:

#include <stdio.h>

#define DEFINE_PSTRING(var,str) const struct {unsigned char len; char content[sizeof(str)];} (var) = {sizeof(str)-1, str}

DEFINE_PSTRING(x, "foo");
/*  Expands to following:
    const struct {unsigned char len; char content[sizeof("foo")];} x = {sizeof("foo")-1, "foo"};
*/

int main(void)
{
    printf("%d %s\n", x.len, x.content);
    return 0;
}

One catch is that it adds an extra NUL byte after your string, but that can be desirable because it lets you use the content as a normal C string too. You may also need to cast it to whatever type your external library expects.
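The length lives in a single unsigned char, so a literal longer than UCHAR_MAX (255 with 8-bit chars) would silently wrap. A minimal sketch of a variant that turns that into a compile error, assuming a C89-style "negative array size" static assertion is acceptable on your compiler; the macro name DEFINE_PSTRING_CHECKED and the variable greeting are hypothetical:

```c
#include <limits.h>

/* Hypothetical variant of DEFINE_PSTRING. The typedef is a C89-compatible
   static assertion: if the literal has more than UCHAR_MAX characters the
   array size becomes -1 and compilation fails instead of len wrapping. */
#define DEFINE_PSTRING_CHECKED(var, str) \
    typedef char var##_fits_in_len_byte[(sizeof(str) - 1 <= UCHAR_MAX) ? 1 : -1]; \
    const struct { unsigned char len; char content[sizeof(str)]; } var = \
        { sizeof(str) - 1, str }

/* Expands to a const struct {3, "foo"} that a compiler can place in ROM. */
DEFINE_PSTRING_CHECKED(greeting, "foo");
```

The typedef costs nothing at run time; it only exists so the compiler has to evaluate the size check.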



Answer 2:

GCC and clang (and possibly others) accept the -fpascal-strings option which allows you to declare pascal-style string literals by having the first thing that appears in the string be a \p, e.g. "\pfoo". Not exactly portable, but certainly nicer than funky macros or the runtime construction of them.




Answer 3:

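You can still use a const char * literal with an escape sequence as its first character that indicates the length. Beware that a hex escape swallows every hex digit after it, and 'f' is a hex digit, so "\x03foo" would actually start with the byte 0x3f. End the escape with string-literal concatenation (or use the octal escape "\003foo"):

const char *pascal_string = "\x03" "foo";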

It will still be null-terminated, but that probably doesn't matter.
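Because hex escapes in C are greedy, the exact spelling of the literal matters. A small sketch (the array names are hypothetical) showing what each spelling actually contains:

```c
/* Hex escapes consume every following hex digit, so "\x03f" is one byte,
   0x3f, and the length byte is lost: {0x3f, 'o', 'o', 0}. */
static const char broken[] = "\x03foo";
/* Closing the escape with string-literal concatenation gives {3, 'f', 'o', 'o', 0}. */
static const char concat[] = "\x03" "foo";
/* Octal escapes stop after at most three digits, so this is also {3, 'f', 'o', 'o', 0}. */
static const char octal[] = "\003foo";
```

Declaring these as arrays rather than pointers also lets you use sizeof to see the difference.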



Answer 4:

My approach would be to create functions for dealing with Pascal strings:

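/* No bounds checking: pstr needs strlen(cstr) + 1 bytes,
   and strings longer than 255 characters will not fit. */
void cstr2pstr(const char *cstr, char *pstr) {
    int i;
    for (i = 0; cstr[i]; i++) {
        pstr[i+1] = cstr[i];
    }
    pstr[0] = (char)i;    /* length byte */
}

void pstr2cstr(const char *pstr, char *cstr) {
    int i;
    /* read the length byte as unsigned so lengths 128..255 work
       even where plain char is signed */
    for (i = 0; i < (unsigned char)pstr[0]; i++) {
        cstr[i] = pstr[i+1];
    }
    cstr[i] = 0;
}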

Then I could use it this way:

#include <stdio.h>

int main(int argc, char *argv[]) {
    char cstr[] = "ABCD", pstr[5], back[5];
    cstr2pstr(cstr, pstr);
    pstr2cstr(pstr, back);
    printf("%s\n", back);
    return 0;
}

This seems simple, straightforward, less error-prone and not especially awkward. Note that the conversion happens at run time, so the Pascal strings end up in RAM rather than ROM. It may not be the solution to your problem, but I would recommend at least considering it.
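The functions above trust the caller to size the buffer and keep the string under 256 characters. A minimal bounds-checked sketch of the same conversion, assuming it is acceptable to take an unsigned char buffer plus its capacity; the name cstr2pstr_checked and its return convention are hypothetical:

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical bounds-checked cstr2pstr. pstr has room for cap bytes,
   length byte included. Returns 0 on success, -1 if the string does not
   fit in cap bytes or is longer than UCHAR_MAX characters. */
int cstr2pstr_checked(const char *cstr, unsigned char *pstr, size_t cap)
{
    size_t i;
    if (cap == 0)
        return -1;                      /* no room even for the length byte */
    for (i = 0; cstr[i] != '\0'; i++) {
        /* character i goes to pstr[i + 1]; the length would become i + 1 */
        if (i + 1 >= cap || i >= UCHAR_MAX)
            return -1;                  /* pstr may be partially written */
        pstr[i + 1] = (unsigned char)cstr[i];
    }
    pstr[0] = (unsigned char)i;         /* i <= UCHAR_MAX here */
    return 0;
}
```

Using unsigned char for the buffer sidesteps the signed-char problem with lengths of 128 and above.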



Answer 5:

You can apply sizeof to string literals as well. This makes the declaration a little less awkward:

const char s[] = {sizeof "foo" - 1u, 'f', 'o', 'o'};

Note that the size of a string literal includes the terminating NUL character, which is why you have to subtract 1. Still, it's a lot of typing and somewhat obfuscated :-)
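Since the characters are typed out separately from the literal, it is easy for the two to drift apart. A minimal sketch of a compile-time guard, assuming the C89 negative-array-size trick; PSTR_CHECK and foo_pstr are hypothetical names. It catches a missing or extra character, but not a wrong character of the right count:

```c
/* Hypothetical guard: compilation fails (array size -1) unless the
   hand-written array (1 length byte + N chars) and the literal
   (N chars + NUL) have the same size. */
#define PSTR_CHECK(name, lit) \
    typedef char name##_matches_literal[(sizeof(name) == sizeof(lit)) ? 1 : -1]

static const char foo_pstr[] = {sizeof "foo" - 1u, 'f', 'o', 'o'};
PSTR_CHECK(foo_pstr, "foo");
```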



Answer 6:

It may sound a little extreme, but if you have many strings of this kind that need frequent updating, you may consider writing your own small tool (a Perl script, maybe?) that runs on the host system, parses an input file in a custom format you design to your own taste, and outputs a .c file. You can integrate it into your makefile or whatever and live happily ever after :)

I'm talking about a program that will convert this input (or another syntax that you prefer):

s = "foo";
x = "My string";

To this output, which is a .c file:

const char s[] = {3, 'f', 'o', 'o'};
const char x[] = {9, 'M', 'y', ' ', 's', 't', 'r', 'i', 'n', 'g'};
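The core of such a generator is just formatting one definition per string. Here is the idea sketched in C instead of Perl, to keep one language; emit_pstring is a hypothetical helper that a host tool would call once per parsed input line, and it assumes an ASCII host and 8-bit chars on the target:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical generator step: writes one length-prefixed array definition
   into out (which must be large enough) and returns the number of
   characters written, or -1 if the string is too long for a length byte.
   Printable ASCII is emitted as character constants; everything else,
   including quote and backslash, as a decimal value so nothing needs
   escaping. */
int emit_pstring(char *out, const char *name, const char *str)
{
    size_t i, len = strlen(str);
    int n;
    if (len > 255)
        return -1;                      /* assumes 8-bit chars on the target */
    n = sprintf(out, "const char %s[] = {%u", name, (unsigned)len);
    for (i = 0; i < len; i++) {
        unsigned char c = (unsigned char)str[i];
        if (c >= 0x20 && c < 0x7f && c != '\'' && c != '\\')
            n += sprintf(out + n, ", '%c'", c);
        else
            n += sprintf(out + n, ", %u", (unsigned)c);
    }
    n += sprintf(out + n, "};\n");
    return n;
}
```

For example, emit_pstring(buf, "x", "My string") produces exactly the second line of the output shown above, followed by a newline.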


Answer 7:

One option might be to abuse the preprocessor. By declaring a struct of the right size and populating it on initialization, it can be const.

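#include <stdio.h>

/* No trailing semicolon: the ';' after DECLARE_PSTR(...) ends the declaration */
#define DECLARE_PSTR(id,X) \
    struct pstr_##id { unsigned char len; char data[sizeof(X)]; }; \
    static const struct pstr_##id id = {sizeof(X)-1, X}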

#define GET_PSTR(id) (const char *)&(id)

#pragma pack(push)
#pragma pack(1) 
DECLARE_PSTR(bob, "foo");
#pragma pack(pop)

int main(int argc, char *argv[])
{
    const char *s = GET_PSTR(bob);
    int len;

    len = (unsigned char)*s++;    /* length byte, 0..255 */
    printf("len=%d\n", len);
    while(len--)
        putchar(*s++);
    return 0;
} 
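GET_PSTR only works if data[] starts immediately after len. Both members have alignment 1, so in practice compilers put no padding there even without #pragma pack, but the standard does not promise it. A sketch of a compile-time check for that assumption; the probe struct and typedef names are hypothetical:

```c
#include <stddef.h>

/* Hypothetical probe with the same member layout as struct pstr_##id. */
struct pstr_layout_probe { unsigned char len; char data[4]; };

/* C89-style static assertion: fails to compile if data[] does not start
   at byte offset 1, i.e. if GET_PSTR's cast would read padding. */
typedef char pstr_layout_ok[
    (offsetof(struct pstr_layout_probe, data) == 1) ? 1 : -1];
```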


Answer 8:

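This is why flexible array members were introduced in C99 (to replace the "struct hack"). IIRC, Pascal strings were limited to a maximum length of 255. Note that this version builds the string at run time with malloc(), so it lives in RAM; standard C does not allow a flexible array member to be statically initialized.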

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <limits.h> // For CHAR_BIT

struct pstring {
    unsigned char len;
    char dat[];
};

struct pstring *pstring_new(const char *src, size_t len)
{
    struct pstring *this;
    if (!len) len = strlen(src);

    /* if the size does not fit in the ->len field: just truncate ... */
    if (len >= (1u << (CHAR_BIT * sizeof this->len)))
        len = (1u << (CHAR_BIT * sizeof this->len)) - 1;

    this = malloc(sizeof *this + len);
    if (!this) return NULL;

    this->len = len;
    memcpy(this->dat, src, len);    /* no NUL terminator is stored */
    return this;
}

int main(void)
{
    struct pstring *pp;

    pp = pstring_new("Hello, world!", 0);
    if (!pp) return 1;

    /* %*.*s takes int width and precision; dat is not NUL-terminated,
       so the precision is what stops printf */
    printf("%p:[%u], %*.*s\n", (void *) pp,
           (unsigned int) pp->len,
           (int) pp->len,
           (int) pp->len,
           pp->dat);
    free(pp);
    return 0;
}


Answer 9:

You can define an array in the way you like, but note that this syntax is not adequate:

const char *s = {3, 'f', 'o', 'o'};

You need an array instead of a pointer:

const char s[] = {3, 'f', 'o', 'o'};

Note that with the usual 8-bit char, the length byte can hold at most 255, and that will be your maximum string length. Plain char may be signed, so read the length through unsigned char if you need lengths above 127.

Don't expect this to work where other strings would, however. A C string is expected to be NUL-terminated not only by the compiler, but by the standard library and nearly everything else.
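When you do need to hand such an array to code that expects a C string, printf's "%.*s" precision lets you use the length byte without any terminator in the source array. A minimal sketch writing into a caller-supplied buffer; pstr_format is a hypothetical name, and it stops early if the Pascal string contains an embedded NUL:

```c
#include <stdio.h>

/* Hypothetical helper: copies the length-prefixed string p into buf as a
   NUL-terminated C string. buf needs (unsigned char)p[0] + 1 bytes.
   Returns the number of characters written. */
int pstr_format(char *buf, const char *p)
{
    int len = (unsigned char)p[0];      /* length byte, read as 0..255 */
    /* The precision makes sprintf stop after len characters, so p itself
       does not need a terminating NUL. */
    return sprintf(buf, "%.*s", len, p + 1);
}
```

To print directly, printf("%.*s\n", (unsigned char)s[0], s + 1) works the same way with no buffer.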



Answer 10:

Here's my answer, complete with an append operation that uses alloca() for automatic storage.

#include <stdio.h>
#include <string.h>
#include <alloca.h>

struct pstr {
  unsigned length;
  char *cstr;
};

#define PSTR(x) ((struct pstr){sizeof x - 1, x})

struct pstr pstr_append (struct pstr out,
             const struct pstr a,
             const struct pstr b)
{
  memcpy(out.cstr, a.cstr, a.length); 
  memcpy(out.cstr + a.length, b.cstr, b.length + 1); 
  out.length = a.length + b.length;
  return out;
}

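/* Caveats: a and b are each evaluated twice, and alloca() is called
   inside a function argument list, which the Linux alloca(3) man page
   warns does not work on many systems. */
#define PSTR_APPEND(a,b) \
  pstr_append((struct pstr){0, alloca((a).length + (b).length + 1)}, a, b)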

int main(void)
{
  struct pstr a = PSTR("Hello, Pascal!");
  struct pstr b = PSTR("I didn't C you there.");

  struct pstr result = PSTR_APPEND(PSTR_APPEND(a, PSTR(" ")), b);

  printf("\"%s\" is %u chars long.\n", result.cstr, result.length);
  return 0;
}

You could accomplish the same thing with plain C strings and strlen(). Since both alloca() and strlen() are best suited to short strings, I think that would make more sense.
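If alloca() is unavailable or its argument-list restriction is a concern, the same append can write into a buffer the caller provides. A minimal sketch reusing the answer's struct pstr and PSTR (C99 compound literal); pstr_append_buf is a hypothetical name, and like pstr_append it relies on b.cstr being NUL-terminated:

```c
#include <string.h>

struct pstr {
  unsigned length;
  char *cstr;
};

#define PSTR(x) ((struct pstr){sizeof x - 1, x})

/* Hypothetical alloca-free append: buf must hold
   a.length + b.length + 1 bytes. b's terminator is copied too,
   so the result is also usable as a C string. */
struct pstr pstr_append_buf(char *buf, struct pstr a, struct pstr b)
{
  struct pstr out;
  memcpy(buf, a.cstr, a.length);
  memcpy(buf + a.length, b.cstr, b.length + 1);
  out.length = a.length + b.length;
  out.cstr = buf;
  return out;
}
```

Because the buffer is an ordinary array declared by the caller, its storage duration and size are explicit, at the cost of the caller having to compute the size.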



Tags: c string pascal