Using C Preprocessing to get integer value of a st

Published 2020-08-21 07:22

Question:

How would I create a C macro to get the integer value of a string? The specific use-case is following on from a question here. I want to change code like this:

enum insn {
    sysenter = (uint64_t)'r' << 56 | (uint64_t)'e' << 48 |
               (uint64_t)'t' << 40 | (uint64_t)'n' << 32 |
               (uint64_t)'e' << 24 | (uint64_t)'s' << 16 |
               (uint64_t)'y' << 8  | (uint64_t)'s',
    mov = (uint64_t)'v' << 16 | (uint64_t)'o' << 8 |
          (uint64_t)'m'
};

To this:

enum insn {
    sysenter = INSN_TO_ENUM("sysenter"),
    mov      = INSN_TO_ENUM("mov")
};

Where INSN_TO_ENUM expands to the same code. The performance would be the same, but the readability would be boosted by a lot.

I suspect this form might not be possible because the C preprocessor cannot process the contents of strings, so the following (a variable-argument macro) would be a less preferred but acceptable solution:

enum insn {
    sysenter = INSN_TO_ENUM('s','y','s','e','n','t','e','r'),
    mov      = INSN_TO_ENUM('m','o','v')
};

Answer 1:

Here's a compile-time, pure C solution, which you indicated as acceptable. You may need to extend it for longer mnemonics. I'll keep on thinking about the desired one (i.e. INSN_TO_ENUM("sysenter")). Interesting question :)

#include <stdio.h>

#define head(h, t...) h
#define tail(h, t...) t

#define A(n, c...) (((long long) (head(c))) << (n)) | B(n + 8, tail(c))
#define B(n, c...) (((long long) (head(c))) << (n)) | C(n + 8, tail(c))
#define C(n, c...) (((long long) (head(c))) << (n)) | D(n + 8, tail(c))
#define D(n, c...) (((long long) (head(c))) << (n)) | E(n + 8, tail(c))
#define E(n, c...) (((long long) (head(c))) << (n)) | F(n + 8, tail(c))
#define F(n, c...) (((long long) (head(c))) << (n)) | G(n + 8, tail(c))
#define G(n, c...) (((long long) (head(c))) << (n)) | H(n + 8, tail(c))
#define H(n, c...) (((long long) (head(c))) << (n)) /* extend here */

#define INSN_TO_ENUM(c...) A(0, c, 0, 0, 0, 0, 0, 0, 0)

enum insn {
    sysenter = INSN_TO_ENUM('s','y','s','e','n','t','e','r'),
    mov      = INSN_TO_ENUM('m','o','v')
};

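int main(void)
{
    /* Cast explicitly so the arguments match the %llx conversions. */
    printf("sysenter = %llx\nmov = %llx\n",
           (unsigned long long) sysenter, (unsigned long long) mov);
    return 0;
}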
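
The named variadic parameters above (`t...`, `c...`) are a GNU extension. As a minimal sketch, assuming a C99 compiler with a conforming preprocessor (GCC or Clang), the same chain can be written with `__VA_ARGS__`. The names `HEAD`, `TAIL`, `P0` to `P7` and `PACK_CHARS` are hypothetical, not from the answer. The results are stored in `static const` objects rather than an enum because, before C23, an enumeration constant must be representable as an `int`, so 64-bit enumerators rely on a compiler extension.

```c
#include <stdint.h>

/* Standard variadic helpers: the first argument, and everything after it. */
#define HEAD(h, ...) h
#define TAIL(h, ...) __VA_ARGS__

/* Each step places one character at bit offset n and hands the rest on. */
#define P0(n, ...) ((uint64_t)(HEAD(__VA_ARGS__)) << (n)) | P1((n) + 8, TAIL(__VA_ARGS__))
#define P1(n, ...) ((uint64_t)(HEAD(__VA_ARGS__)) << (n)) | P2((n) + 8, TAIL(__VA_ARGS__))
#define P2(n, ...) ((uint64_t)(HEAD(__VA_ARGS__)) << (n)) | P3((n) + 8, TAIL(__VA_ARGS__))
#define P3(n, ...) ((uint64_t)(HEAD(__VA_ARGS__)) << (n)) | P4((n) + 8, TAIL(__VA_ARGS__))
#define P4(n, ...) ((uint64_t)(HEAD(__VA_ARGS__)) << (n)) | P5((n) + 8, TAIL(__VA_ARGS__))
#define P5(n, ...) ((uint64_t)(HEAD(__VA_ARGS__)) << (n)) | P6((n) + 8, TAIL(__VA_ARGS__))
#define P6(n, ...) ((uint64_t)(HEAD(__VA_ARGS__)) << (n)) | P7((n) + 8, TAIL(__VA_ARGS__))
#define P7(n, ...) ((uint64_t)(HEAD(__VA_ARGS__)) << (n))

/* Pad with eight zeros so every step still has arguments to consume. */
#define PACK_CHARS(...) (P0(0, __VA_ARGS__, 0, 0, 0, 0, 0, 0, 0, 0))

/* The expansion is an integer constant expression, so it can initialise
   objects with static storage duration. */
static const uint64_t insn_mov      = PACK_CHARS('m', 'o', 'v');
static const uint64_t insn_sysenter = PACK_CHARS('s', 'y', 's', 'e', 'n', 't', 'e', 'r');
```

With ASCII character codes, `insn_mov` is 0x766f6d and `insn_sysenter` is 0x7265746e65737973, the same values as the hand-written shifts in the question. Mnemonics longer than eight characters would need a wider type and more steps.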


Answer 2:

EDIT: This answer may be helpful, so I'm not deleting it, but it doesn't specifically answer the question. It DOES convert strings to numbers, but the result cannot be placed in an enum because it isn't computed at compile time.

Well, since your integers are 64-bit, you only have the first 8 characters of any string to worry about. Therefore, you can write the thing 8 times, making sure you don't go past the end of the string:

#define GET_NTH_BYTE(x, n)   (sizeof(x) <= n?0:((uint64_t)x[n] << (n*8)))
#define INSN_TO_ENUM(x)      GET_NTH_BYTE(x, 0)\
                            |GET_NTH_BYTE(x, 1)\
                            |GET_NTH_BYTE(x, 2)\
                            |GET_NTH_BYTE(x, 3)\
                            |GET_NTH_BYTE(x, 4)\
                            |GET_NTH_BYTE(x, 5)\
                            |GET_NTH_BYTE(x, 6)\
                            |GET_NTH_BYTE(x, 7)

For each index, the macro checks whether that index lies within the string. If it does, the macro yields the corresponding byte shifted into position; otherwise it yields 0.

Note that this only works on string literals (or char arrays), because it relies on sizeof to find the length.
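
To illustrate why the restriction exists, here is a small sketch (the function names are hypothetical): `sizeof` sees the full array when given a string literal, but only the pointer itself when given a `char *`.

```c
#include <stddef.h>

/* A string literal is an array, so sizeof counts its bytes plus the NUL. */
static size_t size_of_literal(void)
{
    return sizeof("mov");   /* 4: 'm', 'o', 'v', '\0' */
}

/* A pointer carries no length; sizeof gives the size of the pointer,
   e.g. 8 on a typical 64-bit target, whatever the string contains. */
static size_t size_of_pointer(void)
{
    const char *p = "mov";
    return sizeof(p);
}
```

So passing a pointer to the sizeof-based macro would make the bounds check compare against the pointer size instead of the string length.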

If you want to be able to convert any string, you can give the length of the string with it:

#define GET_NTH_BYTE(x, n, l)   (l < n?0:((uint64_t)x[n] << (n*8)))
#define INSN_TO_ENUM(x, l)      GET_NTH_BYTE(x, 0, l)\
                               |GET_NTH_BYTE(x, 1, l)\
                               |GET_NTH_BYTE(x, 2, l)\
                               |GET_NTH_BYTE(x, 3, l)\
                               |GET_NTH_BYTE(x, 4, l)\
                               |GET_NTH_BYTE(x, 5, l)\
                               |GET_NTH_BYTE(x, 6, l)\
                               |GET_NTH_BYTE(x, 7, l)

So for example:

size_t length = strlen(your_string);
uint64_t num = INSN_TO_ENUM(your_string, length);  /* the result needs 64 bits, not int */

Finally, there is a way to avoid passing the length, but it requires the compiler to evaluate the terms of INSN_TO_ENUM from left to right. The C standard does not guarantee that order for the operands of |, so this is not portable:

static int _nul_seen;
#define GET_NTH_BYTE(x, n)  ((_nul_seen || x[n] == '\0')?(_nul_seen=1)&0:((uint64_t)x[n] << (n*8)))
#define INSN_TO_ENUM(x)     (_nul_seen=0)|\
                              (GET_NTH_BYTE(x, 0)\
                              |GET_NTH_BYTE(x, 1)\
                              |GET_NTH_BYTE(x, 2)\
                              |GET_NTH_BYTE(x, 3)\
                              |GET_NTH_BYTE(x, 4)\
                              |GET_NTH_BYTE(x, 5)\
                              |GET_NTH_BYTE(x, 6)\
                              |GET_NTH_BYTE(x, 7))
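
If a run-time conversion is acceptable anyway, a plain function avoids both the length argument and any dependence on evaluation order. This is a minimal sketch, not part of the original answer; `insn_from_string` is a hypothetical name, and like the macros above its result cannot initialise an enum.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: packs at most the first 8 bytes of s into a
   64-bit value, with the first character in the lowest byte. */
static uint64_t insn_from_string(const char *s)
{
    uint64_t value = 0;

    /* Stop at the terminating NUL or after 8 bytes, whichever comes first. */
    for (size_t i = 0; i < 8 && s[i] != '\0'; i++)
        value |= (uint64_t)(unsigned char)s[i] << (i * 8);

    return value;
}
```

For example, `insn_from_string("mov")` yields 0x766f6d with ASCII character codes, and anything past the eighth character is ignored.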


Answer 3:

If you can use C++11 on a recent compiler

constexpr uint64_t insn_to_enum(const char* x) {
    return *x ? *x + (insn_to_enum(x+1) << 8) : 0;
}

enum insn { sysenter = insn_to_enum("sysenter") };

will work and calculate the constant at compile time.



Answer 4:

Some recursive template magic may do the trick. Creates no code if constants are known at compile time.

May want to keep an eye on your build times if you use it in anger though.

// The main recursive template magic.
template <int N>
struct CharSHift
{
    // const char* so that a string literal can be passed in standard C++.
    static __int64 charShift(const char* string)
    {
        return string[N-1] | (CharSHift<N-1>::charShift(string) << 8);
    }
};

// A specialisation for 0 is needed, as this is where the recursion stops.
template <>
struct CharSHift<0>
{
    static __int64 charShift(const char* string)
    {
        return 0;
    }
};

// Template code is a bit hairy to look at, so wrap it in a macro.
#define CT_IFROMS(_string_) CharSHift<sizeof _string_ - 1>::charShift(_string_)

int _tmain(int argc, _TCHAR* argv[])
{
    __int64 hash0 = CT_IFROMS("abcdefgh");

    printf("%08llX \n",hash0);
    return 0;
}
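
One detail worth noting: the recursion above puts the last character of the string in the lowest byte, which is the reverse of the layout in the question, where the first character is lowest. The following C sketch shows both orders side by side. The function names are hypothetical, and the loops run at run time, so they only illustrate the byte order, not the compile-time behaviour.

```c
#include <stddef.h>
#include <stdint.h>

/* Mirrors the template recursion: each step shifts the running value up
   by 8 bits, so the LAST character lands in the lowest byte.
   Assumes len <= 8; earlier characters are shifted out beyond that. */
static uint64_t pack_last_char_low(const char *s, size_t len)
{
    uint64_t value = 0;
    for (size_t i = 0; i < len; i++)
        value = (value << 8) | (unsigned char)s[i];
    return value;
}

/* Layout used in the question: the FIRST character is in the lowest byte. */
static uint64_t pack_first_char_low(const char *s, size_t len)
{
    uint64_t value = 0;
    for (size_t i = 0; i < len && i < 8; i++)
        value |= (uint64_t)(unsigned char)s[i] << (i * 8);
    return value;
}
```

With ASCII, "abcdefgh" gives 0x6162636465666768 in the first layout and 0x6867666564636261 in the second, so the template would need its indexing reversed to reproduce the enum values from the question.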