Using C++ Templates with C structs for introspection

Posted 2019-04-03 01:09

Question:

I'm doing some work in C++ for a company that has everything else written in C (using C isn't an option for me :( ). They have a number of data structures that are VERY similar (i.e., they all have fields such as "name", "address", etc.), but for whatever reason there isn't a common structure that everything else is based on (which makes doing anything hell). Anyway, I need to do a system-wide analysis of these structs that are in memory and throw it all into a table. Not too bad, but the table has to include entries for all the fields of all the variables, even if they don't have the field (struct b may have the field "latency" while struct a doesn't; in the table, each instance of a must still have an empty entry for "latency").

So, my question is, is there a way to determine at runtime if a structure that has been passed into a template function has a specific field? Or will I have to write some black magic macro that does it for me? (The problem is basically that I can't use template specialization)

Thanks! If you have any questions please feel free to ask!

Here's a snippet of what I was thinking...

struct A
{
  char name[256];
  int index;
  float percision;
};

struct B
{
  int index;
  char name[256];
  int latency;
};

/* More annoying similar structs... note that all of the above are defined in files that were compiled as C - not C++ */

struct Entry
{
  char name[256];
  int index;
  float percision;
  int latency;
  /* more fields that are specific to only 1 or more structure */
};

template<typename T> struct Entry gatherFrom( T *ptr )
{
  Entry entry;

  strncpy( entry.name, ptr->name, sizeof( entry.name ) );
  entry.index = ptr->index;
  /* Something like this perhaps? (type_contains_field is the part I don't know how to write) */
  entry.percision = type_contains_field( "percision" ) ? ptr->percision : -1;
  return entry;
}

int main()
{
  struct A a;
  struct B b;

  /* initialization.. */

  Entry e  = gatherFrom( &a );
  Entry e2 = gatherFrom( &b );

  return 0;
}

Answer 1:

everything else written in C (using C isn't an option for me :( ).

First I'd like to quote what Linus Torvalds had to say about this issue:


From: Linus Torvalds <torvalds <at> linux-foundation.org>
Subject: Re: [RFC] Convert builin-mailinfo.c to use The Better String Library.
Newsgroups: gmane.comp.version-control.git
Date: 2007-09-06 17:50:28 GMT (2 years, 14 weeks, 16 hours and 36 minutes ago)

C++ is a horrible language. It's made more horrible by the fact that a lot 
of substandard programmers use it, to the point where it's much much 
easier to generate total and utter crap with it. Quite frankly, even if 
the choice of C were to do *nothing* but keep the C++ programmers out, 
that in itself would be a huge reason to use C.

http://harmful.cat-v.org/software/c++/linus


They have a number of data structures that are VERY similar (i.e., they all have fields such as "name", "address", etc.), but for whatever reason there isn't a common structure that everything else is based on (which makes doing anything hell).

They may have had very sound reasons for this. Putting common fields into a single base structure (class) may sound like a great idea. But it makes things really difficult if you want to apply major changes to one of the structures (replace some fields, change types, etc.) while leaving the rest intact. OOP is certainly not the one true way to do things.

So, my question is, is there a way to determine at runtime if a structure that has been passed into a template function has a specific field?

No, this is not possible, neither in C nor in C++, because all information about types gets discarded when the binary is created. There's neither reflection nor introspection in C or C++. Well, technically the debug information the compiler emits does provide this information, but there's no built-in language feature to access it. This sort of debug information also relies on analysis performed at compile time, not at runtime. C++ has RTTI, but that is only a very coarse system for identifying which class an instance is of. It does not help with class or struct members.

But why do you care to do this at runtime anyway?

Anyway, I need to do a system-wide analysis of these structs that are in memory and throw it all into a table.

You should actually be happy that you have to analyse C and not C++, because C is really, really easy to parse (unlike C++, which is tremendously difficult to parse, mostly because of those darn templates), especially structs. I'd just write a small, simple script that extracts all the struct definitions from the C sources. However, since structs are of constant size, they often contain pointers to dynamically allocated data. Unless you want to patch your allocator, I think the easiest way to analyse this is to hook into a debugger and record the memory usage of every unique object whose pointer is assigned to a struct member.



Answer 2:

You can do this at compile-time without touching the source of the original structs:

#include <iostream>
#include <limits>
#include <string.h>  // memcpy, size_t (standard header; <memory.h> is non-standard)

struct A
{
    char name[256];
    int index;
    float percision;
};

struct B
{
    int index;
    char name[256];
    int latency;
};

struct Entry
{
    char name[256];
    int index;
    float percision;
    int latency;
    /* more fields that are specific to only 1 or more structure */
};

inline
std::ostream & operator<<(std::ostream & os, Entry const & e) {
    return os << e.name << "{" << e.index << ", " << e.percision << ", " << e.latency << "}";
}

template <typename T>
inline
void assign(T & dst, T const & src) {
    dst = src;
}

template <size_t N>
inline
void assign(char (&dst)[N], char const (&src)[N]) {
    memcpy(dst, src, N);
}

#define DEFINE_ENTRY_FIELD_COPIER(field)                            \
    template <typename T>                                           \
    inline                                                          \
    decltype(T::field, true) copy_##field(T const * t, Entry & e) { \
        assign(e.field, t->field);                                  \
        return true;                                                \
    }                                                               \
                                                                    \
    inline                                                          \
    bool copy_##field(void const *, Entry &) {                      \
            return false;                                           \
    }

DEFINE_ENTRY_FIELD_COPIER(name)
DEFINE_ENTRY_FIELD_COPIER(index)
DEFINE_ENTRY_FIELD_COPIER(percision)
DEFINE_ENTRY_FIELD_COPIER(latency)

template <typename T>
Entry gatherFrom(T const & t) {
    Entry e = {"", -1, std::numeric_limits<float>::quiet_NaN(), -1};
    copy_name(&t, e);
    copy_index(&t, e);
    copy_percision(&t, e);
    copy_latency(&t, e);
    return e;
}

int main() {
    A a = {"Foo", 12, 1.2};
    B b = {23, "Bar", 34};

    std::cout << "a = " << gatherFrom(a) << "\n";
    std::cout << "b = " << gatherFrom(b) << "\n";
}

The DEFINE_ENTRY_FIELD_COPIER() macro defines a pair of overloaded functions for each field you want to extract. One overload (copy_##field(T const * t, …), which becomes copy_name(T const * t, …), copy_index(T const * t, …), etc.) defines its return type as decltype(T::field, true), which resolves to type bool if T has a data member called name, index, etc. If T doesn't have such a field, the substitution fails, but rather than causing a compile-time error, this first overload is simply treated as if it doesn't exist (this is called SFINAE) and the call thus resolves to the second overload, copy_##field(void const * t, …), which accepts any type at all for its first argument and does nothing.

Notes:

  1. Because this code resolves the overloads at compile-time, gatherFrom() is optimal, in the sense that the generated binary code for gatherFrom<A>(), for example, will look as if you tuned it for A by hand:

    Entry handCraftedGatherFromA(A const & a) {
        Entry e;
        e.latency = -1;
        memcpy(e.name, a.name, sizeof(a.name));
        e.index = a.index;
        e.percision = a.percision;
        return e;
    }
    

    Under g++ 4.8 with -O3, gatherFrom<A>() and handCraftedGatherFromA() generate identical code:

    pushq   %rbx
    movl    $256, %edx
    movq    %rsi, %rbx
    movl    $-1, 264(%rdi)
    call    _memcpy
    movss   260(%rbx), %xmm0
    movq    %rax, %rcx
    movl    256(%rbx), %eax
    movss   %xmm0, 260(%rcx)
    movl    %eax, 256(%rcx)
    movq    %rcx, %rax
    popq    %rbx
    ret
    

    Clang 4.2's gatherFrom<A>() doesn't do as well, unfortunately; it redundantly zero-initialises the entire Entry. So it's not all roses, I guess.

    By using NRVO, both versions avoid copying e when returning it. However, I should note that both versions would save one op-code (movq %rcx, %rax) by using an output parameter instead of a return value.

  2. The copy_…() functions return a bool result indicating whether the copy happened or not. This isn't currently used, but it could be used, e.g., to define int Entry::validFields as a bitmask indicating which fields were populated.

  3. The macro isn't required; it's just for DRY. The essential ingredient is the use of SFINAE.

  4. The assign() overloads also aren't required. They just avoid having a different almost-identical macro to handle char arrays.

  5. The above code relies on C++11's decltype keyword. If you are using an older compiler, it's messier, but still possible. The cleanest solution I've managed to come up with is the following. It's C++98-conformant and still based on the SFINAE principle:

    template <typename C, typename F, F (C::*), typename T>
    struct EnableCopy {
        typedef T type;
    };
    
    #define DEFINE_ENTRY_FIELD_COPIER(field, ftype)             \
        template <typename T>                                   \
        inline                                                  \
        typename EnableCopy<T, ftype, &T::field, bool>::type    \
        copy_##field(T const * t, Entry & e) {                  \
            assign(e.field, t->field);                          \
            return true;                                        \
        }                                                       \
                                                                \
        inline                                                  \
        bool copy_##field(void const *, Entry &) {              \
            return false;                                       \
        }
    
    /* No trailing semicolons: a stray ';' at namespace scope is not valid C++98. */
    DEFINE_ENTRY_FIELD_COPIER(name     , char[256])
    DEFINE_ENTRY_FIELD_COPIER(index    , int)
    DEFINE_ENTRY_FIELD_COPIER(percision, float)
    DEFINE_ENTRY_FIELD_COPIER(latency  , int)
    

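    std::numeric_limits<float>::quiet_NaN() is already part of C++98's <limits> (C++11 only made it constexpr), so the default value for percision can stay as it is. If your compiler's library lacks it, use some trick (0.0f/0.0f seems to work) or choose another magic value.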



Answer 3:

Yes, this isn't hard at all. Just put both an A and an Entry in a single object, and make the Entry a second-class citizen:

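#include <iostream>

struct Entry {
  int x;
  int y;
};
void setDefaultValues(Entry*); // You should be able to provide these.
struct Indirect : public Entry { };
template<typename T> struct EntryOr : public T, Indirect
{
  // Fill in the Entry part with defaults on construction.
  EntryOr() { setDefaultValues(this); }
};

// From C code
struct A {
  int x;
};

int main()
{
  EntryOr<A> foo;
  // x exists in both bases, so an unqualified foo.x is ambiguous and must be qualified.
  foo.A::x = 5; // A::x
  std::cout << foo.A::x << foo.y; // Prints A::x and Entry::y
}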
