adapting a non-constexpr integral value to a non-t

2019-02-17 19:52发布

问题:

Consider a function object F taking a constexpr size_t argument I

struct F
{
    template <size_t I>
    constexpr size_t operator()(size <I>) const { return I; }
};

wrapped within a type size <I>, where (for brevity)

template <size_t N>
using size = std::integral_constant <size_t, N>;

Of course, we could pass I directly but I want to emphasize that it is constexpr by using it as a template argument. Function F is dummy here but in reality it could do a variety of useful stuff like retrieving information from the Ith element of a tuple. F is assumed to have the same return type regardless of I. I could be of any integral type but is assumed non-negative.

Problem

Given a constexpr size_t value I, we can call F by

F()(size <I>());

Now, what if we want to call F with a non-constepr size_t value i? Consider the following:

constexpr size_t L = 10;
idx <F, L> f;
for (size_t i = 0; i < L; ++i)
    cout << f(i) << " ";

(Why would I need this? To give some context, I am in fact trying to build a composite iterator into a container view that represents a sequence of "joined" (concatenated) heterogeneous containers. This would give the ability to say something like join(a, b) = c; where arrays join(a, b) and c are of equal length. However, i is iterator state so cannot be constexpr, yet sub-iterators are stored in a tuple and need to be accessed by a constexpr index. Individual value_type's are roughly consistent so that the joined view can take on their common_type type, but sub-containers and consequently sub-iterators are of different types.)

Solution

Here, I have come up with struct idx <F, L>, which adapts function F for this purpose, assuming the input argument is less than L. This actually compiles fine giving output

0 1 2 3 4 5 6 7 8 9 

and here is a live example.

idx works by recursively decomposing input i into a binary representation and reconstructing a constexpr counterpart N:

template <typename F, size_t L, size_t N = 0, bool = (N < L)>
struct idx
{
    template <size_t R = 1>
    inline constexpr decltype(F()(size <0>()))
    operator()(size_t I, size <R> = size <R>()) const
    {
        return I%2 ? idx <F, L, N+R>()(--I/2, size <2*R>()) :
               I   ? idx <F, L, N  >()(  I/2, size <2*R>()) :
               F()(size <N>());
    }
};

where R represents a power of 2 at the current iteration. To avoid infinite template instantiation, a specialization is given for N >= L, returning F()(size <0>()) as a dummy value:

template <typename F, size_t L, size_t N>
struct idx <F, L, N, false>
{
    template <size_t R>
    inline constexpr decltype(F()(size <0>()))
    operator()(size_t I, size <R>) const { return F()(size <0>()); }
};

In fact, this method is a generalization of the more common idiom with a Boolean argument:

bool b = true;
b ? f <true>() : f <false>();

where f is a function taking a bool as a template argument. In this case it is evident that all two possible versions of f are instantiated.

Question

Although this works and its run-time complexity is presumably logarithmic in i, I am concerned with the compile-time implications, like:

  • how many combinations of idx and its template operator() are instantiated for this to work at run time for any input i that is not known at compile time? (I understand "all possible" again, but how many?)

  • is it really possible to inline operator()?

  • is there any alternative strategy or variant that is "easier" to compile?

  • should I forget about this idea as an instance of pure code bloat?

Notes

Here are the compile times (in seconds) and executable sizes (in KB) I have measured for different values of L:

 L      Clang(s)    GCC(s)    Clang(KB)    GCC(KB)
 10       1.3       1.7          33         36
 20       2.1       3.0          48         65
 40       3.7       5.8          87        120
 80       7.0      11.2         160        222
160      13.9      23.4         306        430
320      28.1      47.8         578        850
640      56.5     103.0        1126       1753

So, although it appears roughly linear in L, it is quite long and frustratingly large.

Attempting to force operator() inline fails: probably ignored by Clang (executable even larger), while GCC reports recursive inlining.

Running nm -C on the executable e.g. for L = 160, shows 511/1253 different versions of operator() (with Clang/GCC). These are all for N < L so it appears the terminating specialization N >= L does get inlined.

PS. I would add tag code-bloat but the system won't let me.

回答1:

I call this technique the magic switch.

The most efficient way I know of to do this is to build your own jump table.

// first, index list boilerplate.  Does log-depth creation as well
// needed for >1000 magic switches:
template<unsigned...Is> struct indexes {typedef indexes<Is...> type;};
template<class lhs, class rhs> struct concat_indexes;
template<unsigned...Is, unsigned...Ys> struct concat_indexes<indexes<Is...>, indexes<Ys...>>{
    typedef indexes<Is...,Ys...> type;
};
template<class lhs, class rhs>
using ConcatIndexes = typename concat_indexes<lhs, rhs>::type;

template<unsigned min, unsigned count> struct make_indexes:
    ConcatIndexes<
        typename make_indexes<min, count/2>::type,
        typename make_indexes<min+count/2, (count+1)/2>::type
    >
{};
template<unsigned min> struct make_indexes<min, 0>:
    indexes<>
{};
template<unsigned min> struct make_indexes<min, 1>:
    indexes<min>
{};
template<unsigned max, unsigned min=0>
using MakeIndexes = typename make_indexes<min, max-min>::type;

// This class exists simply because [](blah){code}... `...` expansion
// support is lacking in many compilers:
template< typename L, typename R, unsigned I >
struct helper_helper {
    static R func( L&& l ) { return std::forward<L>(l)(size<I>()); }
};
// the workhorse.  Creates an "manual" jump table, then invokes it:
template<typename L, unsigned... Is>
auto
dispatch_helper(indexes<Is...>, L&& l, unsigned i)
-> decltype( std::declval<L>()(size<0>()) )
{
  // R is return type:
  typedef decltype( std::declval<L>()(size<0>()) ) R;
  // F is the function pointer type for our jump table:
  typedef R(*F)(L&&);
  // the jump table:
  static const F table[] = {
    helper_helper<L, R, Is>::func...
  };
  // invoke at the jump spot:
  return table[i]( std::forward<L>(l) );
};
// create an index pack for each index, and invoke the helper to
// create the jump table:
template<unsigned Max, typename L>
auto
dispatch(L&& l, unsigned i)
-> decltype( std::declval<L>()(size<0>()) )
{
  return dispatch_helper( MakeIndexes<Max>(), std::forward<L>(l), i );
};

which requires some static setup, but is pretty fast when run.

An assert that i is in bounds may also be useful.

live example



回答2:

If your solution have cap on maximum possible value (say 256) you can use macro magic and switch statement to simplify it:

#define POS(i) case (i): return F<(i)>(); break;
#define POS_4(i) POS(i + 0) POS(i + 1) POS(i + 2) POS(i + 3)
#define POS_16(i) POS_4(i + 0) POS_4(i + 4) POS_4(i + 8) POS_4(i + 12)

int func(int i)
{
    switch(i)
    {
        POS_16(0)
    }
}

Another possible solution is (with C++11) use variadic templates:

template<int I>
struct FF
{
    static int f() { return I; }
};


template<typename... I>
int func(int i)
{
    constexpr int (*Func[])() = { I::f... };
    return Func[i]();
}

int main(int argc, char** argv)
{
    func<FF<0>,FF<1>>(1);
}


回答3:

I'll take the obvious position here and ask if "I want to emphasize that it is constexpr by using it as a template argument" is worth this cost and if:

struct F
{
    constexpr size_t operator()(size_t i) const { return i; }
    template <size_t I>
    constexpr size_t operator()(size <I>) const { return (*this)(I); }
};

would not be a much simpler solution.



回答4:

This is not exactly an answer and my question still stands, yet I have found a workaround that gives an impressive boost in compilation. It is a minor tweak of the solution given in the question, where parameter R is moved from operator() outside to structure idx, and the terminating condition now includes both R and N:

template <
    typename F, size_t L,
    size_t R = 1, size_t N = 0, bool = (R < 2 * L && N < L)
>
struct idx //...

The entire code is given in this new live example.

This approach apparently cuts down a huge number of unnecessary specialization combinations for R. Compile times and executable sizes drop dramatically. For instance, I have been able to compile in 10.7/18.7 seconds with Clang/GCC for L = 1<<12 (4096), yielding an executable of 220/239 KB. In this case, nm -C shows 546/250 versions of operator().