Consider a function object F
taking a constexpr size_t
argument I
struct F
{
template <size_t I>
constexpr size_t operator()(size <I>) const { return I; }
};
wrapped within a type size <I>
, where (for brevity)
template <size_t N>
using size = std::integral_constant <size_t, N>;
Of course, we could pass I
directly but I want to emphasize that it is constexpr by using it as a template argument. Function F
is dummy here but in reality it could do a variety of useful stuff like retrieving information from the I
th element of a tuple. F
is assumed to have the same return type regardless of I
. I
could be of any integral type but is assumed non-negative.
Problem
Given a constexpr size_t
value I
, we can call F
by
F()(size <I>());
Now, what if we want to call F
with a non-constepr size_t
value i
? Consider the following:
constexpr size_t L = 10;
idx <F, L> f;
for (size_t i = 0; i < L; ++i)
cout << f(i) << " ";
(Why would I need this? To give some context, I am in fact trying to build a composite iterator into a container view that represents a sequence of "joined" (concatenated) heterogeneous containers. This would give the ability to say something like join(a, b) = c;
where arrays join(a, b)
and c
are of equal length. However, i
is iterator state so cannot be constexpr
, yet sub-iterators are stored in a tuple and need to be accessed by a constexpr
index. Individual value_type
's are roughly consistent so that the joined view can take on their common_type
type, but sub-containers and consequently sub-iterators are of different types.)
Solution
Here, I have come up with struct idx <F, L>
, which adapts function F
for this purpose, assuming the input argument is less than L
. This actually compiles fine giving output
0 1 2 3 4 5 6 7 8 9
and here is a live example.
idx
works by recursively decomposing input i
into a binary representation and reconstructing a constexpr counterpart N
:
template <typename F, size_t L, size_t N = 0, bool = (N < L)>
struct idx
{
template <size_t R = 1>
inline constexpr decltype(F()(size <0>()))
operator()(size_t I, size <R> = size <R>()) const
{
return I%2 ? idx <F, L, N+R>()(--I/2, size <2*R>()) :
I ? idx <F, L, N >()( I/2, size <2*R>()) :
F()(size <N>());
}
};
where R
represents a power of 2
at the current iteration. To avoid infinite template instantiation, a specialization is given for N >= L
, returning F()(size <0>())
as a dummy value:
template <typename F, size_t L, size_t N>
struct idx <F, L, N, false>
{
template <size_t R>
inline constexpr decltype(F()(size <0>()))
operator()(size_t I, size <R>) const { return F()(size <0>()); }
};
In fact, this method is a generalization of the more common idiom with a Boolean argument:
bool b = true;
b ? f <true>() : f <false>();
where f
is a function taking a bool
as a template argument. In this case it is evident that all two possible versions of f
are instantiated.
Question
Although this works and its run-time complexity is presumably logarithmic in i
, I am concerned with the compile-time implications, like:
how many combinations of
idx
and itstemplate operator()
are instantiated for this to work at run time for any inputi
that is not known at compile time? (I understand "all possible" again, but how many?)is it really possible to inline
operator()
?is there any alternative strategy or variant that is "easier" to compile?
should I forget about this idea as an instance of pure code bloat?
Notes
Here are the compile times (in seconds) and executable sizes (in KB) I have measured for different values of L
:
L Clang(s) GCC(s) Clang(KB) GCC(KB)
10 1.3 1.7 33 36
20 2.1 3.0 48 65
40 3.7 5.8 87 120
80 7.0 11.2 160 222
160 13.9 23.4 306 430
320 28.1 47.8 578 850
640 56.5 103.0 1126 1753
So, although it appears roughly linear in L
, it is quite long and frustratingly large.
Attempting to force operator()
inline fails: probably ignored by Clang (executable even larger), while GCC reports recursive inlining
.
Running nm -C
on the executable e.g. for L = 160
, shows 511/1253
different versions of operator()
(with Clang/GCC). These are all for N < L
so it appears the terminating specialization N >= L
does get inlined.
PS. I would add tag code-bloat
but the system won't let me.
I'll take the obvious position here and ask if "I want to emphasize that it is
constexpr
by using it as a template argument" is worth this cost and if:would not be a much simpler solution.
I call this technique the magic switch.
The most efficient way I know of to do this is to build your own jump table.
which requires some static setup, but is pretty fast when run.
An assert that
i
is in bounds may also be useful.live example
If your solution have cap on maximum possible value (say 256) you can use macro magic and switch statement to simplify it:
Another possible solution is (with C++11) use variadic templates:
This is not exactly an answer and my question still stands, yet I have found a workaround that gives an impressive boost in compilation. It is a minor tweak of the solution given in the question, where parameter
R
is moved fromoperator()
outside to structureidx
, and the terminating condition now includes bothR
andN
:The entire code is given in this new live example.
This approach apparently cuts down a huge number of unnecessary specialization combinations for
R
. Compile times and executable sizes drop dramatically. For instance, I have been able to compile in 10.7/18.7 seconds with Clang/GCC forL = 1<<12
(4096), yielding an executable of 220/239 KB. In this case,nm -C
shows 546/250 versions ofoperator()
.