The C++11 features, constexpr and template argument packs, should in my opinion be strong enough to perform some rather complex computations. One possible example, for which I have a practical application, is the computation of the nth prime at compile time.
I'm asking for ways to implement this computation. If more than one solution is proposed, it might be interesting to compare them.
To give you an idea of my performance expectations: I'd hope for some code which can find the 512th prime (which is 3671) in less than one second of compile time on reasonable desktop hardware.
I implemented this in the simplest possible way, not using templates at all, and it works. It needs the constexpr recursion depth increased a bit, but it fits within the time limit easily:
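The original code block isn't reproduced here; the following is a minimal sketch along the lines described - plain recursive constexpr functions, no templates. It is my reconstruction, not the answer's exact code:

```cpp
// Trial division with plain recursive constexpr functions. C++11 constexpr
// allows only a single return statement, hence the chained conditionals.
constexpr bool isPrime(unsigned n, unsigned d = 2) {
    return d * d > n ? true
         : n % d == 0 ? false
         : isPrime(n, d + 1);
}

// Smallest prime >= n.
constexpr unsigned nextPrime(unsigned n) {
    return isPrime(n) ? n : nextPrime(n + 1);
}

// kth prime, counting from nthPrime(1) == 2. Finding the 512th prime this
// way nests roughly 512 calls deep, which is why the constexpr depth limit
// needs raising (e.g. -fconstexpr-depth=1024) for a compile-time evaluation.
constexpr unsigned nthPrime(unsigned k, unsigned candidate = 2) {
    return k == 1 ? nextPrime(candidate)
                  : nthPrime(k - 1, nextPrime(candidate) + 1);
}

static_assert(nthPrime(5) == 11, "the 5th prime is 11");
```

The same functions also work at run time, which makes them easy to sanity-check without touching compiler limits.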
The following trick reduces the depth of the getPrimeLoop recursion to logarithmic, so g++ can complete with the default depth (without measurable time penalty).

I've given this a try myself, and written the following implementation:
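A hypothetical reconstruction of such a pack-based trial division (names and details assumed; not the original code):

```cpp
#include <type_traits>

// Hypothetical reconstruction: the pack Primes... accumulates every prime
// found so far, and each new candidate is trial-divided against it.
template <unsigned N, unsigned... Primes>
struct DividesAny;

template <unsigned N>
struct DividesAny<N> : std::false_type {};

template <unsigned N, unsigned Head, unsigned... Tail>
struct DividesAny<N, Head, Tail...>
    : std::integral_constant<bool,
          N % Head == 0 || DividesAny<N, Tail...>::value> {};

// K primes still to find; Last is the most recently found prime.
template <unsigned K, unsigned Candidate, unsigned Last, unsigned... Primes>
struct NthPrimeImpl
    : std::conditional<
          DividesAny<Candidate, Primes...>::value,
          NthPrimeImpl<K, Candidate + 1, Last, Primes...>,      // composite
          NthPrimeImpl<K - 1, Candidate + 1, Candidate,         // new prime
                       Primes..., Candidate>>::type {};

template <unsigned Candidate, unsigned Last, unsigned... Primes>
struct NthPrimeImpl<0, Candidate, Last, Primes...>
    : std::integral_constant<unsigned, Last> {};

template <unsigned K>
struct NthPrime : NthPrimeImpl<K, 2, 0> {};

static_assert(NthPrime<1>::value == 2, "");
static_assert(NthPrime<10>::value == 29, "");
```

Note that this naive sketch recurses once per candidate number, so its required template depth grows with the prime itself; it is meant only to show the shape of the approach, not to match the original's exact resource profile.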
This requires increasing the maximum template depth of gcc to more than the default of 900 (in my gcc 4.7.2), e.g. by passing -ftemplate-depth=1200. And it is way too slow: it takes about 3 minutes on my hardware. So I very much hope for some more efficient code in a different answer.

In terms of computation method, the above does something like trial division. A sieve of Eratosthenes might perform better, but so far I couldn't think of a way to write that in a constexpr-compatible fashion.
The answer by Mike Kinghan got me thinking along lines I hadn't thought of before. If template instantiation is the problem, causing such severe memory consumption, how can we reduce it? I eventually came up with a scheme where, instead of an argument pack holding all primes found so far, I use a chain of types, each one referencing the one before, and a chain of static functions in these types which can use the ones from the types before.
The result I'll paste below is still a lot slower than the one suggested by zch, but I find it interesting enough to share, since it might be a useful approach for other applications.
The beast above requires modifications both to the constexpr depth and the template depth. The following values are sharp bounds for my compiler.
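The code itself isn't reproduced above; here is my own minimal sketch of the chain-of-types idea (all names assumed), to show the shape of the approach rather than the original implementation:

```cpp
// Sketch of the chain idea: each stage is a distinct type holding one prime
// and a link to the previous stage, so no stage carries a growing template
// argument pack.
struct Nil {
    static constexpr bool divides(unsigned) { return false; }
};

template <unsigned P, typename Prev>
struct Stage {
    // True if P or any earlier prime in the chain divides n.
    static constexpr bool divides(unsigned n) {
        return n % P == 0 || Prev::divides(n);
    }
};

// Smallest number >= c not divisible by any prime in Chain.
template <typename Chain>
constexpr unsigned nextPrime(unsigned c) {
    return Chain::divides(c) ? nextPrime<Chain>(c + 1) : c;
}

// Extend the chain K more times; Last is the latest prime found.
template <unsigned K, typename Chain, unsigned Last>
struct Grow {
    static constexpr unsigned next = nextPrime<Chain>(Last + 1);
    static constexpr unsigned value =
        Grow<K - 1, Stage<next, Chain>, next>::value;
};

template <typename Chain, unsigned Last>
struct Grow<0, Chain, Last> {
    static constexpr unsigned value = Last;
};

template <unsigned K>
struct NthPrime {
    static constexpr unsigned value = Grow<K, Nil, 1>::value;
};

static_assert(NthPrime<4>::value == 7, "the 4th prime is 7");
```

The division work happens in constexpr function calls rather than in template instantiations, which is what keeps the per-stage instantiation cost flat.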
I doubt that your 1 second goal is within reach of any hardware that doesn't have security guards. However, I believe the following meta-program can close in on it a lot:
I'll call this meta-program MyNthPrime and call your one YourNthPrime.
You seem to have much heftier hardware than mine, and certainly more memory. I have a routine Lenovo ThinkPad T420: 4-core i5, 8GB RAM, 8GB swap, running Linux Mint 14, kernel 3.5.0. You report that it takes you 3 mins. to build your YourNthPrime. Measured by the time command, it takes me 35 mins. to build YourNthPrime, with no apps running but the terminal and system monitor.

The compiler is GCC 4.7.2 and the command-line options are:
That elapsed time breaks down as:
It takes me 1.5 mins to build MyNthPrime, with:
and the elapsed time breaks down as:
That -ftemplate-depth=2100 is not a transposition typo. More on this shortly.

MyNthPrime isn't fairly and squarely 23 times faster than YourNthPrime. The broken-down timings suggest that MyNthPrime is actually around 2.75 times as fast as YourNthPrime on user time. But they also show that YourNthPrime's build really lost the farm on real time. What was it doing for the fruitless 9/10ths of its existence? Swapping, of course.
Both builds scoffed 95% of my 8GB system RAM within 45 secs., but MyNthPrime topped out around there and didn't swap. YourNthPrime went on eating swap space up to a peak 3.9GB, and long before that all my CPUs were dozing.
This point is notable when you take it with the fact that MyNthPrime needs getting on for twice as much -ftemplate-depth as YourNthPrime. Folk wisdom is that an extravagant -ftemplate-depth is the road to ruin for a meta-program's build time, because it equates to extravagant memory consumption, which only has to slide into heavy swapping and you're watching paint dry. But the run-off of YourNthPrime and MyNthPrime does not bear this out - quite the opposite. The lesson I take is that the depth to which you must instantiate recursive templates is not always a good measure of the quantity of template instantiating you must do, and it is the quantity that matters for your memory resources.

Although they don't look superficially similar, MyNthPrime and YourNthPrime both implement the trial-division algorithm for prime generation. MyNthPrime is that much faster solely because it is rather more shrewdly coded to be sparing of recursive template instantiations, and of the memory they gobble up.
YourNthPrime fields 4 recursive templates for the computation, all consuming the same recursive variadic template argument list. MyNthPrime fields 2: it's just giving the compiler about half as many enormous instantiations to do.
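MyNthPrime's pack-consuming workhorse is described in detail later in this answer; as a concrete illustration, here is a hypothetical sketch (my own guess at the shape, not the actual MyNthPrime code) of trial division that consumes the primes-in-hand pack two at a time:

```cpp
#include <type_traits>

// Hypothetical sketch: is_co_prime<N, Ps...> is true when no prime in Ps...
// (listed largest-first) divides N. Each recursion consumes TWO primes,
// trying the smaller (right-hand) one of the pair first, so the worst-case
// instantiation depth is about half that of a one-at-a-time traversal.
template <unsigned N, unsigned... Ps>
struct is_co_prime;

template <unsigned N>
struct is_co_prime<N> : std::true_type {};          // no divisors left

template <unsigned N, unsigned P>
struct is_co_prime<N, P>
    : std::integral_constant<bool, N % P != 0> {};  // odd-length tail

template <unsigned N, unsigned First, unsigned Second, unsigned... Rest>
struct is_co_prime<N, First, Second, Rest...>
    : std::integral_constant<bool,
          N % Second != 0                    // smaller of the pair first
          && is_co_prime<N, Rest...>::value  // then the divisors to the right
          && N % First != 0> {};             // larger of the pair last

// Pack is largest-first, as the primes-in-hand list is described.
static_assert(!is_co_prime<9, 7, 5, 3, 2>::value, "9 = 3*3");
static_assert( is_co_prime<11, 7, 5, 3, 2>::value, "11 is prime");
```

One caveat on this sketch: the && short-circuits evaluation but not instantiation of the tail, so a real meta-program chasing instantiation counts would presumably branch with something like std::conditional instead, letting a hit on the smaller divisor stop the recursion outright.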
YourNthPrime (as I read it) perceives the potential efficiency of doing its trial divisions in ascending order of the primes in hand - because the chances of a successful division increase toward the smaller primes; and once a prime in hand exceeds half of the candidate number being divided, the chances are 0. Hit the most likely divisors first and you optimize your prospect of a quick verdict and exit. But an obstacle to exploiting this efficiency is the fact that the primes in hand are represented by a variadic template argument list with the largest always at its head. To surmount this obstacle YourNthPrime deploys the recursive variadic template function lastArg<>(), effectively to reverse the order in which the primes in hand are consumed in the divisions.

lastArg<>() presents the primes in hand in ascending order for recursive trial division of the next candidate i by the primes head, tail... It's here, I think, that you look to lastArg<>() to pay its way, by ensuring head is always the next-best prospect for getting you out of the costly right-hand side of that &&.

But to achieve this, lastArg<>() itself recursively traverses the whole list of primes in hand at each invocation, before you even get the opportunity for a verdict on i. So it would be cheaper just to let isPrime<>() traverse the primes in hand as far as it needs to, testing i as it goes, dispense with lastArg<>(), and save all the recursive instantiations thereof.

The job done by
isPrime<>() in YourNthPrime - the recursive trial division - is done in MyNthPrime by is_co_prime<>(). If is_co_prime<>() takes 10 lines to do what isPrime<>() does in one, I could have done it in one line also - a single expression like isPrime<>()'s could be the body of the function. But the ugly trounces the pretty for efficiency here. Every time isPrime<>() has to recurse into the tail, that tail is only one prime shorter than it was before. Every time is_co_prime<>() has to do the same thing, the tail is two primes shorter than it was before. Its tail recursions are shallower in the worst case than those of isPrime<>(), and there can be only half as many.

is_co_prime<>() divides the candidate number N by the right-hand - the smallest and likeliest - of any pair of available divisors first, and returns not-prime on success; otherwise it recurses to any divisors still to the right of that pair and continues trial division of the next-but-one until success or exhaustion. Only at exhaustion does it resort to trial division by the larger of the original pair - the least likely divisor of any it has tried. And likewise within each of the intervening smaller recursions, the least likely divisor gets tried last.

Though this algorithm can be seen to get speedily to the smaller and likelier divisors of N, an itch will persist to get straight to the smallest of them first and try them in true descending order of likelihood, as per lastArg<>(). We have to quell this itch with the recognition that any way of "getting straight to the smallest", when it is at the tail of the list, will be a meta-function that must itself recurse over the whole list before we even get started with the trial divisions. The implementation of is_co_prime<>() gets us down the list "two steps at a time", doing the trial division as it goes.

True, sometimes it will unluckily "step over" the largest divisor of N first time, and then won't find it again until and unless it bottoms out and recurses backwards up the list. But once N is big enough to make it matter, there will mostly be at least one smaller divisor further on to the right, and it would be really unlucky to miss all of them. So the risk of skipping the largest on the way down is no big deal. Remember too that there aren't going to be any divisors on the way down till we reach the N/2 point. That means the worst-case waste of recursions by skipping the one-and-only divisor of some N on the way down is confined to the tail of the list from that point.

You speculate that a Sieve of Eratosthenes meta-program might compile faster, and you are right. As a prime generator, the Sieve has better theoretical complexity than trial division. An elegant template meta-program implementation by Peter Simons dates from 2006 or before. (And as Peter Wood commented, a non-terminating Sieve of Eratosthenes meta-program broke the news that the C++ template system is Turing-complete.) With C++11 facilities Simons' meta-program can be much abbreviated, but I don't think made much more efficient. Just as it stands, Simons' Sieve can generate at compile time all the primes up to the 512th in under 9 secs. on my ThinkPad. It needs -ftemplate-depth=4000 or thereabouts to do it, but only about 0.5GB of memory - an even more arresting counterexample to the folk wisdom that big template depth == big memory usage.

So Eratosthenes' Sieve leaves trial division floundering in the dust. Sadly, for the purpose of your exercise, the Sieve is no use. It's called a sieve because we have to input an integer upper limit U, and it sieves out the composite numbers between 2 and U, leaving the primes. Thus to apply it to find exactly the Nth prime Pn, and not possibly any other, you must anyhow compute Pn in some other way, give the Sieve an upper limit of Pn + 1 (or Pn + 2, for Pn > 2), then throw away all the Pi, 2 <= Pi < Pn, that it returns to you, and keep just the Pn that you had already computed. This is a no-op.

Several of the commenters make the point that the identity of any Nth prime you could possibly generate by compile-time meta-programming will be known beforehand, or calculable beforehand by much simpler and vastly faster means. I can't disagree, but I support your general point that with C++11 facilities, TMP takes a huge stride toward real-world utility that is well worth exploring - the more so as whatever takes a minute to compile right now will take a second within a decade.
Meanwhile, without even leaving our uncannily sophisticated C++ compilers, with TMP problems like this we can still experience the nature of programming an early computer, with clock speed and memory in the K's, in a "language" modelled tightly - but within arcane constraints! - on classical recursive function theory. That's why you've really gotta love them!