I wrote a Julia code which computes integrals over Gaussian functions and I have a sort-of kernel function which is called over and over again.
According to the Julia built-in Profile
Module, this is where I spend most of the time during the actual computation and therefore I would like to see if there is any way in which I can improve it.
It is a recursive function and I implemented it in a kind of straightforward way. As I am not that much used to recursive functions, maybe somebody out there has some ideas/suggestions on how to improve it (both from a purely theoretical algorithmic point of view and/or exploiting special optimizations from the JIT compiler).
Here you have it:
"""Returns the integral of an Hermite Gaussian divided by the Coulomb operator."""
function Rtuv{T<:Real}(t::Int, u::Int, v::Int, n::Int, p::Real, RPC::Vector{T})
if t == u == v == 0
return (-2.0*p)^n * boys(n,p*norm(RPC)^2)
elseif u == v == 0
if t > 1
return (t-1)*Rtuv(t-2, u, v, n+1, p, RPC) +
RPC[1]*Rtuv(t-1, u, v, n+1, p, RPC)
else
return RPC[1]*Rtuv(t-1, u, v, n+1, p, RPC)
end
elseif v == 0
if u > 1
return (u-1)*Rtuv(t, u-2, v, n+1, p, RPC) +
RPC[2]*Rtuv(t, u-1, v, n+1, p, RPC)
else
return RPC[2]*Rtuv(t, u-1, v, n+1, p ,RPC)
end
else
if v > 1
return (v-1)*Rtuv(t, u, v-2, n+1, p, RPC)
RPC[3]*Rtuv(t, u, v-1, n+1, p, RPC)
else
return RPC[3]*Rtuv(t, u, v-1, n+1, p, RPC)
end
end
end
Don't pay that much attention to the function boys
, since according to the profiler it is not that heavy.
Just to give an idea of the range of numbers: usually the first call comes from t+u+v
ranging from 0
to 3
, while n
always starts at 0
.
Cheers
EDIT -- New information
The generated version is slower for small values of
I was benchmarking badly for this case, without interpolating the argument passed. By doing it properly I am always faster with the approach explained in the accepted answer, so hurray!t,u,v
, I believe the reason is because expressions are not optimzied by the compiler.
More generally, does the compiler identify trivial cases such as multiplication by zeros and ones and optimize those away?
Answer to myself: from a quick checking of simple code with @code_llvm
it seems not to be the case.
Maybe this works in your case: you can "memoize" whole compiled methods using generated functions and get rid of all recursion after the first call.
Since
t
,u
, andv
will stay small, you could generate the fully expanded code for the recursions. Assume for the simplicity a bogus implementation ofThen
will generate you fully expanded expressions like this:
We can stuff that into a generated function
Rtuv
takingVal
types. For each different combination ofT
,U
, andV
, this function will useRtuv_expr
to compile the respective expression and from then on use this method -- no recursion anymore:You have to call it with
t
,u
,v
wrapped inVal
, though:If you test a small loop like this,
it will need some time for the first run, but then go pretty fast, since the used methods are already compiled.