Let's say you had to calculate the sine (or cosine, or tangent, whatever), where the domain is between 0.01 and 360.01 degrees, using C#.
Which would be more performant?
- Using Math.Sin
- Using a lookup array with precalculated values
I would anticipate that, given the domain, option 2 would be much faster. At what precision of the domain (0.0000n) does the direct calculation become faster than the lookup? (A sketch of both approaches is shown below.)
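For concreteness, here is a minimal sketch of the two approaches under comparison. The class name `SineLookup`, the 0.01° step, and the 36,001-entry table are assumptions made for illustration, not anything prescribed by the question:

```csharp
using System;

static class SineLookup
{
    // One table entry per 0.01 degrees across 0.01..360.01 (36,001 entries, assumed).
    const double StepDegrees = 0.01;
    const int TableSize = 36_001;
    static readonly double[] Table = BuildTable();

    static double[] BuildTable()
    {
        var t = new double[TableSize];
        for (int i = 0; i < TableSize; i++)
            t[i] = Math.Sin((0.01 + i * StepDegrees) * Math.PI / 180.0);
        return t;
    }

    // Option 1: direct computation (Math.Sin takes radians, so convert).
    public static double Direct(double degrees) =>
        Math.Sin(degrees * Math.PI / 180.0);

    // Option 2: table lookup, rounding the argument to the nearest 0.01 degrees.
    public static double Lookup(double degrees)
    {
        int i = (int)Math.Round((degrees - 0.01) / StepDegrees);
        return Table[i];
    }
}
```

For an input of 30.0 both calls return ≈0.5; the question is which one is cheaper per call.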
The answer to this depends entirely on how many values are in your lookup table. You say "the domain is between 0.01 and 360.01", but you don't say how many values in that range might be used, or how accurate you need the answers to be. (Forgive me for not expecting to see significant digits used to convey implicit meaning in a non-scientific context.)

More information is still needed to answer this question. What is the expected distribution of values between 0.01 and 360.01? Are you processing a lot of data other than the simple sin( ) computation?
A table of 36,000 double-precision values takes over 256 KB of memory (36,000 entries × 8 bytes ≈ 281 KB), which is too large to fit in the L1 cache of most machines. If you're running straight through the table, you'll miss L1 once every sizeof(cacheline)/sizeof(double) accesses and probably hit L2. If, on the other hand, your table accesses are more or less random, you will miss L1 almost every time you do a lookup.
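To make the footprint concrete, a quick back-of-the-envelope check (the 36,000-entry count follows from one entry per 0.01° over the stated domain; the cache sizes in the comments are typical figures, not measured ones):

```csharp
using System;

const int Entries = 36_000;              // one entry per 0.01 degrees over 0.01..360.01
long bytes = (long)Entries * sizeof(double);
Console.WriteLine($"{bytes} bytes ≈ {bytes / 1024.0:F0} KB");  // 288,000 bytes ≈ 281 KB
// A typical L1 data cache is 32-64 KB, so the table cannot fit in L1;
// whether the whole table stays resident in L2 depends on the machine.
```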
It also depends a lot on the math library of the platform that you're on. Common i386 implementations of the sin function, for example, range from ~40 cycles up to 400 cycles or even more, depending on your exact microarchitecture and library vendor. I haven't timed the Microsoft library, so I don't know exactly where the C# Math.Sin implementation would fall.
Since loads from L2 generally take fewer than 40 cycles on a sane platform, one reasonably expects the lookup table to be faster when considered in isolation. However, I doubt you're computing sin( ) in isolation: if your arguments to sin( ) jump all over the table, you will be blowing data needed by other steps of your computation out of the cache. The sin( ) computation itself gets faster, but the slowdown to the other parts of your computation may more than outweigh the speedup. Only careful measurement can really answer this question.
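If you do measure it, a rough Stopwatch-based sketch along these lines is a starting point (the iteration count, the fixed RNG seed, and the random access pattern are arbitrary choices for illustration; a proper benchmark would use something like BenchmarkDotNet):

```csharp
using System;
using System.Diagnostics;

class Bench
{
    static void Main()
    {
        const int N = 10_000_000;
        var rng = new Random(42);
        var args = new double[N];
        for (int i = 0; i < N; i++)
            args[i] = 0.01 + rng.NextDouble() * 360.0;  // random angles in the domain

        // Precompute the table: one entry per 0.01 degrees.
        var table = new double[36_001];
        for (int i = 0; i < table.Length; i++)
            table[i] = Math.Sin((0.01 + i * 0.01) * Math.PI / 180.0);

        double sink = 0;

        var sw = Stopwatch.StartNew();
        for (int i = 0; i < N; i++)
            sink += Math.Sin(args[i] * Math.PI / 180.0);
        sw.Stop();
        Console.WriteLine($"Math.Sin: {sw.ElapsedMilliseconds} ms");

        sw.Restart();
        for (int i = 0; i < N; i++)
            sink += table[(int)Math.Round((args[i] - 0.01) / 0.01)];
        sw.Stop();
        Console.WriteLine($"Lookup:   {sw.ElapsedMilliseconds} ms");

        Console.WriteLine(sink);  // keep results live so the loops aren't optimized away
    }
}
```

Note that the random access pattern here deliberately exercises the cache-miss case described above; a sequential sweep through the table would paint a rosier picture for the lookup.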
Am I to understand from your other comments that you're doing this as part of an FFT computation? Is there a reason you need to roll your own FFT instead of using one of the numerous extremely high-quality implementations that already exist?