Is there any way to optimize sincos calls in CUDA?

Posted 2019-02-28 09:54

Question:

I'm writing a program in CUDA that makes a huge number of calls to the sincos() function in double precision. I'm afraid this is one of the biggest bottlenecks in the code, and I cannot reduce the number of calls to the function.

Is there any decent approximation to sincos in CUDA or in a library I can import? I am also quite concerned with the accuracy, so the better the approximation is, the happier my code will be.

I've also thought about building a lookup table or approximating the values with their Taylor series, but I want some opinions before going down that road.

Answer 1:

A pretty fast and accurate sincos function is available in the CUDA Math API. Just include math.h. Or use sincosf, the single-precision version, if that will work for you. (I'm aware that you said double precision in your question. Just pointing some things out.)
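For reference, here is a minimal sketch of calling the double-precision sincos() from a kernel. The kernel name sincos_kernel and the host helper sincos_on_device are hypothetical names made up for this illustration; only sincos() and the CUDA runtime calls come from CUDA itself. It is a starting point under those assumptions, not a tuned implementation.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Each thread computes sin and cos of one input with a single sincos() call.
__global__ void sincos_kernel(const double* x, double* s, double* c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        sincos(x[i], &s[i], &c[i]);  // double-precision device function
    }
}

// Hypothetical host helper (illustration only): copies the inputs to the
// device, launches the kernel, and copies both result arrays back.
// Returns false if any CUDA runtime call reports an error.
bool sincos_on_device(const std::vector<double>& x,
                      std::vector<double>& s,
                      std::vector<double>& c)
{
    const int n = static_cast<int>(x.size());
    s.resize(n);
    c.resize(n);
    if (n == 0) return true;

    const size_t bytes = n * sizeof(double);
    double *d_x = nullptr, *d_s = nullptr, *d_c = nullptr;

    bool ok = cudaMalloc(&d_x, bytes) == cudaSuccess
           && cudaMalloc(&d_s, bytes) == cudaSuccess
           && cudaMalloc(&d_c, bytes) == cudaSuccess
           && cudaMemcpy(d_x, x.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess;

    if (ok) {
        const int threads = 256;
        const int blocks = (n + threads - 1) / threads;
        sincos_kernel<<<blocks, threads>>>(d_x, d_s, d_c, n);
        ok = cudaGetLastError() == cudaSuccess
          && cudaMemcpy(s.data(), d_s, bytes, cudaMemcpyDeviceToHost) == cudaSuccess
          && cudaMemcpy(c.data(), d_c, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }

    // cudaFree accepts null pointers, so this is safe after a partial failure.
    cudaFree(d_x);
    cudaFree(d_s);
    cudaFree(d_c);
    return ok;
}
```

Calling sincos_on_device(x, s, c) from host code fills s and c with the sines and cosines of x. Computing both values in one sincos() call lets the argument reduction be shared, which is generally cheaper than separate sin() and cos() calls.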

If you can use sincospif instead of sincosf, @njuffa has worked his magic here, which may interest you.
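Since the question asks for double precision, note that the CUDA Math API also has a double-precision sincospi(), which computes sin(pi*a) and cos(pi*a). It fits when the argument is naturally a multiple of pi. The sketch below assumes such a case, points on the unit circle at angles 2*pi*k/n; the names unit_circle_kernel and unit_circle_on_device are hypothetical and exist only for this example.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Fills re[k] = cos(2*pi*k/n) and im[k] = sin(2*pi*k/n) for k in [0, n).
__global__ void unit_circle_kernel(double* re, double* im, int n)
{
    int k = blockIdx.x * blockDim.x + threadIdx.x;
    if (k < n) {
        // sincospi(a, &s, &c) returns sin(pi*a) and cos(pi*a), so the
        // multiplication by pi never happens in floating point.
        sincospi(2.0 * k / n, &im[k], &re[k]);
    }
}

// Hypothetical host helper (illustration only): runs the kernel for n points
// and copies the results back. Returns false if any CUDA call fails.
bool unit_circle_on_device(int n, std::vector<double>& re, std::vector<double>& im)
{
    if (n <= 0) return false;
    re.resize(n);
    im.resize(n);

    const size_t bytes = n * sizeof(double);
    double *d_re = nullptr, *d_im = nullptr;

    bool ok = cudaMalloc(&d_re, bytes) == cudaSuccess
           && cudaMalloc(&d_im, bytes) == cudaSuccess;

    if (ok) {
        const int threads = 256;
        const int blocks = (n + threads - 1) / threads;
        unit_circle_kernel<<<blocks, threads>>>(d_re, d_im, n);
        ok = cudaGetLastError() == cudaSuccess
          && cudaMemcpy(re.data(), d_re, bytes, cudaMemcpyDeviceToHost) == cudaSuccess
          && cudaMemcpy(im.data(), d_im, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }

    cudaFree(d_re);
    cudaFree(d_im);
    return ok;
}
```

Whether this is faster than sincos() for a given workload is something to measure; the main attraction is that the argument reduction for multiples of pi is simpler and avoids rounding pi*a first.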

This question and this question may also interest you.