As part of a larger problem, I need to solve small linear systems (i.e NxN where N ~10) so using the relevant cuda libraries doesn't make any sense in terms of speed.
Unfortunately something that's also unclear is how to go about solving such systems without pulling in the big guns like GSL, EIGEN etc.
Can anyone point me in the direction of a dense matrix solver (Ax=B) in straight C?
For those interested, the basic structure of the generator for this section of code is:
ndarray=some.generator(N,N)
for v in range N:
B[v]=_F(v)*constant
for x in range N:
A[v,x]=-_F(v)*ndarray[x,v]
Unfortunately I have approximately zero knowledge of higher mathematics, so any advice would be appreciated.
UPDATE: I've been working away at this, and have a nearly-solution that runs but isn't working. Anyone lurking is welcome to check out what I've got so far on pastebin.
I'm using Crout Decomposition with Pivoting which seems to be the most general approach. The idea for this test is that every thread does the same work. Boring I know, but the plan is that the matrixcount variable is increased, actual data is put in, and each thread solves the small matrices individually.
Thanks for everyone who's been checking on this.
POST-ANSWER UPDATE: Finished the matrix solving code for CPU and GPU operation, check out my lazy-writeup here