What is a good representation for matrices in Clojure? I'm interested in dealing with dense matrices of floating point numbers. The "list of lists" representation springs to mind, but is there something better?
Some criteria for a good representation include:
- Efficiency: They won't be used for constant processing of huge data sets, but I don't want to spend hours calculating results that could have been done in minutes with a better design.
- Java Interoperability: It would be nice to pass the data back and forth between the two languages easily.
- Easy Parallelization: If I can use all the cores available simply by replacing
map
withpmap
, that would be nice. - Amenable to the use of
reduce
: It seems like lots of the calculations I'm doing work very well withreduce
. - Ability to Represent Image Scan Lines in Matrix Rows: Not really very important, but would be nice to have.
Any thoughts?
I recently wrote some code that needed matrix maths, and initially I used vector-of-vectors, map & reduce, but found the resulting code hard to understand when I returned to it (I'm new to Clojure mind). Incanter made the same code very terse, easy to understand (standard matrix ops) and much much faster.
I'm no expert, but here's my opinion anyway :)
list-of-lists is probably the most natural Clojure idiom for representing matrices. This structure also lends itself nicely to map/reduce kinds of operations. Clojure is also pretty efficient at handling sequences - probably better than most alternatives.
I can't swear to this, but I think I've seen Clojure working 3 or all 4 of my CPUs hard on programs I wrote that were functional in style but didn't make any attempt to be parallel. I suspect the compiler is finding some opportunities for parallel processing on its own.
I think the sequence types created by Clojure will work as Lists in Java, or at least be Iterable. That's probably good enough for what you want, though you may run into problems if you try to treat those structures as modifiable in Java.
Lists are best accessed sequentially. If you're planning to jump around a lot in the matrix, a vector-of-vectors may suit you a little better, performance-wise. I suspect that beats using the nth function.
As a former C programmer, I briefly considered that you could implement your matrix as a one-dimensional structure (i.e. a straight sequence or better a vector), and do your own index computations to find the right element. You could use the partition function to step through it... well, that could be made to work but I suspect there are very good reasons not to.