I have a large matrix:
set.seed(1)
a <- matrix(runif(9e+07),ncol=300)
I want to sort each row in the matrix:
> system.time(sorted <- t(apply(a,1,sort)))
user system elapsed
42.48 3.40 45.88
I have a lot of RAM to work with, but I would like a faster way to perform this operation.
Another excellent method, from Martin Morgan and using no external packages, appears in "Fastest way to select i-th highest value from row and assign to new column":
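The code itself did not survive here, so the following is only a minimal sketch of the order-based idea as I understand it, not necessarily the linked answer verbatim. It orders all values by row index first and by value second, then refills a matrix row by row. The small matrix `m` is an assumed stand-in for the question's `a`.

```r
set.seed(1)
m <- matrix(runif(5000), ncol = 50)   # small stand-in for `a` (assumption)

# Order by row index, then by value within each row. The reordered values
# come out row after row, so the matrix is refilled with byrow = TRUE.
sorted_order <- matrix(m[order(row(m), m)], ncol = ncol(m), byrow = TRUE)

# Reference result using the approach from the question
sorted_apply <- t(apply(m, 1, sort))
```

Only base R is involved (`order`, `row`, `matrix`), and the whole matrix is sorted in a single vectorised call instead of one `sort` call per row.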
There is also an equivalent for sorting by columns in the comments under the same link.
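A column-wise version can presumably be written the same way. This sketch is my own assumption about what that equivalent looks like, not a copy of the linked comment: order by column index and then by value, and refill in R's default column-major order. `m` is again a small hypothetical stand-in.

```r
set.seed(1)
m <- matrix(runif(5000), ncol = 50)   # small stand-in matrix (assumption)

# Order by column index, then by value within each column. Values come out
# column after column, which matches matrix()'s default fill order.
sorted_cols <- matrix(m[order(col(m), m)], ncol = ncol(m))

# Reference result: sort each column with apply()
sorted_cols_apply <- apply(m, 2, sort)
```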
Timing code using same data as Craig:
Timings:
And to present a more complete picture, another test for character class (excluding Rfast::rowSort as it cannot handle character class):
Timings:
Head to head:
Timings:
Well, I'm not aware of many ways to sort faster in R. The problem is that you're only sorting 300 values at a time, but doing so many times. Still, you can eke some extra performance out of sort by calling sort.int directly with method='quick'.
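A minimal sketch of that idea, assuming a small stand-in matrix `m` rather than the full 9e+07-element `a`: the extra `method` argument is passed through `apply` to `sort.int`, which skips the generic dispatch of `sort`.

```r
set.seed(1)
m <- matrix(runif(5000), ncol = 50)   # small stand-in for `a` (assumption)

# Call sort.int directly for each row and request quicksort.
# apply() returns the sorted rows as columns, hence the transpose.
sorted_quick <- t(apply(m, 1, sort.int, method = "quick"))

# Reference result using the approach from the question
sorted_apply <- t(apply(m, 1, sort))
```

Wrapping each line in `system.time()` on the real matrix is the way to see whether the gain is worth it on a given machine.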
But a better way should be to use the parallel package to sort parts of the matrix in parallel. However, the overhead of transferring data seems to be too big, and on my machine it starts swapping since I "only" have 8 GB memory:
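The parallel attempt might look roughly like the sketch below. It is not the original code: the worker count, the block splitting via `splitIndices`, and the small stand-in matrix `m` are all assumptions made for illustration. Each block of rows is copied to a worker process and the result is copied back, which is exactly where the transfer overhead and extra memory use come from.

```r
library(parallel)

set.seed(1)
m <- matrix(runif(5000), ncol = 50)   # small stand-in for `a` (assumption)

n_workers <- 2                        # assumed worker count
cl <- makeCluster(n_workers)

# Split the row indices into contiguous blocks and cut the matrix accordingly,
# so each worker only receives its own block of rows.
blocks <- lapply(splitIndices(nrow(m), n_workers),
                 function(idx) m[idx, , drop = FALSE])

# Sort the rows of each block on a worker.
parts <- parLapply(cl, blocks, function(block) {
  t(apply(block, 1, sort.int, method = "quick"))
})
stopCluster(cl)

# Stack the sorted blocks back together in their original row order.
sorted_par <- do.call(rbind, parts)
```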
The package grr contains an alternate sort method that can be used to speed up this particular operation (I have reduced the matrix size somewhat so that this benchmark doesn't take forever):
The difference becomes dramatic when the matrix contains characters:
Results are identical for all three.