Edit:
Both of these seem relevant: How to efficiently use Rprof in R? and kernel matrix computation outside SVM training in kernlab.
The first of the above is a very similar question to this one, though not the same: that question refers to base::Rprof, while this question refers to profr::profr.
Original Question
For example, my code is slower than I'd like:
install.packages("profr")
devtools::install_github("alexwhitworth/imputation")
x <- matrix(rnorm(1000), 100)
x[x>1] <- NA
library(imputation)
library(profr)
a <- profr(kNN_impute(x, k = 5, q = 2), interval = 0.005)
plot(a)
I get slightly different plots every time I run this code, due to the stochastic nature of the profiling, but they are generally similar. However, I don't know how to interpret the plots.
I've also tried using library(lineprof), following Adv-R, and have similarly been unable to interpret the plots.
Any help is appreciated.
Also, it doesn't seem (to me at least) like the plots are at all helpful here. But the data structure itself does seem to suggest a solution:
R> head(a, 10)
   level g_id t_id                f start   end n  leaf  time     source
9      1    1    1       kNN_impute 0.005 0.190 1 FALSE 0.185 imputation
10     2    1    1        var_tests 0.005 0.010 1 FALSE 0.005       <NA>
11     2    2    1            apply 0.010 0.190 1 FALSE 0.180       base
12     3    1    1         var.test 0.005 0.010 1 FALSE 0.005      stats
13     3    2    1              FUN 0.010 0.110 1 FALSE 0.100       <NA>
14     3    2    2              FUN 0.115 0.190 1 FALSE 0.075       <NA>
15     4    1    1 var.test.default 0.005 0.010 1 FALSE 0.005       <NA>
16     4    2    1           sapply 0.010 0.040 1 FALSE 0.030       base
17     4    3    1    dist_q.matrix 0.040 0.045 1 FALSE 0.005 imputation
18     4    4    1           sapply 0.045 0.075 1 FALSE 0.030       base
As mentioned above, the data structure itself appears to suggest an answer, which is to summarize the data by function via tapply. This can be done quite simply for a single run of profr::profr:
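For example (a minimal sketch; the source::function key and the decreasing sort are my own choices, and the column names are those shown in the output above):

# total sampled time per function, keyed as source::function
t <- tapply(a$time, paste(a$source, a$f, sep = "::"), sum)
t[order(t, decreasing = TRUE)]                     # absolute time per function
round(t[order(t, decreasing = TRUE)] / sum(t), 4)  # share of the summed time
                                                   # (nested calls are counted at every stack level)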
From this, I can see that the biggest time users are kernlab::kernelMatrix and the overhead from R for S4 classes and generics.

Preferred:
I note that, given the stochastic nature of the sampling process, I prefer to use averages to get a more robust picture of the time profile:
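For example (a sketch; the number of replications and the seed are arbitrary choices on my part):

set.seed(42)
# run the same profiling call several times and keep the results in a list
reps <- lapply(1:10, function(i) profr(kNN_impute(x, k = 5, q = 2), interval = 0.005))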
Removing the unusual replications and converting to data.frames:
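One possible sketch, building on the list of runs above: treat a run as unusual if its total elapsed span is far from the median run (the 25% cutoff is arbitrary), then coerce the kept runs to plain data.frames.

# total elapsed span of each run, from the start/end columns shown above
span <- sapply(reps, function(r) max(r$end) - min(r$start))
# TRUE for runs within 25% of the median span; the rest are dropped as unusual
keep <- abs(span - median(span)) / median(span) < 0.25
reps_df <- lapply(reps[keep], as.data.frame)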
Merging the replications (almost certainly could be faster) and examining the results:
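A sketch of one way to do it, continuing from the kept runs above (the averaging scheme, mean time per function across runs, is my own choice):

# stack the kept runs, tagging each row with its run id
merged <- do.call(rbind, lapply(seq_along(reps_df),
                                function(i) cbind(reps_df[[i]], rep_id = i)))
# per-run total time by function, then the average across runs
by_fun <- tapply(merged$time,
                 list(paste(merged$source, merged$f, sep = "::"), merged$rep_id),
                 sum)
avg <- rowMeans(by_fun, na.rm = TRUE)
round(sort(avg, decreasing = TRUE), 4)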
Results
From the results, a similar but more robust picture emerges as with the single run. Namely, there is a lot of overhead from R, and library(kernlab) is slowing me down. Of note, since kernlab is implemented in S4, the R overhead is related, because S4 classes are substantially slower than S3 classes.

I'd also note that my personal opinion is that a cleaned-up version of this might be a useful pull request as a summary method for profr. Although I'd be interested to see others' suggestions!