I have about 300 files, each containing 1000 time series realisations (~76 MB each file).
I want to calculate the quantiles (0.05, 0.50, 0.95) at each time step from the full set of 300000 realisations.
I cannot merge the realisations into one file because it would become too large.
What's the most efficient way of doing this?
Each matrix is generated by running a model, but here is a sample containing random numbers instead:
x <- matrix(rexp(10000000, rate=.1), nrow=1000)
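For reference, when a single matrix fits in memory, the per-time-step quantiles can be sketched as below. This assumes rows are realisations and columns are time steps (consistent with `nrow=1000` above), and it uses a smaller matrix than the sample so that it runs quickly.

```r
# Small stand-in for one file: 1000 realisations (rows) x 50 time steps (columns).
# The 50 is an arbitrary choice for illustration only.
set.seed(1)
x_small <- matrix(rexp(1000 * 50, rate = 0.1), nrow = 1000)

# Quantiles down each column, i.e. across realisations at each time step.
probs <- c(0.05, 0.50, 0.95)
q <- apply(x_small, 2, quantile, probs = probs)

dim(q)  # one row per quantile, one column per time step
```

The difficulty in the question is that this needs all realisations for a given time step at once, which is exactly what does not fit in memory across 300 files.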
There are at least three options:
Edit: Example of (3).
Note that I am not a champion algorithm designer, and someone has almost certainly designed a better algorithm for this. This implementation is also not particularly efficient. If speed matters to you, consider Rcpp, or even just more optimized R. Making a bunch of lists and then extracting values from them is not so smart, but it was easy to prototype this way, so I went with it.
In the end, the value is a little off from the exact version. I suspect an off-by-one shift or some equally silly explanation, but maybe I'm missing something fundamental.
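To illustrate how an approximate result can be built up one file at a time, here is a minimal sketch that uses fixed histogram bins. This is a swapped-in technique for illustration and not necessarily the algorithm described above. The names `approx_quantiles`, `chunks` and `breaks` are hypothetical, the list of matrices stands in for files read one by one, and the bin edges are assumed to cover the data range.

```r
# Hypothetical helper: approximate per-time-step quantiles from fixed bins.
# chunks: list of matrices (rows = realisations, columns = time steps)
# breaks: increasing bin edges assumed to span the data range
approx_quantiles <- function(chunks, probs, breaks) {
  n_steps <- ncol(chunks[[1]])
  n_bins <- length(breaks) - 1
  counts <- matrix(0, nrow = n_bins, ncol = n_steps)
  for (m in chunks) {  # in practice: read one file per iteration
    for (j in seq_len(n_steps)) {
      # all.inside = TRUE puts out-of-range values into the end bins
      bin <- findInterval(m[, j], breaks, all.inside = TRUE)
      counts[, j] <- counts[, j] + tabulate(bin, nbins = n_bins)
    }
  }
  out <- matrix(NA_real_, nrow = length(probs), ncol = n_steps)
  for (j in seq_len(n_steps)) {
    cum <- cumsum(counts[, j]) / sum(counts[, j])
    for (i in seq_along(probs)) {
      k <- which(cum >= probs[i])[1]         # first bin reaching the target
      below <- if (k > 1) cum[k - 1] else 0  # cumulative share before bin k
      frac <- (probs[i] - below) / (cum[k] - below)
      # linear interpolation inside bin k
      out[i, j] <- breaks[k] + frac * (breaks[k + 1] - breaks[k])
    }
  }
  out
}

# Usage on three small simulated "files" of 1000 realisations x 5 time steps.
set.seed(42)
chunks <- lapply(1:3, function(i) matrix(rexp(1000 * 5, rate = 0.1), nrow = 1000))
probs <- c(0.05, 0.50, 0.95)
approx <- approx_quantiles(chunks, probs, breaks = seq(0, 200, by = 0.5))

# Exact answer for comparison, feasible here because the data are small.
exact <- apply(do.call(rbind, chunks), 2, quantile, probs = probs)
max(abs(approx - exact))  # expected to be on the order of the bin width
```

Only the count matrix is kept between files, so memory use depends on the number of bins and time steps rather than on the number of realisations. The trade-off is that accuracy is limited by the bin width, and the bin edges have to be chosen before the data are seen.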