Let us have a vector of size N
. For example:
x = rand(N,1)
I want to compute the minimum standard deviation of a subset of length K
in the vector.
When N
and K
are small, it is easy to find the best subset since I can use nchoosek(N,K)
to enumerate all possible subsets. But when the values of N
and K
are larger than let's say N=50
and K=25
, nchoosek
fails to compute the combinations since the size of the possible subsets are huge.
I wonder if there is a better algorithm to compute the subset that gives the smallest standard deviation in an array efficiently. For example via dynamic programming. Any ideas?
Update:
I've implemented it in a loop aftter the answers and compared to the combinatorial solution. The results are always the same but the speed gain are unprecedented.
n = 20;
k = 10;
x = rand(n,1);
C = nchoosek(x, k);
tic
mins = realmax;
for i = 1:size(C,1)
s = std(C(i,:));
if s < mins
mins = s;
bestC = C(i,:);
end
end
toc
tic
[x2, j] = sort(x);
mins2 = realmax;
for i = 1:(n-k+1)
s = std(x2(i:(i+k-1)));
if s < mins2
mins2 = s;
idx = j((i:(i+k-1)));
end
end
toc
if mins == mins2
'Equal'
end
gives
Elapsed time is 7.786579 seconds.
Elapsed time is 0.002068 seconds.
ans =
Equal