matlab: dividing vector into overlapping chunks of

2019-02-21 20:12发布

问题:

I've a vector that I would like to split into overlapping subvectors of size cs in shifts of sh. Imagine the input vector is:

v=[1 2 3 4 5 6 7 8 9 10 11 12 13]; % A=[1:13]

given a chunksize of 4 (cs=4) and shift of 2 (sh=2), the result should look like:

[1 2 3 4]
[3 4 5 6]
[5 6 7 8]
[7 8 9 10]
[9 10 11 12]

note that the input vector is not necessarily divisible by the chunksize and therefore some subvectors are discarded. Is there any fast way to compute that, without the need of using e.g. a for loop? In a related post I found how to do that but when considering non-overlapping subvectors.

回答1:

You can use the function bsxfun in the following manner:

v=[1 2 3 4 5 6 7 8 9 10 11 12 13]; % A=[1:13]
cs=4;
sh=2;

A = v(bsxfun(@plus,(1:cs),(0:sh:length(v)-cs)'));

Here is how it works. bsxfun applies some basic functions on 2 arrays and performs some repmat-like if the sizes of inputs do not fit. In this case, I generate the indexes of the first chunk, and add the offset of each chunck. As one input is a row-vector and the other is a column-vector, the result is a matrix. Finally, when indexing a vector with a matrix, the result is a matrix, that is precisely what you expect.

And it is a one-liner, (almost) always fun :).



回答2:

I suppose the simplest way is actually with a loop. A vectorizes solution can be faster, but if the result is properly preallocated the loop should perform decently as well.

v = 1:13
cs = 4;
sh = 2;

myMat = NaN(floor((numel(v) - cs) / sh) + 1,cs);
count = 0;

for t = cs:sh:numel(v)
   count = count+1;
   myMat(count,:) = v(t-cs+1:t);
end


回答3:

Do you have the signal processing toolbox? Then the command is buffer. First look at the bare output:

buffer(v, 4, 2)

ans =
     0     1     3     5     7     9    11
     0     2     4     6     8    10    12
     1     3     5     7     9    11    13
     2     4     6     8    10    12     0

That's clearly the right idea, with only a little tuning necessary to give you exactly the output you want:

[y z] = buffer(v, 4, 2, 'nodelay');
y.'

ans =
     1     2     3     4
     3     4     5     6
     5     6     7     8
     7     8     9    10
     9    10    11    12

That said, consider leaving the vectors columnwise, as that better matches most use cases. For example, the mean of each window is just mean of the matrix, as columnwise is the default.



回答4:

You can accomplish this with ndgrid:

>> v=1:13; cs=4; sh=2;
>> [Y,X]=ndgrid(1:(cs-sh):(numel(v)-cs+1),0:cs-1)
>> chunks = X+Y
chunks =
     1     2     3     4
     3     4     5     6
     5     6     7     8
     7     8     9    10
     9    10    11    12

The nice thing about the second syntax of the colon operator (j:i:k) is that you don't have to calculate k exactly (e.g. 1:2:6 gives [1 3 5]) if you plan to discard the extra entries, as in this problem. It automatically goes to j+m*i, where m = fix((k-j)/i);

Different test:

>> v=1:14; cs=5; sh=2; % or v=1:15 or v=1:16
>> [Y,X]=ndgrid(1:(cs-sh):(numel(v)-cs+1),0:cs-1); chunks = X+Y
chunks =
     1     2     3     4     5
     4     5     6     7     8
     7     8     9    10    11
    10    11    12    13    14

And a new row will form with v=1:17. Does this handle all cases as needed?



回答5:

What about this? First I generate the starting-indices based on cs and sh for slicing the single vectors out of the full-length vector, then I delete all indices for which idx+cs would exceed the vector length and then I'm slicing out the single sub-vectors via arrayfun and afterwards converting them into a matrix:

v=[1 2 3 4 5 6 7 8 9 10 11 12 13]; % A=[1:13]
cs=4;
sh=2;

idx = 1:(cs-sh):length(v);
idx = idx(idx+cs-1 <= length(v))
A = arrayfun(@(i) v(i:(i+cs-1)), idx, 'UniformOutput', false);
cell2mat(A')

E.g. for cs=5; sh=3; this would give:

idx =

     1     3     5     7


ans =

     1     2     3     4     5
     3     4     5     6     7
     5     6     7     8     9
     7     8     9    10    11

Depending on where the values cs; sh come from, you'd probably want to introduce a simple error-check so that cs > 0; as well as sh < cs. sh < 0 would be possible theoretically if you'd want to leave some values out in between.

EDIT: Fixed a very small bug, should be running for different combinations of sh and cs now.