Reshape vector to matrix with column-wise zero pad

2019-02-23 16:38发布

问题:

for an input matrix

in = [1 1;
      1 2;
      1 3;
      1 4;
      2 5;
      2 6;
      2 7;
      3 8;
      3 9;
      3 10;
      3 11];

i want to get the output matrix

out = [1 5 8;
       2 6 9;
       3 7 10;
       4 0 11];

meaning i want to reshape the second input column into an output matrix, where all values corresponding to one value in the first input column are written into one column of the output matrix.

As there can be different numbers of entries for each value in the first input column (here 4 values for "1" and "3", but only 3 for "2"), the normal reshape function is not applicable. I need to pad all columns to the maximum number of rows.

Do you have an idea how to do this matlab-ish?

The second input column can only contain positive numbers, so the padding values can be 0, -x, NaN, ...

The best i could come up with is this (loop-based):

maxNumElem = 0;
for i=in(1,1):in(end,1)
    maxNumElem = max(maxNumElem,numel(find(in(:,1)==i)));
end

out = zeros(maxNumElem,in(end,1)-in(1,1));
for i=in(1,1):in(end,1)
    tmp = in(in(:,1)==i,2);
    out(1:length(tmp),i) = tmp;
end

回答1:

Either of the following approaches assumes that column 1 of in is sorted, as in the example. If that's not the case, apply this initially to sort in according to that criterion:

in = sortrows(in,1);

Approach 1 (using accumarray)

  1. Compute the required number of rows, using mode;
  2. Use accumarray to gather the values corresponding to each column, filled with zeros at the end. The result is a cell;
  3. Concatenate horizontally the contents of all cells.

Code:

[~, n] = mode(in(:,1));                                                 %//step 1
out = accumarray(in(:,1), in(:,2), [], @(x){[x; zeros(n-numel(x),1)]}); %//step 2
out = [out{:}];                                                         %//step 3

Alternatively, step 1 could be done with histc

n = max(histc(in(:,1), unique(in(:,1))));                               %//step 1

or with accumarray:

n = max(accumarray(in(:,1), in(:,2), [], @(x) numel(x)));               %//step 1

Approach 2 (using sparse)

Generate a row-index vector using this answer by @Dan, and then build your matrix with sparse:

a = arrayfun(@(x)(1:x), diff(find([1,diff(in(:,1).'),1])), 'uni', 0); %//'
out = full(sparse([a{:}], in(:,1), in(:,2)));


回答2:

Introduction to proposed solution and Code

Proposed here is a bsxfun based masking approach that uses the binary operators available as builtins for use with bsxfun and as such I would consider this very appropriate for problems like this. Of course, you must also be aware that bsxfun is a memory hungry tool. So, it could pose a threat if you are dealing with maybe billions of elements depending also on the memory available for MATLAB's usage.

Getting into the details of the proposed approach, we get the counts of each ID from column-1 of the input with histc. Then, the magic happens with bsxfun + @le to create a mask of positions in the output array (initialized by zeros) that are to be filled by the column-2 elements from input. That's all you need to tackle the problem with this approach.

Solution Code

counts = histc(in(:,1),1:max(in(:,1)))'; %//' counts of each ID from column1

max_counts = max(counts); %// Maximum counts for each ID
mask = bsxfun(@le,[1:max_counts]',counts); %//'# mask of locations where
                                           %//   column2 elements are to be placed
out = zeros(max_counts,numel(counts)); %// Initialize the output array
out(mask) = in(:,2); %// place the column2 elements in the output array

Benchmarking (for performance)

The benchmarking presented here compares the proposed solution in this post against the various methods presented in Luis's solution. This skips the original loopy approach presented in the problem as it appeared to be very slow for the input generated in the benchmarking code.

Benchmarking Code

num_ids = 5000;
counts_each_id = randi([10 100],num_ids,1);
num_runs = 20;  %// number of iterations each approach is run for

%// Generate random input array
in = [];
for k = 1:num_ids
    in = [in ; [repmat(k,counts_each_id(k),1) rand(counts_each_id(k),1)]];
end

%// Warm up tic/toc.
for k = 1:50000
    tic(); elapsed = toc();
end

disp('------------- With HISTC + BSXFUN Masking approach')
tic
for iter = 1:num_runs
    counts = histc(in(:,1),1:max(in(:,1)))';
    max_counts = max(counts);
    out = zeros(max_counts,numel(counts));
    out(bsxfun(@le,[1:max_counts]',counts)) = in(:,2);
end
toc
clear counts max_counts out

disp('------------- With MODE + ACCUMARRAY approach')
tic
for iter = 1:num_runs
    [~, n] = mode(in(:,1));                                                 %//step 1
    out = accumarray(in(:,1), in(:,2), [], @(x){[x; zeros(n-numel(x),1)]}); %//step 2
    out = [out{:}];
end
toc
clear n out

disp('------------- With HISTC + ACCUMARRAY approach')
tic
for iter = 1:num_runs
    n = max(histc(in(:,1), unique(in(:,1))));
    out = accumarray(in(:,1), in(:,2), [], @(x){[x; zeros(n-numel(x),1)]}); %//step 2
    out = [out{:}];
end
toc
clear n out

disp('------------- With ARRAYFUN + Sparse approach')
tic
for iter = 1:num_runs
    a = arrayfun(@(x)(1:x), diff(find([1,diff(in(:,1).'),1])), 'uni', 0); %//'
    out = full(sparse([a{:}], in(:,1), in(:,2)));
end
toc
clear a out

Results

------------- With HISTC + BSXFUN Masking approach
Elapsed time is 0.598359 seconds.
------------- With MODE + ACCUMARRAY approach
Elapsed time is 2.452778 seconds.
------------- With HISTC + ACCUMARRAY approach
Elapsed time is 2.579482 seconds.
------------- With ARRAYFUN + Sparse approach
Elapsed time is 1.455362 seconds.


回答3:

slightly better, but still uses a loop :(

out=zeros(4,3);%set to zero matrix
for i = 1:max(in(:,1)); %find max in column 1, and loop for that number
ind = find(in(:,1)==i); % 
out(1: size(in(ind,2),1),i)= in(ind,2);
end

don't know if you can avoid the loop...