Efficient search for permutations that contain sub

2019-02-14 14:28发布

问题:

I have a set of integers, say S = {1,...,10}, and two matrices N and M, whose rows are some (but not necessarily all possible) permutations of elements from S of orders, say, 3 and 5 respectively, e.g. N = [1 2 3; 2 5 3;...], M = [1 2 3 4 5; 2 4 7 8 1;...].

A sub-permutation Q of a permutation P is just an indexed subset of P such that the order of the indices of the elements of Q is the same as the order of their indices in P. Example: [2,4,7] is a sub-permutation of [2,3,4,6,7,1], but [1,2,3] is not a sub-permutation of the latter.

I need an efficient way (e.g. as vectorized as possible and as small for-loops as possible) to find

(1) all permutations from M which have sub-permutations from N

and

(2) how many times each sub-permutation from N is to be found in M.

So far, what I have is a vectorized code that checks if a given single sub-permutation is contained in M (and how many times), but then I have to use a parfor-loop through N, which gets slow for very big N-s. Note that if N is not too big, one could also attack the problem by simply constructing the admissible 5-tuples from the given 3-tuples and compare the result to M, but that can quickly become much slower than simple brute-forcing if N is sufficiently big.

An alternative way to view the problem is as follows: check if N modulo permutations of its rows is a sub-matrix of M in a general sense, i.e. if it is possible to obtain a permutation of the rows of N by deleting elements from M.

Apologies if my question is too elementary, my background is from arithmetic algebraic geometry and representation theory and I am very new to MATLAB.

EDIT: Here is my code for checking presence of a single k-tuple in M: [code]

function [A,f] = my_function(x,M)
%// returns all rows in M that contain x and the absolute frequency of x in M
%// suboptimal for checking combinations rather than permutations byy at least ~ 50%
k = size(x,2);
m = size(M,1);
R = zeros(m,k);
I = R;
Z = I;
    for j = 1:k   
        [R(:,j),I(:,j)] = max((M == x(j)),[],2); 
        Z(:,j) = R(:,j).*I(:,j);
    end
z = zeros(m,k-1);
    for j = 1:(k-1)
        z(:,j) = (Z(:,j) > 0 & Z(:,j) < Z(:,j+1)); 
    end
[v,~] = find(sum(z,2) == k-1);    
A = M(v,:);
f = length(v);
end

Using this function, checking through N is then just a simple matter of a (par)for-loop, which I was hoping to avoid in favor of a faster vectorized solution.

回答1:

Approach #1

[val,ind] = max(bsxfun(@eq,permute(M,[4 2 1 3]),permute(N,[2 3 4 1])),[],2)
matches = squeeze(all(diff(ind,1)>0,1).*all(val,1))
out1 = any(matches,2) %// Solution - 1
out2 = sum(matches,1) %// Solution - 2

Approach #2

Another approach that avoids permuting N and might be better for longish N -

[val,ind] = max(bsxfun(@eq,N,permute(M,[3 4 1 2])),[],4)
matches = squeeze(all(diff(ind,[],2)>0,2).*all(val,2))
out1 = any(matches,1) %// Solution - 1
out2 = sum(matches,2) %// Solution - 2

Approach #3

Memory-scroogey approach for large datasizes -

out1 = false(size(M,1),1);  %// Storage for Solution - 1
out2 = zeros(size(N,1),1);  %// Storage for Solution - 2
for k=1:size(N,1)
    [val3,ind3] = max(bsxfun(@eq,N(k,:),permute(M,[1 3 2])),[],3);
    matches = all(diff(ind3,[],2)>0,2).*all(val3,2);
    out1 = or(out1,matches);
    out2(k) = sum(matches);
end

Approach #4

Memory-scroogey approach for GPU -

gM = gpuArray(M);
gN = gpuArray(N);

gout1 = false(size(gM,1),1,'gpuArray');  %// GPU Storage for Solution - 1
gout2 = zeros(size(gN,1),1,'gpuArray');  %// GPU Storage for Solution - 2
for k=1:size(gN,1)
    [val3,ind3] = max(bsxfun(@eq,gN(k,:),permute(gM,[1 3 2])),[],3);
    matches = all(diff(ind3,[],2)>0,2).*all(val3,2);
    gout1 = or(gout1,matches);
    gout2(k) = sum(matches);
end
out1 = gather(gout1);  %// Solution - 1
out2 = gather(gout2);  %// Solution - 2

Now, this GPU approach has blown away all other approaches. It was run with M : 320000X5 and N : 2100X3 (same as your input sizes) filled with random integer numbers. With a GTX 750 Ti, it took just 13.867873 seconds!! So if you have a GPU with sufficient memory, this might be your winner approach too.


Approach #5

Extremely-memory-scroogey approach for GPU -

gM = gpuArray(M);
gN = gpuArray(N);

gout1 = false(size(gM,1),1,'gpuArray');  %// GPU Storage for Solution - 1
gout2 = zeros(size(gN,1),1,'gpuArray');  %// GPU Storage for Solution - 2
for k=1:size(gN,1)
    [val2,ind2] = max(bsxfun(@eq,gM,permute(gN(k,:),[1 3 2])),[],2);
    matches = all(diff(ind2,[],3)>0,3).*all(val2,3);
    gout1 = or(gout1,matches);
    gout2(k) = sum(matches);
end
out1 = gather(gout1);  %// Solution - 1
out2 = gather(gout2);  %// Solution - 2


回答2:

How about this?

n = size(N,1);
m = size(M,1);
p = size(N,2);
pattern = (1:p).'; %'// will be used for checking if it's a subpermutation or not
result = false(m,n); %// preallocate result, and initiallize to 0
for k = 1:size(N,1) %// loop over columns of (transposed) N
    [~, loc] = ismember(M, N(k,:)); %// entries of M equal to a value of N(:,k)
    ind = find(sum(loc>0,2)==p); %// we need p matches per row of M
    t = reshape(nonzeros(loc(ind,:).'),p,[]); %'// for those rows, take matches
    ind2 = all(bsxfun(@eq,t,pattern)); %// ... see if they are in the right order
    result(ind(ind2),k) = true; %// ... and if so, take note in result matrix
end

The result matrix contains 1 at position r,s if the s-th row N is a sub-permutation of the r-th row of M. From this, your desired results are

result1 = any(result,2);
result2 = sum(result,1);

Example:

M =
     8     9     4     1    10
     6     5     2     7     8
     4     1     9     2    10
     3     4     5     1     2

N =
     4     1     2
     4     9    10
     3     5     9

give

result =
     0     0     0
     0     0     0
     1     1     0
     1     0     0

result1 =
     0
     0
     1
     1

result2 =
     2     1     0


回答3:

I benchmarked all the approaches against different pairs of matrices N,M, and wherever possible, I have also compared parfor vs. for and picked the faster one. Here are my results:

    %//Test 1: size(N) = 2263x3, size(M) = 5000x6

    %//My approach (parfor): 0.650626 sec
    %//Divakar's Approach 1: 1.870144 sec
    %//Divakar's Approach 2: 1.164088 sec
    %//Divakar's Approach 3: 0.380915 sec (with parfor)
    %//Divakar's Approach 4: 2.643659 sec (gpu)
    %//Luis Mendo's Approach: 1.681007 sec


    %//Test 2: size(N) = 2263x3, size(M) = 25000x6

    %//My approach (parfor): 6.137823 sec
    %//Divakar's Approach 1: 8.342699 sec
    %//Divakar's Approach 2: 5.784426 sec
    %//Divakar's Approach 3: 2.251888 sec (with parfor)
    %//Divakar's Approach 4: 6.272578 sec (gpu)
    %//Luis Mendo's Approach: 11.514548 sec

    %//Test 3: size(N) = 2100x3, size(M) = 20000x5

    %//My approach (parfor): 3.417432 sec
    %//Divakar's Approach 1: 5.732680 sec
    %//Divakar's Approach 2: 4.107909 sec
    %//Divakar's Approach 3: 1.393052 sec (with parfor)
    %//Divakar's Approach 4: 3.145183 sec (gpu)
    %//Luis Mendo's Approach: 5.668326 sec

    %//Test 4: size(N) = 2100x3, size(M) = 324632x5

    %//Divakar's Approach 3: 54.231878 sec (with parfor)
    %//Divakar's Approach 4: 15.111936 sec (gpu) 

    %//Test 5: size(N) = 2263x3, size(M) = 1000000x6

    %//Divakar's Approach 3: 210.853515 sec (with parfor)
    %//Divakar's Approach 4: 49.529794 sec (gpu) 
    %//Divakar's Approach 5: 49.874444 sec (gpu)

    %//Test 6: size(N) = 2263x3, size(M) = 5000000x6

    %//Divakar's Approach 3: 1137.606244 sec (with parfor)
    %//Divakar's Approach 4: stopped it after 15 min and heavy interrupts/DCPs activity
    %//Divakar's Approach 5: 267.169307 sec

Among the non-gpu approaches, Divakar's Approach 3 was by far the fastest one. Its gpu counterpart started showing its advantages only with large number of rows.