Finding all non-conflicting combinations of values

2019-02-07 10:05发布

问题:

I have the following array which contains arrays of values:

$array = array(
    array('1', '2'),
    array('a', 'b', 'c'),
    array('x', 'y'),
);

There can be any number of arrays and an array can contain any number of values. I currently have a piece of code which will generate all combinations where one value is taken from each array. eg:

1ax, 1ay, 1bx, 1by, 1cx, 1cy, 2ax, 2ay, 2bx, 2by, 2cx, 2cy

However what I actually want is only the combinations where only one value sits in each column, ie. 1ax is no good because all three values 1, a and x sit in the first column, 1by is no good because b and y sit in the second column. So from the above example only these combinations would be valid:

1cy, 2cx

I originally planned to just generate all combinations and then filter out the ones with conflicts, but that doesn't scale as this is an oversimplified example, in the real application there's going to be situations where there are potentially millions of combinations (including conflicting ones).

Can anyone help with a better way to solve this? I'm working in PHP, but any code sample that clearly demonstrates the logic would be helpful.

Thanks in advance.


Update:

I've tested the solutions that work against a bigger dataset, to get some benchmarks, these are the results so far:

$array = array(
    array('1', '2', '3', '1', '2', '3', '1', '2', '3', '1', '2', '3', '1', '2', '3'),
    array('a', 'b', 'c', 'd', 'a', 'b', 'c', 'd', 'a', 'b', 'c', 'd', 'a', 'b', 'c', 'd', 'a', 'b', 'c', 'd'),
    array('x', 'y', 'z', 'x', 'y', 'z', 'x', 'y', 'z'),
    array('1', '2', '3', '1', '2', '3', '1', '2', '3'),
    array('a', 'b', 'c', 'd', 'a', 'b', 'c', 'd', 'a', 'b', 'c', 'd'),
    array('x', 'y', 'z'),
);

Josh Davis 2nd Solution:

Combinations:      249480
Time:              0.3180251121521 secs
Memory Usage:      22.012168884277 mb
Peak Memory Usage: 22.03059387207 mb

Josh Davis:

Combinations:      249480
Time:              1.1172790527344 secs
Memory Usage:      22.004837036133 mb
Peak Memory Usage: 22.017387390137 mb

Tom Haigh:

Combinations:      249480
Time:              5.7098741531372 secs
Memory Usage:      39.145843505859 mb
Peak Memory Usage: 39.145843505859 mb

回答1:

This is one of those cases where self-generated code and brute force will beat most algorithms on simplicity and performance. In previous answers I've seen lots of recursion, array manipulation and computations, when in actuality what you'd want to do is:

foreach ($array[0] as $k0 => $v0)
{
    foreach ($array[1] as $k1 => $v1)
    {
        if ($k1 == $k0)
        {
            continue;
        }
        foreach ($array[2] as $k2 => $v2)
        {
            if ($k2 == $k1 || $k2 == $k0)
            {
                continue;
            }
            $result[] = $v0.$v1.$v2;
        }
    }
}

Of course, you cannot write this unless you know how many arrays are in $array. That's where generated code comes handy:

$array = array(
    array('1', '2'),
    array('a', 'b', 'c'),
    array('x', 'y')
);
$result = array();

$php = '';
foreach ($array as $i => $arr)
{
    $php .= 'foreach ($array[' . $i . '] as $k' . $i . ' => $v' . $i . '){';

    if ($i > 0)
    {
        $sep  = 'if (';
        $j    = $i - 1;
        do
        {
            $php .= $sep . '$k' . $i . ' == $k' . $j;
            $sep  = ' || ';
        }
        while (--$j >= 0);

        $php .= ') { continue; } ';
    }
}

$php .= '$result[] = $v' . implode('.$v', array_keys($array)) . ';' . str_repeat('}', count($array));

eval($php);
print_r($result);

Note that this routine assumes that $array is a zero-based numerically indexed array, as in your example. It will generate the code quoted above, adjusted for an arbitrary number of arrays.


Update

Here's an alternative algorithm. It's still self-generated, but less bruteforce. We still have nested loops, except that each loop works on a copy of the array where keys that are currently used by outer loops have been removed from this loop's array. For example, if the values should be (a,b,c) but the outer loops are using the indices 0 and 2, we remove "a" (index 0) and "c" (index 2) and all we have left is "b". It means that loops work only on possible combinations and we don't need a if condition anymore.

Furthermore, and this part can be applied to the previous algorithm as well, we process the arrays of values in order from the smallest to the largest so that we guarantee that used indices exist in current array. The downside is it doesn't generate the combinations in the same order. It generates the same combinations, just not in the same order. The code looks like this:

$a0 = $array[0];
foreach ($a0 as $k0 => $v0)
{
    $a2 = $array[2];
    unset($a2[$k0]);
    foreach ($a2 as $k2 => $v2)
    {
        $a1 = $array[1];
        unset($a1[$k0], $a1[$k2]);
        foreach ($a1 as $k1 => $v1)
        {
            $result[] = "$v0$v1$v2";
        }
    }
}

The above routine sets up a copy of the values at the beginning of every loop, then removes values that are used by outer loops. You can improve this process by setting up a copy of the values only once at the beginning, remove the keys as they are used (at the beginning of each loop) and put them back as they are freed (at the end of each loop). The routine then looks like this:

list($a0,$a1,$a2) = $array;
foreach ($a0 as $k0 => $v0)
{
    unset($a1[$k0], $a2[$k0]);
    foreach ($a2 as $k2 => $v2)
    {
        unset($a1[$k2]);
        foreach ($a1 as $k1 => $v1)
        {
            $result[] = "$v0$v1$v2";
        }
        $a1[$k2] = $array[1][$k2];
    }
    $a1[$k0] = $array[1][$k0];
    $a2[$k0] = $array[2][$k0];
}

The actual code that generates the source above is:

$keys = array_map('count', $array);
arsort($keys);

$inner_keys = array();
foreach ($keys as $k => $cnt)
{
    $keys[$k] = $inner_keys;
    $inner_keys[] = $k;
}

$result = array();

$php = 'list($a' . implode(',$a', array_keys($array)) . ')=$array;';
foreach (array_reverse($keys, true) as $i => $inner_keys)
{
    $php .= 'foreach ($a' . $i . ' as $k' . $i . ' => $v' . $i . '){';

    if ($inner_keys)
    {
        $php .= 'unset($a' . implode('[$k' . $i . '],$a', $inner_keys) . '[$k' . $i . ']);';
    }
}

$php .= '$result[] = "$v' . implode('$v', array_keys($array)) . '";';

foreach ($keys as $i => $inner_keys)
{
    foreach ($inner_keys as $j)
    {
        $php .= '$a' . $j . '[$k' . $i . ']=$array[' . $j . '][$k' . $i . "];\n";
    }
    $php .= '}';
}
eval($php);


回答2:

Interesting problem! This turned out to be more complex than I thought, but it seems to work.

Basic strategy is to first order the arrays smallest to largest (keeping track of what order they were in, so I can output the answers in the correct order).

I keep answers in the form of an array of indexes into this sorted array of input lists.

Now that the lists are sorted, I can store the first correct answer as array(0,1,2,...,n);

Then I recurse into a function for trying all values in the first slot there (the 0 above) by swapping it with other values in that answer array (all the ones that aren't too big for that slot). Since I've got it sorted by size, I can move any value to the right when I'm swapping, without worrying about it being to big for that right slot.

outputting each valid slot has some crazy indirection to undo all the sorting.

Sorry if this looks confusing. I didn't spend much time cleaning it up.

<?php
# $lists is an array of arrays
function noconfcombos($lists) {
    $lengths = array();
    foreach($lists as $list) {
        $lengths[] = count($list);
    }

    # find one solution (and make sure there is one)
    $answer = array();
    $sorted_lengths = $lengths;
    asort($sorted_lengths);
    $answer_order_lists = array();
    $answer_order_lengths = array();
    $output_order = array();
    $min = 1;
    $max_list_length = 0;
    foreach($sorted_lengths as $lists_key => $list_max) {
        if($list_max < $min) {
            # no possible combos
            return array();
        }
        $answer[] = $min - 1; # min-1 is lowest possible value (handing out colums starting with smallest rows)
        $output_order[$lists_key] = $min - 1; # min-1 is which slot in $answers corresponds to this list
        $answer_order_lists[] = $lists[$lists_key];
        $answer_order_lengths[] = $lengths[$lists_key];
        ++$min;
    }
    ksort($output_order);
    $number_of_lists = count($lists);
    $max_list_length = end($sorted_lengths);
    if($max_list_length > $number_of_lists) {
       for($i = $number_of_lists; $i < $max_list_length; ++$i) {
          $answer[] = $i;
       }
       $stop_at = $number_of_lists;
    } else {
       $stop_at = $number_of_lists - 1;
    }

    # now $answer is valid (it has the keys into the arrays in $list for the
    # answer), and we can find the others by swapping around the values in
    # $answer.

    $ret = array();
    $ret[] = noconfcombos_convert($answer, $answer_order_lists, $output_order);
    noconfcombos_recurse($ret, $max_list_length, $stop_at, $answer_order_lengths, $answer_order_lists, $output_order, $answer, 0);

    return $ret;
}

# try swapping in different indexes into position $index, from the positions
# higher, then recurse
function noconfcombos_recurse(&$ret, $max_list_length, $stop_at, &$lengths, &$lists, &$output_order, $answer, $index) {
    if($index < $stop_at) {
        noconfcombos_recurse($ret, $max_list_length, $stop_at, $lengths, $lists, $output_order, $answer, $index + 1);
    }
    for($other = $index + 1; $other < $max_list_length; ++$other) {
        if($answer[$other] < $lengths[$index]) { # && $answer[$index] < $lengths[$other]) {
            $tmp = $answer[$index];
            $answer[$index] = $answer[$other];
            $answer[$other] = $tmp;
            $ret[] = noconfcombos_convert($answer, $lists, $output_order);
            if($index < $stop_at) {
                noconfcombos_recurse($ret, $max_list_length, $stop_at, $lengths, $lists, $output_order, $answer, $index + 1);
            }
        }
    }
}


function noconfcombos_convert(&$indexes, &$lists, &$order) {
    $ret = '';
    foreach($order as $i) {
        $ret .= $lists[$i][$indexes[$i]];
    }
    return $ret;
}

function noconfcombos_test() {
    $a = array('1', '2', '3', '4');
    $b = array('a', 'b', 'c', 'd', 'e');
    $c = array('x', 'y', 'z');
    $all = array($a, $b, $c);
    print_r(noconfcombos($all));
}

noconfcombos_test();


回答3:

I think this works. It is using recursion to go over the structure like a tree. For each branch it keeps track of which columns are already taken. It is probably slow because it is a brute force approach.

<?php 

$array = array(
    array('1', '2'),
    array('a', 'b', 'c'),
    array('x', 'y'),
);


function f($array, & $result, $colsToIgnore = array(), $currentPath = array()) {
    $row = array_shift($array);
    $length = count($row);
    $isMoreRows = !! count($array);

    for ($col = 0; $col < $length; $col++) {
        //check whether column has already been used
        if (in_array($col, $colsToIgnore)) {
            continue;   
        }

        $item = $row[$col];

        $tmpPath = $currentPath;
        $tmpPath[] = $item;

        if ($isMoreRows) {
            $tmpIgnoreCols = $colsToIgnore;
            $tmpIgnoreCols[] = $col;
            f($array, $result, $tmpIgnoreCols, $tmpPath);
        } else {
            $result[] = implode('', $tmpPath);
        }

    }
}


$result = array();
f($array, $result);
print_r($result);


回答4:

probably not the most elegant way, but does the trick (javascript)

var result = [];

for(i=0;i<arr1.length;i++)
{
  for(j=0;j<arr2.length;j++)
  {
    if(j==i)
      continue;
    else
    {
      for(k=0;k<arr3.length;k++)
      {
        if(k==i||k==j)
          continue;
        else
        {
          result.push(arr1[i]+arr2[j]+arr3[k]);
        }
      }
    }
  }
}


回答5:

This can be refactored using recursion making it work with any arbitrary amount of arrays. If I find the time, I'll give it a try myself.

ps. I don't know php, the example is written in Delphi.

Edit: recursive solution with arbitrary # arrays

type
  TSingleArray = array of string;
  TMasterArray = array of TSingleArray;
var
  solutions: array of integer; // Q&D container to hold currently used indexes of SingleArrays


procedure WriteSolution(const masterArray: TMasterArray);
var
  I: Integer;
  indexes: string;
  solution: string;
begin
  for I := 0 to High(solutions) do
  begin
    indexes := indexes + IntToStr(solutions[I]) + ' ';
    solution := solution + masterArray[I][solutions[I]];
  end;
  Writeln('Solution: ' + solution + ' Using indexes: ' + indexes);
end;

procedure FindSolution(const masterArray: TMasterArray; const singleArrayIndex: Integer; var bits: Integer);
var
  I: Integer;
begin
  for I := 0 to High(masterArray[singleArrayIndex]) do
  begin
    //***** Use bit manipulation to check if current index is already in use
    if bits and (1 shl I)  = (1 shl I ) then continue;
    solutions[singleArrayIndex] := I;
    Inc(bits, 1 shl I);
    //***** If it is not the last array in our masterArray, continue by calling RecursArrays recursivly.
    if singleArrayIndex <> High(masterArray) then FindSolution(masterArray, Succ(singleArrayIndex), bits)
    else WriteSolution(masterArray);
    Dec(bits, 1 shl I);
  end;
end;

//***************
// Initialization
//***************
var
  I, J: Integer;
  bits: Integer;
  singleArrayString: string;
  masterArray: TMasterArray;
begin
  I := 10;
  SetLength(masterArray, I);
  for I := 0 to High(masterArray) do
  begin
    SetLength(masterArray[I], High(masterArray) + Succ(I));
    singleArrayString := EmptyStr;
    for J := 0 to High(masterArray[I]) do
    begin
      masterArray[I][J] := IntToStr(J);
      singleArrayString := singleArrayString + masterArray[I][J];
    end;
    WriteLn(singleArrayString)
  end;
  ReadLn;

  //****** Start solving the problem using recursion
  bits := 0;
  SetLength(solutions, Succ(High(masterArray)));
  FindSolution(masterArray, 0, bits);    
end.


回答6:

Look at it from a different angle: in order to compose a result row, you need to pick a value for each column. Each value should be picked from a different source row. The problem is know as "pick N out of M", or more mathematically, a Combination.

This means that a result row corresponds to an array of source-row indices.

You can build all possible pickings by starting to build an index-array like this (pseudo-code)

function combinations( $source ) {
  if( count( $source ) == 0 ) return $source;
  $result=array();
  // build one row
  foreach( $source as $index=>$value ) {
    $newsource = array_splice( $source, $index, 1 );

    $reduced_combinations=combinations( $newsource  );
    foreach( $reduced_combinations as $reduced_combi ) {
      $newrow=array_unshift( $reduced_combi, $value );
      $result[]=$newrow;
    }

  }
  return $result;
}

function retrieve_indices( $indices, $arrays ) {
   $result=array();
   foreach( $indices as $column=>$index ) {
     $result[]=$arrays[$index][$column];
   }
   return $result;
}

$source_arrays = array(
  array( "1", "2", "3" ),
  array( "a", "b", "c" ),
  array( "x", "y", "z" )
);

$index_combinations = combinations( range(0,2) );
$result=array();
foreach( $index_combinations as $combination ) {
  $result[]=retrieve_indices( $combination, $source_arrays );
}


回答7:

Another option:

    $arr = array(
        array('1', '2'),
        array('a', 'b', 'c'),
        array('x', 'y'),
    );
    //-----
    //assuming $arr consists of non empty sub-arrays
    function array_combinations($arr){ 
        $max = 1;
        for ($i = 0; $i < count($arr); $i ++){
            $max *= count($arr[$i]); 
        }
        $matrix = array();
        for ($i = 0; $i < $max; $i ++){
            $matrix = array(); 
        }
        $c_rep = 1;
        for ($i = count($arr) - 1; $i >= 0; $i --){
            $c_rep *= ($i < count($arr) - 1)//last sub-array 
                ? count($arr[$i + 1])
                : 1;
            $k = 0; 
            while ($k < $max){
                for ($t = 0; $t < count($arr[$i]); $t ++){
                    for ($j = 0; $j < $c_rep; $j ++){
                        $matrix[$i][$k ++] = $arr[$i][$t];
                    }
                }   
            }
        }
        return $matrix;
    }
    //-----
    $matrix = array_combinations($arr);


回答8:

Your problem is similar to that of finding a determinant of a matrix. The best way imho would be to fill smaller arrays with some symbol, say '0', so they all have equal number of values, in your example

$array = array(
    array('1', '2', '0'),
    array('a', 'b', 'c'),
    array('x', 'y', '0'),
);

then loop through each of the first array values and for each increment the index of array by 1 and check the next array and the next column (on the first loop it will be '1' and index will be 0 incremented - 1, then get $array1 - 'b' and so on) if you reach '0', break, if you reach right border, reset first index to 0. Then do the same with decrementing and you'll have all the combinations. It is probably unclear, check the image I linked to



回答9:

try this :

function algorithmToCalculateCombinations($n, $elems) {
        if ($n > 0) {
            $tmp_set = array();
            $res = algorithmToCalculateCombinations($n - 1, $elems);
            foreach ($res as $ce) {
                foreach ($elems as $e) {
                    array_push($tmp_set, $ce . $e);
                }
            }
            return $tmp_set;
        } else {
            return array('');
        }
    }

$Elemen = array(range(0,9),range('a','z'));
$Length = 3;
$combinations = algorithmToCalculateCombinations($Length, $Elemen);