Extended find of unique tuples in a relation represented by a BDD

Published 2019-05-02 07:16

Question:

Consider {<1,2>, <1,3>, <1,7>, <0,4>} as the set of tuples of a relation R. Now suppose that R is represented (via its membership function) by a BDD. That is, the BDD representing R depends on the variables {x1, x2, y1, y2, y3}, where {x1, x2} encode the first element of every tuple and {y1, y2, y3} encode the second element.

Now consider the problem of finding the set of tuples whose first element is unique. For the relation above, that set is {<0,4>}. All the other tuples are discarded because more than one tuple has 1 as its first component.

As a second example, consider the relation with the set of tuples {<1,2>, <1,3>, <1,7>, <2,3>, <2,5>, <0,4>}. In this case the expected result is still {<0,4>}, because 2 also appears more than once as a first element.

The problem can also be seen as abstracting away the variables {y1, y2, y3} such that only the unique values of {x1, x2} remain. From this result, the expected relation can be reconstructed by computing the conjunction of the resulting BDD with the input one.
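The intended semantics can be pinned down with a brute-force reference on explicit sets of pairs, which is handy for testing any BDD implementation. This is a sketch for illustration, not an efficient method; `unique_x` and `unique_tuples` are hypothetical names. The first function plays the role of the abstraction over {y1, y2, y3}; the second performs the conjunction with R:

```python
from collections import Counter

def unique_x(R):
    """Abstraction over the second component: the first components that
    occur with exactly one second component."""
    counts = Counter(x for x, _ in set(R))
    return {x for x, n in counts.items() if n == 1}

def unique_tuples(R):
    """Conjunction of that abstraction with R itself."""
    keep = unique_x(R)
    return {(x, y) for x, y in R if x in keep}

R1 = {(1, 2), (1, 3), (1, 7), (0, 4)}
R2 = R1 | {(2, 3), (2, 5)}
print(unique_tuples(R1), unique_tuples(R2))   # {(0, 4)} {(0, 4)}
```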

In summary, the question is: which BDD operations have to be performed on the representation of R to obtain the BDD containing only the unique tuples?

Notice that this is a generalization of this question.

EDIT 1: The following code reflects the implementation I have so far. However, I am wondering whether a more efficient version is possible. For simplicity, I intentionally omit the handling of the computed table (crucial for good time complexity). Additionally, I use &, | and ! to denote the conjunction, disjunction and complement operations on BDDs.

BDD uniqueAbstract(BDD f, BDD cube) {
  if (cube.IsOne())              // no variables left to abstract
    return f;
  if (f.IsZero() || f.IsOne())   // constant over a non-empty cube: never unique
    return zero();
  BDD T = high(f);
  BDD E = low(f);
  if (level(f) == level(cube)) { // current var is abstracted
    BDD uniqueThen = uniqueAbstract(T, high(cube));
    BDD existElse = existAbstract(E, high(cube));

    BDD existThen = existAbstract(T, high(cube));
    BDD uniqueElse = uniqueAbstract(E, high(cube));

    return (uniqueThen & !existElse) | (uniqueElse & !existThen);
  } else {
    BDD uniqueThen = uniqueAbstract(T, cube);
    BDD uniqueElse = uniqueAbstract(E, cube);
    return ite(top(f), uniqueThen, uniqueElse);
  }
}
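The recurrence above can be checked without a BDD package. In the following sketch (an illustration, not the asker's C implementation), a function is an explicit set of `(x, ybits)` pairs: `x` stands for all non-abstracted variables, so the `else` branch is implicit, and `ybits` holds the cube variables still to abstract. Splitting on the first cube bit mirrors the `level(f) == level(cube)` branch, and the combination is `(uniqueThen & !existElse) | (uniqueElse & !existThen)`. The names `unique_abstract`, `exist_abstract` and `bits` are hypothetical, and the MSB-first encoding is an assumption:

```python
from collections import Counter

def exist_abstract(f):
    """Exists over all remaining cube variables: the x values with some y."""
    return {x for x, _ in f}

def unique_abstract(f, k):
    """x values with exactly one assignment to the k remaining cube
    variables; f is a set of (x, ybits) pairs with len(ybits) == k."""
    if k == 0:                                        # cube exhausted: f itself
        return {x for x, _ in f}
    then_ = {(x, y[1:]) for x, y in f if y[0] == 1}   # high cofactor
    else_ = {(x, y[1:]) for x, y in f if y[0] == 0}   # low cofactor
    return ((unique_abstract(then_, k - 1) - exist_abstract(else_)) |
            (unique_abstract(else_, k - 1) - exist_abstract(then_)))

def bits(n, width):
    """Assumed encoding: most-significant bit first."""
    return tuple((n >> (width - 1 - i)) & 1 for i in range(width))

pairs = {(1, 2), (1, 3), (1, 7), (2, 3), (2, 5), (0, 4)}
f = {(x, bits(y, 3)) for x, y in pairs}
brute = {x for x, n in Counter(x for x, _ in pairs).items() if n == 1}
print(unique_abstract(f, 3), brute)   # {0} {0}
```

The recurrence is sound because x has exactly one y-assignment precisely when it has exactly one in one cofactor and none in the other.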

EDIT 2: After trying three different implementations, there are still some performance issues. Let me describe the three of them.

  1. A C implementation of my approach; let me call it the reference implementation.
  2. The implementation proposed by user meolic in the accepted answer.
  3. A hybrid of the two approaches, also available.

The goal of this update is to analyze the results of the three approaches. Since time measurements seem misleading for judging them at this point, I decided to evaluate the implementations on a different set of measures:

  • Recursive calls
  • Cache hits
  • Abstract simple: number of times the function call was solved without requiring existential abstraction.
  • Abstract complex: number of times the function call was solved by requiring existential abstraction.
  • Exist abstract: number of calls to the existential abstraction.

Unique abstraction statistics for the three implementations:

                        Impl. 1    Impl. 2    Impl. 3
Run time (ms)             21123      54727      20215
Recursive calls         1728549     191585     268044
Cache hits               638745      26494      30668
Non abstract              67207        n/a        n/a
Abstract simple               0      59788      78115
Abstract complex              0      12011      46473
Exist abstract          1593430      24022      92946

EDIT 3: The following results were obtained after implementing every logical operation in terms of ITE.

  1. uniqueAbstractRecRef
  2. uniqueAbstractSERec
  3. uniqueAbstractRec

Unique abstraction statistics:

                                                   1           2           3
Run time (ms)                                  21831       56761       20587
Total calls                                  1723239      193627      270205
Optimized calls                                    0       60632       78486
Total exist abstract calls                  30955618    16475806    13186348
Unique abstract calls to exist abstract      2385915       24304       93060
Total ite calls                              3574555     1271844     1256872
Time in the function (ms)                       4001       33918        3354
Share of total time                            12.4%       51.5%       10.6%

Answer 1:

There is a simple and efficient solution if the variables are ordered so that x1 and x2 are at the top of the BDD.

Consider the BDD for the second example.

You can traverse (in breadth-first order) its first two layers to get four sub-BDDs, one for each possible combination of x1, x2. Three of those sub-BDDs are rooted at y1 and the fourth is empty (constant False).

Now you can count the number of elements in each sub-BDD (Algorithm C from Knuth's The Art of Computer Programming, Volume 4, Fascicle 1: Bitwise Tricks & Techniques; Binary Decision Diagrams).

If a sub-BDD has more than one element, drop it (redirect the edge from its parent node directly to False); otherwise leave it as it is.

It is possible to run this algorithm in a single pass by memoizing partial results while counting elements.
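A minimal sketch of this approach, assuming the x variables are on top of the order. It uses a tiny hand-written ROBDD (the `BDD` class, `keep_unique` and `encode` are hypothetical helpers, not part of any BDD package) and assumes tuples are encoded most-significant bit first. It walks the top `nx` levels with memoization, counts the solutions of each sub-BDD over the y variables, and replaces any sub-BDD with more than one solution by False:

```python
# Tiny ROBDD for illustration only (hypothetical helper, not a real package).
# Node ids: 0 = False, 1 = True; internal id -> (var, low, high).
# Variable 0 is at the top; the nx x variables come first, as Answer 1 assumes.
class BDD:
    def __init__(self, nvars):
        self.nvars = nvars
        self.nodes = [(nvars, 0, 0), (nvars, 1, 1)]  # terminals sit below all vars
        self.unique = {}                             # (var, low, high) -> id
        self.counts = {}                             # memo for count()

    def mk(self, var, low, high):
        if low == high:                              # redundant test: skip node
            return low
        key = (var, low, high)
        if key not in self.unique:
            self.unique[key] = len(self.nodes)
            self.nodes.append(key)
        return self.unique[key]

    def var(self, u):
        return self.nodes[u][0]

    def from_minterms(self, minterms):
        """BDD of a set of full assignments (tuples of 0/1, top variable first)."""
        def build(level, rows):
            if not rows:
                return 0
            if level == self.nvars:
                return 1
            low = [r for r in rows if r[level] == 0]
            high = [r for r in rows if r[level] == 1]
            return self.mk(level, build(level + 1, low), build(level + 1, high))
        return build(0, list(set(minterms)))

    def count(self, u, level):
        """Solutions of u over variables level..nvars-1, in the spirit of
        Knuth's Algorithm C: every skipped variable doubles the count."""
        def c(w):
            if w <= 1:
                return w
            if w not in self.counts:
                v, lo, hi = self.nodes[w]
                self.counts[w] = ((c(lo) << (self.var(lo) - v - 1)) +
                                  (c(hi) << (self.var(hi) - v - 1)))
            return self.counts[w]
        return c(u) << (self.var(u) - level)

    def evaluate(self, u, assignment):
        while u > 1:
            v, lo, hi = self.nodes[u]
            u = hi if assignment[v] else lo
        return u == 1


def keep_unique(bdd, root, nx):
    """Walk the top nx levels; replace each sub-BDD below them that has
    more than one solution by False. Memoized on (node, level)."""
    memo = {}
    def prune(u, level):
        if (u, level) not in memo:
            if level == nx:                  # u is the sub-BDD for one x value
                memo[u, level] = u if bdd.count(u, nx) == 1 else 0
            elif bdd.var(u) == level:
                _, lo, hi = bdd.nodes[u]
                memo[u, level] = bdd.mk(level, prune(lo, level + 1),
                                        prune(hi, level + 1))
            else:                            # x variable skipped: both branches are u
                memo[u, level] = prune(u, level + 1)
        return memo[u, level]
    return prune(root, 0)


def encode(a, b, nx, ny):
    """Assumed encoding: both components written most-significant bit first."""
    return (tuple((a >> (nx - 1 - i)) & 1 for i in range(nx)) +
            tuple((b >> (ny - 1 - i)) & 1 for i in range(ny)))


nx, ny = 2, 3
bdd = BDD(nx + ny)
pairs = [(1, 2), (1, 3), (1, 7), (2, 3), (2, 5), (0, 4)]   # second example
R = bdd.from_minterms(encode(a, b, nx, ny) for a, b in pairs)
U = keep_unique(bdd, R, nx)
result = [(a, b) for a in range(2 ** nx) for b in range(2 ** ny)
          if bdd.evaluate(U, encode(a, b, nx, ny))]
print(result)   # [(0, 4)]
```

The count memo is shared across calls because the count of a node does not depend on the level it is reached from, so each node is counted once overall.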



Answer 2:

Here is my implementation. I have studied the author's proposed solution, and it seems to me that it is the best, if not the only, simple BDD-based solution for an arbitrary variable ordering. However, there may be some improvements if the algorithm is implemented my way. PLEASE CHECK. I am using my own wrapper over a BDD package, but you should not have any trouble understanding it.

EDITED: I have simplified the solution; the function Bdd_GetVariableChar() is no longer used.

/* TESTING SOLUTION FOR QUESTION ON STACK OVERFLOW */
/* bdd_termFalse,bdd_termTrue: Boolean constants */
/* Bdd_isTerminal(f): check if f is Boolean constant */
/* Bdd_Low(f),Bdd_High(f): 'else' and 'then' subfunction */
/* Bdd_Top(f): literal function representing topvar of f */
/* Bdd_IsSmaller(f,g): check if topvar of f is above topvar of g */
/* existentialAbstraction(f,cube): \exist v.f for all v in cube */

Bdd_Edge specialAbstraction(Bdd_Edge f, Bdd_Edge cube) {
  if (Bdd_isTerminal(cube)) return f;
  if (Bdd_isTerminal(f)) return bdd_termFalse;
  if (Bdd_IsSmaller(f,cube)) {
    Bdd_Edge E,T;
    E = specialAbstraction(Bdd_Low(f),cube);
    T = specialAbstraction(Bdd_High(f),cube);
    return Bdd_ITE(Bdd_Top(f),T,E);
  } else if (Bdd_IsSmaller(cube,f)) {
    return bdd_termFalse;
  } else {
    Bdd_Edge E,T;
    cube = Bdd_High(cube);
    E = Bdd_Low(f);
    T = Bdd_High(f);
    if (Bdd_isEqv(E,bdd_termFalse)) {
      return specialAbstraction(T,cube);
    } else if (Bdd_isEqv(T,bdd_termFalse)) {
      return specialAbstraction(E,cube);
    } else {
      Bdd_Edge EX,TX,R;
      EX = existentialAbstraction(E,cube);
      TX = existentialAbstraction(T,cube);
      if (Bdd_isEqv(EX,TX)) return bdd_termFalse;
      R = Bdd_ITE(Bdd_ITE(EX,bdd_termFalse,T),
                  bdd_termTrue,
                  Bdd_ITE(TX,bdd_termFalse,E));
      return specialAbstraction(R,cube);
    }
  }
}

And yes, if the variable ordering is fixed with x above y, the algorithm can be much more efficient: you can remove all the calculations from the most complex 'else' block and just return 0.
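The distinctive step in this version is the last `else` branch: instead of recursing on both cofactors, it merges them into a single function R = (T and not EX) or (E and not TX) and recurses once, returning False early when EX and TX coincide. Below is a set-based sketch of that idea, using explicit `(x, ybits)` pairs instead of BDD edges, with the non-cube variables folded into `x`. `special_abstraction` is a hypothetical Python mirror of the C function, and the MSB-first encoding is an assumption:

```python
from collections import Counter

def special_abstraction(f, k):
    """Set-based mirror of specialAbstraction. f is a set of (x, ybits)
    pairs, len(ybits) == k = number of cube variables still to abstract."""
    if k == 0:                       # Bdd_isTerminal(cube): return f
        return {x for x, _ in f}
    E = {(x, y[1:]) for x, y in f if y[0] == 0}   # Bdd_Low(f)
    T = {(x, y[1:]) for x, y in f if y[0] == 1}   # Bdd_High(f)
    if not E:
        return special_abstraction(T, k - 1)
    if not T:
        return special_abstraction(E, k - 1)
    EX = {x for x, _ in E}           # existentialAbstraction(E, cube)
    TX = {x for x, _ in T}           # existentialAbstraction(T, cube)
    if EX == TX:                     # each x has a y in both branches or in neither
        return set()
    # R = (T and not EX) or (E and not TX), followed by one recursive call
    R = ({(x, y) for x, y in T if x not in EX} |
         {(x, y) for x, y in E if x not in TX})
    return special_abstraction(R, k - 1)

pairs = {(1, 2), (1, 3), (1, 7), (2, 3), (2, 5), (0, 4)}
f = {(x, tuple((y >> s) & 1 for s in (2, 1, 0))) for x, y in pairs}
brute = {x for x, n in Counter(x for x, _ in pairs).items() if n == 1}
print(special_abstraction(f, 3), brute)   # {0} {0}
```

Merging is valid because, for each x, at most one of the two terms survives, and it survives only when the other branch has no y for that x.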

Here are some testing runs:

CUBE (JUST IN CASE YOU ARE NOT FAMILIAR WITH BDD ALGORITHMS)
  +  y1 y2 y3 y4 y5

ORIGINAL (ORDERED WITH X ABOVE Y)
  +  *x1 *x2 x3 *x4 x5 y1 *y2 y3 y4 y5
  +  *x1 x2 *x3 *x4 *x5 y1 y2 *y3 y4 y5
  +  *x1 x2 *x3 *x4 x5 *y1 y2 *y3 y4 y5
  +  *x1 x2 *x3 x4 *x5 y1 *y2 y3 *y4 *y5
  +  *x1 x2 x3 *x4 x5 *y1 *y2 *y3 *y4 y5
  +  *x1 x2 x3 *x4 x5 *y1 y2 y3 *y4 *y5
  +  x1 *x2 *x3 *x4 *x5 y1 y2 y3 y4 *y5
  +  x1 x2 *x3 x4 x5 *y1 *y2 *y4 *y5
  +  x1 x2 x3 *x4 *x5 *y1 *y2 *y3 y4 *y5

ABSTRACTION
  +  *x1 *x2 x3 *x4 x5
  +  *x1 x2 *x3 *x4
  +  *x1 x2 *x3 x4 *x5
  +  x1 *x2 *x3 *x4 *x5
  +  x1 x2 x3 *x4 *x5

ORIGINAL (ORDERED WITH Y ABOVE X)
  +  *y1 *y2 *y3 *y4 *y5 x1 x2 *x3 x4 x5
  +  *y1 *y2 *y3 *y4 y5 *x1 x2 x3 *x4 x5
  +  *y1 *y2 *y3 y4 *y5 x1 x2 x3 *x4 *x5
  +  *y1 *y2 y3 *y4 *y5 x1 x2 *x3 x4 x5
  +  *y1 y2 *y3 y4 y5 *x1 x2 *x3 *x4 x5
  +  *y1 y2 y3 *y4 *y5 *x1 x2 x3 *x4 x5
  +  y1 *y2 y3 *y4 *y5 *x1 x2 *x3 x4 *x5
  +  y1 *y2 y3 y4 y5 *x1 *x2 x3 *x4 x5
  +  y1 y2 *y3 y4 y5 *x1 x2 *x3 *x4 *x5
  +  y1 y2 y3 y4 *y5 x1 *x2 *x3 *x4 *x5

ABSTRACTION
  +  *x1 *x2 x3 *x4 x5
  +  *x1 x2 *x3 *x4
  +  *x1 x2 *x3 x4 *x5
  +  x1 *x2 *x3 *x4 *x5
  +  x1 x2 x3 *x4 *x5

ORIGINAL (MIXED ORDER)
  +  *x1 *x2 y1 *y2 y3 y4 y5 x3 *x4 x5
  +  *x1 x2 *y1 *y2 *y3 *y4 y5 x3 *x4 x5
  +  *x1 x2 *y1 y2 *y3 y4 y5 *x3 *x4 x5
  +  *x1 x2 *y1 y2 y3 *y4 *y5 x3 *x4 x5
  +  *x1 x2 y1 *y2 y3 *y4 *y5 *x3 x4 *x5
  +  *x1 x2 y1 y2 *y3 y4 y5 *x3 *x4 *x5
  +  x1 *x2 y1 y2 y3 y4 *y5 *x3 *x4 *x5
  +  x1 x2 *y1 *y2 *y3 *y4 *y5 *x3 x4 x5
  +  x1 x2 *y1 *y2 *y3 y4 *y5 x3 *x4 *x5
  +  x1 x2 *y1 *y2 y3 *y4 *y5 *x3 x4 x5

ABSTRACTION
  +  *x1 *x2 x3 *x4 x5
  +  *x1 x2 *x3 *x4
  +  *x1 x2 *x3 x4 *x5
  +  x1 *x2 *x3 *x4 *x5
  +  x1 x2 x3 *x4 *x5
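The runs can be cross-checked mechanically. This sketch parses the first run (x above y), treating `*v` as the negated literal and any variable missing from a cube as a don't-care, and recomputes the abstraction by brute force. `expand` and `unique_abstraction` are hypothetical helpers. Since literals are matched by name, the cubes of the other two runs can be substituted unchanged:

```python
from collections import defaultdict
from itertools import product

X = ["x1", "x2", "x3", "x4", "x5"]
Y = ["y1", "y2", "y3", "y4", "y5"]

def expand(cube, names):
    """All full assignments over `names` covered by a cube such as
    '*x1 x2 x4': '*v' is the negated literal, an absent variable is free."""
    fixed = {lit.lstrip("*"): int(not lit.startswith("*")) for lit in cube.split()}
    free = [n for n in names if n not in fixed]
    for bits in product((0, 1), repeat=len(free)):
        env = {**fixed, **dict(zip(free, bits))}
        yield tuple(env[n] for n in names)

def unique_abstraction(cubes):
    """x assignments that occur with exactly one y assignment."""
    ys = defaultdict(set)
    for cube in cubes:
        for a in expand(cube, X + Y):
            ys[a[:len(X)]].add(a[len(X):])
    return {x for x, s in ys.items() if len(s) == 1}

# First run above (ordered with x above y), copied without the leading '+'.
ORIGINAL = [
    "*x1 *x2 x3 *x4 x5 y1 *y2 y3 y4 y5",
    "*x1 x2 *x3 *x4 *x5 y1 y2 *y3 y4 y5",
    "*x1 x2 *x3 *x4 x5 *y1 y2 *y3 y4 y5",
    "*x1 x2 *x3 x4 *x5 y1 *y2 y3 *y4 *y5",
    "*x1 x2 x3 *x4 x5 *y1 *y2 *y3 *y4 y5",
    "*x1 x2 x3 *x4 x5 *y1 y2 y3 *y4 *y5",
    "x1 *x2 *x3 *x4 *x5 y1 y2 y3 y4 *y5",
    "x1 x2 *x3 x4 x5 *y1 *y2 *y4 *y5",
    "x1 x2 x3 *x4 *x5 *y1 *y2 *y3 y4 *y5",
]
ABSTRACTION = [
    "*x1 *x2 x3 *x4 x5",
    "*x1 x2 *x3 *x4",
    "*x1 x2 *x3 x4 *x5",
    "x1 *x2 *x3 *x4 *x5",
    "x1 x2 x3 *x4 *x5",
]

expected = {a for c in ABSTRACTION for a in expand(c, X)}
print(unique_abstraction(ORIGINAL) == expected)   # True
```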


Answer 3:

One approach involves directly translating the definition of uniqueness:

R(x,y) and forall z . (~R(x,z) or y = z)
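Read literally over explicit sets, the formula keeps a pair (x, y) of R only if every z paired with x equals y. Here is a brute-force sketch; `unique_pairs` is a hypothetical name, and `ys` must enumerate every possible second component:

```python
def unique_pairs(R, ys):
    """Direct reading of: R(x,y) and forall z . (~R(x,z) or y = z)."""
    return {(x, y) for (x, y) in R
            if all((x, z) not in R or y == z for z in ys)}

ys = range(8)   # every value of the 3-bit second component
R1 = {(1, 2), (1, 3), (1, 7), (0, 4)}
R2 = R1 | {(2, 3), (2, 5)}
print(unique_pairs(R1, ys), unique_pairs(R2, ys))   # {(0, 4)} {(0, 4)}
```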

An implementation may look like this:

import sys
from functools import reduce  # needed on Python 3, where reduce is no longer a builtin
from cudd import Cudd

def inspect(function, name, nvars):
    sys.stdout.write(name) # avoid print's trailing space or newline
    function.summary(nvars) # minterm count needs number of variables
    function.printCover()

m = Cudd()

nx = 2
ny = 3
x = [m.bddVar() for i in range(nx)]
y = [m.bddVar() for i in range(ny)]

R = (~x[0] & x[1] & (~y[0] & y[1] | y[1] & y[2]) |
     x[0] & ~x[1] & (y[0] ^ y[1]) & y[2] |
     ~x[0] & ~x[1] & y[0] & ~y[1] & ~y[2])

# This approach is independent of variable order.  We are free to enable
# reordering or call it explicitly.
m.reduceHeap()

inspect(R, 'R', nx+ny)

# Create auxiliary variables and selector function.
z = [m.bddVar() for i in range(ny)]
zcube = reduce(lambda a, b: a & b, z)
P = ~m.xeqy(y,z)

# A pair is in L iff:
#   - it is in R
#   - there is no other pair in R with the same x and different y
L = R & ~(R.swapVariables(y,z).andAbstract(P,zcube))

inspect(L, 'L', nx+ny)

Result of running that code:

R: 10 nodes 1 leaves 6 minterms
01-11 1
10101 1
10011 1
00100 1
0101- 1

L: 6 nodes 1 leaves 1 minterms
00100--- 1

The first two variables encode the first element of the pair; the next three variables encode the second element of the pair; the last three variables are the auxiliary variables.

The code applies DeMorgan's law to the formula above so that it can use andAbstract.
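Spelled out, the DeMorgan step turns the universal quantifier into a negated existential one. Read `R.swapVariables(y,z)` as R(x,z), `P = ~m.xeqy(y,z)` as y ≠ z, and `andAbstract(P, zcube)` as conjunction followed by existential quantification over z. That is how CUDD's `Cudd_bddAndAbstract` behaves; that this wrapper maps onto it is an assumption.

```latex
\begin{aligned}
L(x,y) &= R(x,y) \land \forall z.\,\bigl(\lnot R(x,z) \lor y = z\bigr) \\
       &= R(x,y) \land \lnot \exists z.\,\bigl(R(x,z) \land y \neq z\bigr)
\end{aligned}
```

The inner term \exists z.\,(R(x,z) \land y \neq z) is exactly what `andAbstract` computes in one pass, without building the conjunction first.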