Optimising a Prolog Program (remove duplicates, re

2019-09-07 15:36发布

问题:

I am very inexperienced with Prolog. I have a data set that contains elements and relations in graph that has circularity (quite a lot). There are rules to calculate the summary relation of a path. One of these is: one takes the path, then takes the weakest relation, and that is the one that holds between both ends.

With Elements E1, E2, E3 and
Relations R1/R1c, R2, R3 (strength low to high) and
structure E1-R3-E2, E1-R1-E2, E2-R2-E3, E3-R1-E2

I can make the following minimal example:

% weaker table
isWeaker( r1, r2).
isWeaker( r2, r3).
weaker( X, Y) :- isWeaker( X, Y).
weaker( X, Y) :-
    isWeaker( X, Z),
    weaker( Z, Y).
% 'weakest' is <= not <
weakest( X, X, Y) :- =(X,Y).
weakest( X, X, Y) :- weaker( X, Y).

% All direct relations
isADirectRelation( e1, r1, e2).
isADirectRelation( e1, r3, e2).
isADirectRelation( e2, r2, e3).
isADirectRelation( e3, r1, e2).
isADirectRelation( e1, r3, e4).
isADirectRelation( e4, r2, e3).
isADirectRelation( e1, r1, e4).
isADirectRelation( e3, r1, e4).

% derived relations calculations

% Structural Chains
isADerivedRelation( Source, Relation, Target, Visited) :-
    \+ member( [Source,Relation,Target], Visited),
    weakest( Relation, RelationOne, RelationTwo),
    isARelation( Source, RelationOne, Intermediate, [[Source,Relation,Target]|Visited]),
    isARelation( Intermediate, RelationTwo, Target, [[Source,Relation,Target]|Visited]).

% major workhorse with anti-circularity
isARelation( Source, Relation, Target, Visited) :-
    (isADirectRelation( Source, Relation, Target);
     isADerivedRelation( Source, Relation, Target, Visited)).

The result of isARelation( Source, Relation, Target, []). is

e1,r1,e2
e1,r3,e2
e2,r2,e3
e3,r1,e2
e1,r3,e4
e4,r2,e3
e1,r1,e4
e3,r1,e4

e1,r1,e3
e3,r1,e3
e1,r1,e3 duplicate
e3,r1,e3 duplicate

Missing are

e4,r1,e4
e2,r2,e2

Is it at all possible to solve this in Prolog? Formally, yes, of course, but also with a decent performance?

回答1:

There are many things to be said about this question, so this will be a long and rambling and ultimately unsatisfactory answer. So I might as well start with a pet peeve: Please don't use camelCaseIdentifiers for predicates, we usually use underscore_separated_words instead. I'm not sure why this bugs me in Prolog in particular, but I suspect partly because uppercase letters are syntactically significant.

Moving on, your weakest/3 predicate is broken:

?- weakest(Weakest, r2, r1).
false.

I think you had this right in the first version of your question, but then you removed the third clause of weakest/3 because you thought it caused redundant answers. Needless to say, "efficiency" is useless without correctness. (Also, we usually put the "output" arguments last, not first.)

Part of the reason you get redundant answers is your use of two (indirectly) recursive calls to isARelation/4 in isADerivedRelation/4. What you are computing is something like the transitive closure of the union of "direct" relations. The usual way to express the transitive closure in Prolog is like this:

transitive_closure(A, B) :-
    base_relation(A, B).
transitive_closure(A, C) :-
    base_relation(A, B),
    transitive_closure(B, C).

That is, we first "take a base step", then recurse. If our base relation has pairs a-b, b-c, c-d, then this will find the solution a-d exactly once, as the composition of the base pair a-b and the derived transitive pair b-d. In contrast, if we were to structure the second clause as you did, with two recursive calls to transitive_closure/2, we would get the solution a-d twice: Once as above, but also once because we would derive the transitive pair a-c and compose it with c-d to give a-d.

You can fix this by changing your first isARelation/4 call in isADerivedRelation/4 into a call to isADirectRelation/3.

Another problem is that you are using Visited wrong: You are marking the pair Source-Target as visited before you have proved that such a solution even exists! You should probably mark Source-Intermediate as visited instead.

Even so, you will still get redundant solutions for a pair of elements if there are several different paths between those elements in the graph. This is just how Prolog's logic works: Prolog finds individual answers to your query but does not allow you to talk directly about relationships between those answers. If we want to force it to enumerate everything exactly once, we must leave pure logic.

Some Prolog systems offer a feature called "tabling" which essentially caches all the solutions for a "tabled" predicate and avoids re-computations. This should avoid redundant answers and even simplify your definition: If your closure relation is tabled, you no longer need to track a Visited list because cyclic recomputations will be avoided by the tabling. I can't give you tested code because I have no Prolog with tabling lying around. Even without tabling offered by your system, there is the theoretical possibility of "memoizing" solutions yourself, using Prolog's impure database. It's difficult to get it exactly right with no redundant solutions whatsoever.

As an alternative to impure Prolog, your problem seems a better fit to datalog or answer-set-programming. These are programming models that use Prolog-like syntax but with set semantics that seems to be exactly what you want: A proposition is either a solution or not, there is no concept of redundant solutions. The entire set of solutions is computed in one go. Cycle elimination is also automatic, so you don't need (in fact, because of the restricted input language, cannot use) a Visited list. If I were you, I would try to do this in Datalog.

As a further Prolog extension, there might be a spiffy solution based on Constraint Handling Rules (CHR). But really, do try Datalog.

Finally, I don't see why you think that e2,r2,e2 is a missing solution. The only path from e2 to e2 that I see goes through e3 and back to e2 via an r1 relation, which is the weakest one, so the solution should be e2,r1,e2.



回答2:

What I ended up with, thanks to also the comments by Lurker and answer by Isabelle is this:

% weaker table
isWeaker( r1, r2).
isWeaker( r2, r3).
weaker( X, Y) :- isWeaker( X, Y).
weaker( X, Y) :-
    isWeaker( X, Z),
    weaker( Z, Y).
% 'weakest' is <= not <
weakest( X, X, Y) :- =(X,Y).
weakest( X, X, Y) :- weaker( X, Y).

% All direct relations
isADirectRelation( e1, r1, e2).
isADirectRelation( e1, r3, e2).
isADirectRelation( e2, r2, e3).
isADirectRelation( e3, r1, e2).
isADirectRelation( e1, r3, e4).
isADirectRelation( e4, r2, e3).
isADirectRelation( e1, r1, e4).
isADirectRelation( e3, r1, e4).

% derived relations calculations

isARelation( Source, Relation, Target, _) :-
    isADirectRelation( Source, Relation, Target).

% Structural Chains
isARelation( Source, Relation, Target, Visited) :-
    \+ member( [Source,Relation,Target], Visited),
    weakest( Relation, RelationOne, RelationTwo),
    isADirectRelation( Source, RelationOne, Intermediate),
    isARelation( Intermediate, RelationTwo, Target, [[Source,RelationOne,Intermediate]|Visited]).
isARelation( Source, Relation, Target, Visited) :-
    \+ member( [Source,Relation,Target], Visited),
    weakest( Relation, RelationOne, RelationTwo),
    isADirectRelation( Source, RelationTwo, Intermediate),
    isARelation( Intermediate, RelationOne, Target, [[Source,RelationTwo,Intermediate]|Visited]).

write_relation( Result) :-
    write( Result), nl.

writeAllRelations :-
   setof( (Relation, Source, Target), Relation^isARelation( Source, Relation, Target, []), ListOfAllRelations),
% maplist( write_relation, ListOfAllRelations). % For SWIProlog
write( ListOfAllRelations). % for JIProlog

This works and produces he right outcome:

r1,e1,e2
r1,e1,e3
r1,e1,e4
r1,e2,e2
r1,e2,e3
r1,e2,e4
r1,e3,e2
r1,e3,e3
r1,e3,e4
r1,e4,e2
r1,e4,e3
r1,e4,e4
r2,e1,e3
r2,e2,e3
r2,e4,e3
r3,e1,e2
r3,e1,e4

However, in the real world, with 60 or so entities and 800 or so direct relations, I've not found a Prolog that can handle it. I'll look into Datalog.



标签: prolog