-->

sort/2, keysort/2 vs. samsort/3, predsort/3

2019-06-14 23:01发布

问题:

ISO-Prolog provides sort/2 and keysort/2 which relies on term order (7.2) often called "standard term order". The common way to sort a list with a different order is to map each element El of that list somehow to a list of pairs XKey-El and then sort that list, and finally project the keys away. As an example consider how keysort/2 can be expressed in terms of sort/2 (See the note for an implementation).

In many situations this approach is much faster than using a generic implementation specific sorting predicate that relies on a user defined order as SWI's predsort(C_3, List, SortedList) or SICStus' samsort(O_2, List, SortedList).

My question boils down to:

Are there cases where a sorting using predsort/3 resp. samsort/3 cannot be replaced by some mapping, sort/2-ing and projecting?1

And for the sake of clarity, better stick to finite, ground terms. For, infinite ground terms do no possess a total, lexicographic order as it would be needed as an extension of the finite case ; and further, it is not clear how the comparison of variables with the case of two different variables being implementation dependent will turn out given 7.2.1 of ISO/IEC 13211-1:1995:

7.2.1 Variable

If X and Y are variables which are not identical then
X term_precedes Y shall be implementation dependent
except that during the creation of a sorted list (7.1.6.5,
8.10.3.1 j) the ordering shall remain constant.

So it is not clear whether predsort/3 would still qualify as a creation of a sorted list. What is clear is that the ordering remains constant during sort/2 and keysort/2.


1 Thanks to @WillNess, this projection should at least include also reverse/2 — or any linear transformation. It also means that results with both duplicates and unique ones can be implemented (similarly to the way keysort/2 is implemented).

回答1:

First off, you can "negate" Prolog atoms. Let's call it atom_neg/2 (it is a silly name, but it does something silly anyway):

atom_neg(A, NK) :-
    atom_codes(A, Cs),
    maplist(negate, Cs, NCs),
    append(NCs, [0], NK).

negate(X, N) :- N is -X.

I am not claiming it is practical to do this, but apparently, it is possible.

A total ordering is a weak ordering, and a key function f on a set T together with a total ordering r on the codomain of f, define a weak ordering wr(x, y) <==> r(f(x), f(y)).

(Codomain of a function in that context is the domain of the values that the function returns.)

I might be totally wrong, but the existence of a relation requires the existence of a key: you can define a relation in terms of another relation, but eventually you must compare something that can exist in isolation as well: the key.

The point here is that the key does not need to be in the same domain as the thing we want to sort, and that a weak ordering (a relation) is defined for objects of the same domain. Prolog does something weird here: it defines a standard order of terms for all possible terms. Prolog also does not have a proper notion of "types", or "domains". My gut feeling tells me that sorting things that do not belong to the same domain is simply not very useful, but Prolog obviously disagrees.

You can not define a key function for two cases:

  1. The comparison predicate keeps its own state;
  2. You have "opaque" objects (defined in C, for example) that provide the comparison function but not a key function.

Either way, predsort can be useful: no one would prefer atom_neg/2 over the solution by Will Ness. However, it has one serious defficiency at the moment: it does not allow for a stable sort. The SWI-Prolog already can be used in this way, all it takes would be to add the current behavour to the specification and documentation of predsort/3.



回答2:

edit: the answer by @Boris shows how to "negate" atoms for the purposes of comparison, so it invalidates my contr-argument in that case. And the new stipulation in the question invalidates it entirely.

In case of complex sorting criteria, multiple sub-keys will have to be arranged. If retention of duplicates is desired, incrementing indices should be prefixed to the original term, following the sorting subkeys, in terms constructed for sort/2 to sort.

The complexity and number of the constructed subkeys can get out of hand though. Imaging sorting points by X first, then by Y, in ascending or descending orders in some regions, and by Y first, and by X second, in others.

Then the sought for advantage of replacing the loglinear number of (presumably computationally heavy) comparisons with only a linear number of key constructions and a loglinear number of (presumably light) comparisons in standard order of terms, can disappear.


Trivially, predsort/3ing e.g. a list of atoms in reverse, with custom comparison predicate

comp(<,A,B):- B @< A.

etc., can't be done by sort/2ing which works in "standard order of terms" (quoting SWI documentation). With numbers, we could flip the sign, but not with names.

Perhaps you'd want to add reverse to the allowed actions.

With sort/4 allowed, I don't see anything that wouldn't work. And since it's stable, secondary criteria can be accommodated as well, by the secondary passes (first by a minor, then by a major criterion).



回答3:

I think I might have a proper answer to your question.

If you have a partial ordering, you can still try and sort using predsort/3, and you might get a better result than just saying "a total ordering does not exist."

Here is an example: say you have a game played by two teams. Each play gives a point to only one of the teams, and you play until one team reaches a certain number of points.

Now, you organize a tournament, and it has a group stage, in groups of 4 teams, which is a round-robin. Only the two top teams make it out of the groups.

For each game played, teams get a score of own_points - other_teams_points. In other words, if you play to 7, and the final score is:

Team A - 5:7 - Team B

them Team A scores -2 and Team B scores 2.

At the end of the group stage, you order the teams by:

  1. Total score
  2. If the total score is the same, the team that won the direct fight is ordered higher.

Most notably, using this scoring scheme, you cannot resolve a three-way draw if Team A beat Team B, Team B beat Team C, and Team C beat Team A. A "stable" sort makes no sense in this context.

However, using predsort/3, you can attempt to find the two top teams, and you will get a definitive answer in most cases. Resolving a three-way draw as above is usually resolved using a coin-toss.