Following the example given in the chapter GeneralRec of Chlipala book, I'm trying to write the mergesort algorithm.
Here is my code
Require Import Nat.
Fixpoint insert (x:nat) (l: list nat) : list nat :=
match l with
| nil => x::nil
| y::l' => if leb x y then
x::l
else
y::(insert x l')
end.
Fixpoint merge (l1 l2 : list nat) : list nat :=
match l1 with
| nil => l2
| x::l1' => insert x (merge l1' l2)
end.
Fixpoint split (l : list nat) : list nat * list nat :=
match l with
| nil => (nil,nil)
| x::nil => (x::nil,nil)
| x::y::l' =>
let (ll,lr) := split l' in
(x::ll,y::lr)
end.
Definition lengthOrder (l1 l2 : list nat) :=
length l1 < length l2.
Theorem lengthOrder_wf : well_founded lengthOrder.
Admitted.
The problem is that it is not possible to write the mergeSort function with the command Fixpoint
since the function is not structurally decreasing :
Fixpoint mergeSort (l: list nat) : list nat :=
if leb (length l) 1 then l
else
let (ll,lr) := split l in
merge (mergeSort ll) (mergeSort lr).
Instead, one can use the command Program Fixpoint
or Definition
with the term Fix
(as in Chlipala book).
However, if I'm writing this
Definition mergeSort : list nat -> list nat.
refine (Fix lengthOrder_wf (fun (l: list nat) => list nat)
(fun (l : list nat) => (fun mergeSort : (forall ls : list nat, lengthOrder ls l -> list nat )=>
if leb (length l) 1 then
let (ll,lr) := split l in
merge (mergeSort ll _) (mergeSort lr _)
else
l))).
I'm getting impossible goals :
2 subgoals, subgoal 1 (ID 65)
l : list nat
mergeSort : forall ls : list nat, lengthOrder ls l -> list nat
ll, lr : list nat
============================
lengthOrder ll l
subgoal 2 (ID 66) is:
lengthOrder lr l
That is why Chlipala suggests to change the definition of mergeSort this way:
Definition mergeSort : list nat -> list nat.
refine (Fix lengthOrder_wf (fun _ => list nat)
(fun (ls : list nat)
(mergeSort : forall ls' : list nat, lengthOrder ls' ls -> list nat) =>
if Compare_dec.le_lt_dec 2 (length ls)
then let lss := split ls in
merge (mergeSort (fst lss) _) (mergeSort (snd lss) _)
else ls)).
that generates the following goals:
2 subgoals, subgoal 1 (ID 68)
ls : list nat
mergeSort : forall ls' : list nat, lengthOrder ls' ls -> list nat
l : 2 <= length ls
lss := split ls : list nat * list nat
============================
lengthOrder (fst lss) ls
subgoal 2 (ID 69) is:
lengthOrder (snd lss) ls
This new definition sounds like magic to me. So I wonder:
- Fom the first definition, is it still possible to proof the well-foudness of the function?
- Otherwise why the first definition cannot work?
- How a basic user can go from the first definition to the second easily?
It's easy to see that you need to make two changes in order to get to A. Chlipala's solution.
1) When doing
split
you somehow need to remember thatll
andlr
came from split, otherwise they would be some arbitrary lists, which cannot possibly be shorter than the original listl
.The following piece of code fails to save that kind of information:
and, thus, needs to be replaced with
which keeps what we need.
The failure happens due to Coq's inability to remember that
ll
andlr
come fromsplit l
and that happens becauselet (ll,lr)
is justmatch
in disguise (see the manual, §2.2.3).Recall that the aims of pattern-matching is to (loosely speaking)
Now, observe that
split l
does not occur anywhere in the goal or context before we pattern-match on it. We just arbitrarily introduce it into the definition. That's why pattern-matching doesn't give us anything -- we can't replacesplit l
with its "special case" ((ll,lr)
) in the goal or context, because there is nosplit l
anywhere.There is an alternative way of doing this by using logical equality (
=
):This is analogous to using the
remember
tactic. We've got rid offst
andsnd
, but it is a huge overkill and I wouldn't recommend it.2) Another thing we need to prove is the fact that
ll
andlr
are shorter thanl
when2 <= length l
.Since an
if
-expression is amatch
in disguise as well (it works for any inductive datatype with exactly two constructors), we need some mechanism to remember thatleb 2 (length l) = true
in thethen
branch. Again, since we don't haveleb
anywhere, this information gets lost.There are at least two possible solutions to the problem:
leb 2 (length l)
as an equation (just as we did in the 1st part), orbool
(so it can represent two alternatives), but it should also remember some additional information we need. Then we could pattern-match on the comparison result and extract the information, which, of course, in this case have to be a proof of2 <= length l
.What we need is a type which is able to carry a proof of
m <= n
in the case whenleb m n
returnstrue
and a proof of, say,m > n
otherwise. There is a type in the standard library that does exactly that! It's calledsumbool
:{A} + {B}
is just a notation (syntactic sugar) forsumbool A B
. Just asbool
, it has two constructors, but in addition it remembers a proof of either of two propositionsA
andB
. Its advantage overbool
shows up when you do case analysis on it withif
: you get a proof ofA
in thethen
branch and a proof ofB
in theelse
branch. In other words, you get to use context you saved beforehand, whereasbool
doesn't carry any context (only in the mind of the programmer).And we need exactly that! Well, not in the
else
branch, but we would like to get2 <= length l
in ourthen
branch. So, let us ask Coq if it already has a comparison function with the return type like that:Any of the five results would do, because we need a proof only in one case.
Hence, we can replace
if leb 2 (length l) then ...
withif le_lt_dec 2 (length l) ...
and get2 <= length
in the proof context, which will let us finish the proof.