I'm having trouble understanding whether it is possible to prove that two sets (in this case regular languages) are identical and thus interchangeable. From what I understand, sets can be equivalent even if they are not constructively equal.
Regular languages are sets of strings, but I don't see how to say that r1 = r2 so that something like symmetry can be used in a proof.
Here is my RegularLanguage declaration:
Inductive RegularLanguage (A : Set) : Set :=
| EmptyLang : RegularLanguage A
| EmptyStr : RegularLanguage A
| Unit : A -> RegularLanguage A
| Union :
RegularLanguage A
-> RegularLanguage A
-> RegularLanguage A
| Concat :
RegularLanguage A
-> RegularLanguage A
-> RegularLanguage A
| KleeneStar : RegularLanguage A -> RegularLanguage A
.
Even though I say the type is Set, this doesn't create a set of strings, as far as I can see. Do I need some function of type RegularLanguage -> Set of strings
?
Thank you for the help.
There are a few misunderstandings about some Coq concepts which I'll try to clarify.
First, in Coq, you shouldn't view
Set
as what we call "set" in traditional mathematics. Instead, you should view it as a type. Coq also hasType
, but for the purposes of this post you can view bothSet
andType
as being interchangeable. To avoid confusion, from now on, I will use "set" to refer to the usual concept of set in traditional mathematics, and "type" to refer to elements ofSet
andType
in Coq.So, what exactly is the difference between a set and a type? Well, in normal mathematics, it makes sense to ask oneself whether anything is a member of any given set. Thus, if we were to develop the theory of regular expressions in normal mathematics, and each regular expression were seen as a set, it would make sense to ask questions such as
0 \in EmptyLang
, because, even though0
is a natural number, it could be the element of any set a priori. As a less contrived example, the empty string is both a member of the full language (i.e. the one that contains all strings) and of the Kleene closure of any base language.In Coq, on the other hand, each valid element of the language has exactly one type. The empty string, for instance, has type
list A
for someA
, which is written[] : list A
. If we try to ask whether[]
belongs to some regular language using the "has type" syntax, we get an error; try typing e.g.Both sets and types can be seen as collections of elements, but types are in a sense more restrictive: for instance, one can take the intersection of two sets, but there's no way of taking the intersection of two types.
Second, when you write
this does not mean that the elements that you list below the header define sets nor types. What this means is that, for each type
A
(the(A : Set)
part), you are defining a new type notedRegularLanguage A
(theRegularLanguage (A : Set) : Set
part), whose elements are freely generated by the list of constructors given. Thus, we havebut we don't have
(once again, you can try typing all the above judgments into Coq to see which are accepted and which aren't).
Being "freely generated" means, in particular, that the constructors you listed are injective and disjoint. As larsr noted previously, in particular, it is not the case that
Union l1 l2 = Union l2 l1
; as a matter of fact, you will usually be able to proveUnion l1 l2 <> Union l2 l1
. The problem is that there is a mismatch between the notion of equality that you get for inductively defined types in Coq (which you can't change) and your intended notion of equality for regular languages.While there are a few ways around this, I think the easiest one is to use the setoid rewrite feature. This would involve first defining a function or predicate (e.g., as larsr suggested, a boolean function
regexp_match : RegularLanguage A -> list A -> bool
) to determine when a regular language contains some string. You can then define an equivalence relation on languages:and use setoid rewrite to rewrite with this equivalence relation. There is a small caveat, however, which is that you can only rewrite with an equivalence relation in contexts that are compatible with this equivalence relation, and you need to explicitly prove lemmas to do so. You can find more details in the reference manual.
Two different elements in the set
RegularLanguage A
are not equal if they are constructed from different constructors. ConsiderL1 = ( a | b )
andL2 = ( b | a)
. HereL1 <> L2
because the arguments to the constructors are different.In your notation, L1 might be
Union nat (Unit nat 1) (Unit nat 2))
, and L2 isUnion nat (Unit nat 2) (Unit nat 1))
.Instead you want to say there is a function
regexp_match : A -> RegularLanguage A -> list A -> bool
that match strings in the given language. You will have to implement this function.Two regexps are equal if they match and reject the same strings. For example, for
L1
andL2
:If you can prove this, then you can use this to replace
regexp_match L1 S
withregexp_match L2 S
and vice versa where you want.