I've ran into a problem while writing a parser. Specifically, I want to be return values of different types. For example, I have two different data types FA
and PA
to represent two different lipid classes -
data FA = ClassLevelFA IntegerMass
| FA CarbonChain
deriving (Show, Eq, Ord)
data PA = ClassLevelPA IntegerMass
| CombinedRadylsPA TwoCombinedRadyls
| UnknownSnPA Radyl Radyl
| KnownSnPA Radyl Radyl
deriving (Show, Eq, Ord)
Using attoparsec, I have built parsers to parse lipid shorthand notation. For the data types above, I have the parsers faParser
and paParser
. I would like to be able to parse for either an FA
or PA
. However, since FA
and PA
are different data types, I cannot do the following -
inputParser = faParser
<|> paParser
I have recently learnt about GADTs and I thought this would solve my problem. Consequently I went and made a GADT and an eval
function and changed the parsers faParser
and paParser
. -
data ParsedLipid a where
ParsedFA :: FA -> ParsedLipid FA
ParsedPA :: PA -> ParsedLipid PA
eval :: ParsedLipid a -> a
eval (ParsedFA val) = val
eval (ParsedPA val) = val
This gets me close but it appears as if ParsedFA
and ParsedPA
are different data types? For example, parsing "PA 17:1_18:1"
gives me a value of type ParsedLipid PA
(not just ParsedLipid
as I was expecting). Therefore, the parser inputParser
still does not type check.
let lipid = use "PA 17:1_18:1"
*Main> :t lipid
lipid :: ParsedLipid PA
Any advice on how to get around this problem?
What exactly do you want?
If you know at compile time whether you want an
FA
or aPA
, then GADTs are a good way to do that.If you want to decide at run-time to parse either an
FA
or aPA
, you could use...Either FA PA
.@MathematicalOrchid points out that you probably don't need GADTs and a plain sum type might be enough. You may have an XY problem but I don't know enough about your use case to make a firm judgment. This answer is about how to turn untyped data into a GADT.
As you note, your parsing function can't return a
ParsedLipid a
because that leaves the calling context free to choosea
which doesn't make sense;a
is in fact determined by the input data. And you can't return aParsedLipid FA
or aParsedLipid PA
, because the input data may be of either type.Therefore, the standard trick when building a GADT from runtime data - when you don't know the type of the index in advance - is to use existential quantification.
The type parameter
a
appears on the right-hand side ofAParsedLipid
but not on the left. A value ofAParsedLipid
is guaranteed to contain a well-formedParsedLipid
, but its precise type is kept secret. An existential type is a wrapper which helps us to translate from the messy, untyped real world to a clean, strongly-typed GADT.It's a good idea to keep the existentially quantified wrappers pushed to the edges of your system, where you have to communicate with the outside world. You can write functions with signatures like
ParsedLipid a -> a
in your core model and apply them to data under an existential wrapper at the edges. You validate your input, wrap it in an existential type, and then process it safely using your strongly-typed model - which doesn't have to worry about errors, because you already checked your input.You can unpack an
AParsedLipid
to get yourParsedLipid
back, and pattern-match on that to determine at runtime whata
was - it'll either beFA
orPA
.You'll note that
a
can't appear in the return type of a function taking anAParsedLipid
either, for the reasons outlined above. The return type must be known to the compiler; this technique does not allow you to define a "function with different return types".When you're constructing an
AParsedLipid
, you have to generate enough evidence to convince the compiler that the wrappedParsedLipid
is well-formed. In your example, this comes down to parsing a well-typedPA
orFA
and then wrapping it up.GADTs are a bit awkward when used with runtime data. The existential wrapper effectively erases the extra compile-time information in a
ParsedLipid
-AParsedLipid
is isomorphic toEither FA PA
. (Proving that isomorphism in code is a good exercise.) In my experience GADTs are much better at structuring programs than at structuring data - they excel at implementing strongly-typed embedded domain-specific languages for which the types' indices can be known at compile time. For example, Yampa and extensible-effects both use a GADT as their central data type. This helps the compiler to check that you're using the domain-specific language correctly in the code you write (and in some cases allows certain optimisations). It's pretty unlikely that you'll be building FRP networks or monadic effects at runtime from real-world data.