I've ran into a problem while writing a parser. Specifically, I want to be return values of different types. For example, I have two different data types FA
and PA
to represent two different lipid classes -
data FA = ClassLevelFA IntegerMass
| FA CarbonChain
deriving (Show, Eq, Ord)
data PA = ClassLevelPA IntegerMass
| CombinedRadylsPA TwoCombinedRadyls
| UnknownSnPA Radyl Radyl
| KnownSnPA Radyl Radyl
deriving (Show, Eq, Ord)
Using attoparsec, I have built parsers to parse lipid shorthand notation. For the data types above, I have the parsers faParser
and paParser
. I would like to be able to parse for either an FA
or PA
. However, since FA
and PA
are different data types, I cannot do the following -
inputParser = faParser
<|> paParser
I have recently learnt about GADTs and I thought this would solve my problem. Consequently I went and made a GADT and an eval
function and changed the parsers faParser
and paParser
. -
data ParsedLipid a where
ParsedFA :: FA -> ParsedLipid FA
ParsedPA :: PA -> ParsedLipid PA
eval :: ParsedLipid a -> a
eval (ParsedFA val) = val
eval (ParsedPA val) = val
This gets me close but it appears as if ParsedFA
and ParsedPA
are different data types? For example, parsing "PA 17:1_18:1"
gives me a value of type ParsedLipid PA
(not just ParsedLipid
as I was expecting). Therefore, the parser inputParser
still does not type check.
let lipid = use "PA 17:1_18:1"
*Main> :t lipid
lipid :: ParsedLipid PA
Any advice on how to get around this problem?
@MathematicalOrchid points out that you probably don't need GADTs and a plain sum type might be enough. You may have an XY problem but I don't know enough about your use case to make a firm judgment. This answer is about how to turn untyped data into a GADT.
As you note, your parsing function can't return a ParsedLipid a
because that leaves the calling context free to choose a
which doesn't make sense; a
is in fact determined by the input data. And you can't return a ParsedLipid FA
or a ParsedLipid PA
, because the input data may be of either type.
Therefore, the standard trick when building a GADT from runtime data - when you don't know the type of the index in advance - is to use existential quantification.
data AParsedLipid = forall a. AParsedLipid (ParsedLipid a)
The type parameter a
appears on the right-hand side of AParsedLipid
but not on the left. A value of AParsedLipid
is guaranteed to contain a well-formed ParsedLipid
, but its precise type is kept secret. An existential type is a wrapper which helps us to translate from the messy, untyped real world to a clean, strongly-typed GADT.
It's a good idea to keep the existentially quantified wrappers pushed to the edges of your system, where you have to communicate with the outside world. You can write functions with signatures like ParsedLipid a -> a
in your core model and apply them to data under an existential wrapper at the edges. You validate your input, wrap it in an existential type, and then process it safely using your strongly-typed model - which doesn't have to worry about errors, because you already checked your input.
You can unpack an AParsedLipid
to get your ParsedLipid
back, and pattern-match on that to determine at runtime what a
was - it'll either be FA
or PA
.
showFA :: FA -> String
showFA = ...
showPA :: PA -> String
showPA = ...
showLipid :: AParsedLipid -> String
showLipid (AParsedLipid (ParsedFA x)) = "AParsedLipid (ParsedFA "++ showFA x ++")"
showLipid (AParsedLipid (ParsedPA x)) = "AParsedLipid (ParsedPA "++ showPA x ++")"
You'll note that a
can't appear in the return type of a function taking an AParsedLipid
either, for the reasons outlined above. The return type must be known to the compiler; this technique does not allow you to define a "function with different return types".
When you're constructing an AParsedLipid
, you have to generate enough evidence to convince the compiler that the wrapped ParsedLipid
is well-formed. In your example, this comes down to parsing a well-typed PA
or FA
and then wrapping it up.
parser :: Parser AParsedLipid
parser = AParsedLipid <$> (fmap ParsedFA faParser <|> fmap ParsedPA paParser)
GADTs are a bit awkward when used with runtime data. The existential wrapper effectively erases the extra compile-time information in a ParsedLipid
- AParsedLipid
is isomorphic to Either FA PA
. (Proving that isomorphism in code is a good exercise.) In my experience GADTs are much better at structuring programs than at structuring data - they excel at implementing strongly-typed embedded domain-specific languages for which the types' indices can be known at compile time. For example, Yampa and extensible-effects both use a GADT as their central data type. This helps the compiler to check that you're using the domain-specific language correctly in the code you write (and in some cases allows certain optimisations). It's pretty unlikely that you'll be building FRP networks or monadic effects at runtime from real-world data.
What exactly do you want?
If you know at compile time whether you want an FA
or a PA
, then GADTs are a good way to do that.
If you want to decide at run-time to parse either an FA
or a PA
, you could use... Either FA PA
.