I just noticed, while trying to learn to read GHC Core, that the automatically derived Eq instance for enum-style data types such as
data EType = ETypeA | ETypeB | ETypeC | ETypeD
           | ETypeE | ETypeF | ETypeG | ETypeH
  deriving (Eq)
seems to be transformed into an O(N)-like lookup, judging by GHC's Core representation:
$fEqEType_$c== =
  \ (a_ahZ :: EType) (b_ai0 :: EType) ->
    case a_ahZ of _ {
      ETypeA ->
        case b_ai0 of _ {
          ETypeA -> True;
          ETypeB -> False;
          ETypeC -> False;
          ETypeD -> False;
          ETypeE -> False;
          ETypeF -> False;
          ETypeG -> False;
          ETypeH -> False
        };
      ETypeB -> case b_ai0 of _ {__DEFAULT -> False; ETypeB -> True};
      ETypeC -> case b_ai0 of _ {__DEFAULT -> False; ETypeC -> True};
      ETypeD -> case b_ai0 of _ {__DEFAULT -> False; ETypeD -> True};
      ETypeE -> case b_ai0 of _ {__DEFAULT -> False; ETypeE -> True};
      ETypeF -> case b_ai0 of _ {__DEFAULT -> False; ETypeF -> True};
      ETypeG -> case b_ai0 of _ {__DEFAULT -> False; ETypeG -> True};
      ETypeH -> case b_ai0 of _ {__DEFAULT -> False; ETypeH -> True}
    }
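Read back as ordinary Haskell, the Core listing above corresponds roughly to the following sketch. It is a hand-written approximation for reading purposes, not the code GHC actually derives, and the name eqNested is hypothetical. Writing it out makes the shape explicit: one case on the first argument, then one case on the second argument inside the selected branch.

```haskell
-- Source-level reading of the first Core listing.
-- eqNested is a hypothetical name; this is NOT GHC's actual derived code.
data EType = ETypeA | ETypeB | ETypeC | ETypeD
           | ETypeE | ETypeF | ETypeG | ETypeH
  -- Eq, Enum and Bounded are derived only so that eqNested
  -- can be checked against the real derived (==).
  deriving (Eq, Enum, Bounded)

-- Scrutinise a first; the chosen branch then scrutinises b once.
eqNested :: EType -> EType -> Bool
eqNested a b = case a of
  -- The ETypeA branch lists every alternative explicitly,
  -- just as the Core listing does.
  ETypeA -> case b of
    ETypeA -> True
    ETypeB -> False
    ETypeC -> False
    ETypeD -> False
    ETypeE -> False
    ETypeF -> False
    ETypeG -> False
    ETypeH -> False
  -- The remaining branches use a wildcard, the source-level
  -- counterpart of Core's __DEFAULT alternative.
  ETypeB -> case b of { ETypeB -> True; _ -> False }
  ETypeC -> case b of { ETypeC -> True; _ -> False }
  ETypeD -> case b of { ETypeD -> True; _ -> False }
  ETypeE -> case b of { ETypeE -> True; _ -> False }
  ETypeF -> case b of { ETypeF -> True; _ -> False }
  ETypeG -> case b of { ETypeG -> True; _ -> False }
  ETypeH -> case b of { ETypeH -> True; _ -> False }
```

In source Haskell the wildcard must come after the specific constructor; in Core the __DEFAULT alternative is listed first, which does not change the meaning.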
Am I misinterpreting the GHC Core output? Shouldn't algebraic data types provide an integer id for each constructor, which could then be compared directly in O(1)? Also, why does the first case clause, for ETypeA, not make use of __DEFAULT as the other clauses do?
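For comparison, source Haskell can already obtain a per-constructor integer index for an enumeration type through a derived Enum instance: the Haskell Report numbers nullary constructors left to right starting at 0. The sketch below assumes EType may also derive Enum, and eqByIndex is a hypothetical helper. Whether GHC turns this into a single integer comparison depends on the optimiser, so it illustrates the idea of comparing integer ids rather than guaranteeing O(1) code.

```haskell
-- Compare constructors by their position in the declaration.
-- Assumes EType also derives Enum; eqByIndex is a hypothetical helper.
data EType = ETypeA | ETypeB | ETypeC | ETypeD
           | ETypeE | ETypeF | ETypeG | ETypeH
  deriving (Enum, Bounded)

-- fromEnum maps ETypeA to 0, ETypeB to 1, ..., ETypeH to 7,
-- so two values are equal exactly when their indices are equal.
eqByIndex :: EType -> EType -> Bool
eqByIndex a b = fromEnum a == fromEnum b
```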
Update:
As suggested by Simon Marlow, I added a 9th constructor, ETypeI, and GHC then switched to using dataToTag# (shown here for the derived /=):
$fEqEType_$c/= =
  \ (a_ahS :: EType) (b_ahT :: EType) ->
    case dataToTag# @ EType a_ahS of a#_ahQ {
      __DEFAULT ->
        case dataToTag# @ EType b_ahT of b#_ahR {
          __DEFAULT ->
            case ==# a#_ahQ b#_ahR of _ {
              False -> True; True -> False
            }
        }
    }
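The dataToTag# version above can be approximated in source Haskell with the MagicHash extension and GHC.Exts. This is a sketch, not GHC's derived code: eqViaTag and neViaTag are hypothetical names, and dataToTag# is a GHC internal whose exact type has changed between GHC versions. Both arguments are forced with seq first as a defensive measure, in case the GHC version in use expects dataToTag#'s argument to be evaluated already.

```haskell
{-# LANGUAGE MagicHash #-}
-- Source-level sketch of the dataToTag#-based comparison.
-- eqViaTag / neViaTag are hypothetical names, not GHC's derived code.
import GHC.Exts (dataToTag#, isTrue#, (==#))

data EType = ETypeA | ETypeB | ETypeC | ETypeD | ETypeE
           | ETypeF | ETypeG | ETypeH | ETypeI
  -- Enum and Bounded are derived only to make the type easy to test.
  deriving (Enum, Bounded)

-- dataToTag# yields the constructor's 0-based tag as an unboxed Int#;
-- (==#) compares two Int#s and isTrue# converts the Int# result to Bool.
eqViaTag :: EType -> EType -> Bool
eqViaTag a b =
  -- Force both arguments defensively before reading their tags.
  a `seq` b `seq` isTrue# (dataToTag# a ==# dataToTag# b)

-- Mirrors the $c/= listing: compare the tags, then negate the result.
neViaTag :: EType -> EType -> Bool
neViaTag a b = not (eqViaTag a b)
```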
This raises a further question: what are the trade-offs between a Core case on constructors and the use of dataToTag#, and why does GHC switch to dataToTag# at this particular cut-off of 9 constructors?