Is GHC's auto-derived `Eq` instance really *O(N)*?

Posted 2020-06-09 08:38

Question:

While learning to read GHC Core, I noticed that the automatically derived Eq instance for enum-style data types such as

data EType = ETypeA | ETypeB | ETypeC | ETypeD
           | ETypeE | ETypeF | ETypeG | ETypeH
           deriving (Eq)

appears, in GHC's Core representation, to be compiled into an O(N)-like lookup:

$fEqEType_$c== =
  \ (a_ahZ :: EType) (b_ai0 :: EType) ->
    case a_ahZ of _ {
      ETypeA ->
        case b_ai0 of _ {
          ETypeA -> True;
          ETypeB -> False;
          ETypeC -> False;
          ETypeD -> False;
          ETypeE -> False;
          ETypeF -> False;
          ETypeG -> False;
          ETypeH -> False
        };
      ETypeB -> case b_ai0 of _ {__DEFAULT -> False; ETypeB -> True};
      ETypeC -> case b_ai0 of _ {__DEFAULT -> False; ETypeC -> True};
      ETypeD -> case b_ai0 of _ {__DEFAULT -> False; ETypeD -> True};
      ETypeE -> case b_ai0 of _ {__DEFAULT -> False; ETypeE -> True};
      ETypeF -> case b_ai0 of _ {__DEFAULT -> False; ETypeF -> True};
      ETypeG -> case b_ai0 of _ {__DEFAULT -> False; ETypeG -> True};
      ETypeH -> case b_ai0 of _ {__DEFAULT -> False; ETypeH -> True}
    }
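For readers still learning Core, the listing above corresponds roughly to the following hand-written Haskell: an outer `case` on the first argument and, in each branch, an inner `case` on the second, where a wildcard plays the role of `__DEFAULT`. The name `eqNested` is hypothetical; this is a sketch for reading the Core, not GHC's derived code. Core like this can be printed by compiling with `ghc -O -ddump-simpl`, although the exact output depends on the GHC version.

```haskell
-- The eight-constructor type from the question.
data EType = ETypeA | ETypeB | ETypeC | ETypeD
           | ETypeE | ETypeF | ETypeG | ETypeH
           deriving (Eq, Show, Enum, Bounded)

-- Hypothetical hand-written mirror of the Core listing: an outer case on
-- the first argument, then an inner case on the second. The ETypeA branch
-- spells out every alternative, as the Core does; the other branches use a
-- wildcard, which corresponds to __DEFAULT in Core.
eqNested :: EType -> EType -> Bool
eqNested a b =
  case a of
    ETypeA -> case b of
      ETypeA -> True
      ETypeB -> False
      ETypeC -> False
      ETypeD -> False
      ETypeE -> False
      ETypeF -> False
      ETypeG -> False
      ETypeH -> False
    ETypeB -> case b of { ETypeB -> True; _ -> False }
    ETypeC -> case b of { ETypeC -> True; _ -> False }
    ETypeD -> case b of { ETypeD -> True; _ -> False }
    ETypeE -> case b of { ETypeE -> True; _ -> False }
    ETypeF -> case b of { ETypeF -> True; _ -> False }
    ETypeG -> case b of { ETypeG -> True; _ -> False }
    ETypeH -> case b of { ETypeH -> True; _ -> False }
```

Each branch performs at most two case dispatches, one per argument, no matter which constructors are involved.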

Am I misinterpreting the GHC Core output? Shouldn't algebraic data types provide an integer id for each constructor, which could then be compared directly in O(1)? Also, why does the first case clause, for ETypeA, not use __DEFAULT the way the other clauses do?
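At the source level, one way to get the per-constructor integer id the question asks about is to derive `Enum`: the Haskell Report numbers the constructors of a derived `Enum` instance from 0 in declaration order, so `fromEnum` yields an `Int` that can be compared directly. The name `eqByIndex` is hypothetical, and this says nothing about how GHC itself compiles derived `Eq`.

```haskell
-- The eight-constructor type from the question, additionally deriving Enum
-- so that each constructor has a position (ETypeA is 0, ETypeH is 7).
data EType = ETypeA | ETypeB | ETypeC | ETypeD
           | ETypeE | ETypeF | ETypeG | ETypeH
           deriving (Eq, Show, Enum, Bounded)

-- Hypothetical helper: two values are equal exactly when their
-- constructors occupy the same position in the declaration.
eqByIndex :: EType -> EType -> Bool
eqByIndex a b = fromEnum a == fromEnum b
```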

update:

As suggested by Simon Marlow, I added a 9th constructor, ETypeI, and GHC then switched to using dataToTag# (shown here for the derived /=):

$fEqEType_$c/= =
  \ (a_ahS :: EType) (b_ahT :: EType) ->
    case dataToTag# @ EType a_ahS of a#_ahQ {
      __DEFAULT ->
        case dataToTag# @ EType b_ahT of b#_ahR {
          __DEFAULT ->
            case ==# a#_ahQ b#_ahR of _ {
              False -> True; True -> False
            }
        }
    }
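The Core above reads each argument's constructor tag with `dataToTag#` and compares the two unboxed `Int#` tags with `==#`. A minimal hand-written analogue, assuming a GHC whose `GHC.Exts` exports `dataToTag#` (the name `neqByTag` is hypothetical; it mirrors the derived `/=` shown, not GHC's actual generated code):

```haskell
{-# LANGUAGE MagicHash #-}

import GHC.Exts (dataToTag#, isTrue#, (==#))

-- The nine-constructor variant from the update.
data EType = ETypeA | ETypeB | ETypeC | ETypeD
           | ETypeE | ETypeF | ETypeG | ETypeH
           | ETypeI
           deriving (Eq, Show, Enum, Bounded)

-- Hypothetical analogue of the Core listing: read each argument's
-- constructor tag as an Int#, compare the tags with ==#, and negate the
-- result because the listing is for (/=).
neqByTag :: EType -> EType -> Bool
neqByTag a b =
  -- Force both arguments so dataToTag# always sees an evaluated
  -- constructor (a precaution).
  a `seq` b `seq` not (isTrue# (dataToTag# a ==# dataToTag# b))
```

`MagicHash` is needed only so that identifiers ending in `#` can be written; `isTrue#` turns the `Int#` result of `==#` into a `Bool`.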

This raises a further question: what are the trade-offs between a Core case expression and dataToTag#, and why does GHC switch to dataToTag# at this particular cut-off of 9 constructors?

Answer 1:

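Equality comparison of EType is O(1) because the case construct is O(1): the code generator compiles a case on constructors into a branch on the constructor tag (for example, a jump table) rather than testing the alternatives one by one, so the Core listing does not imply a linear search.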

Whether constructors carry an integer tag depends on the low-level representation, and there are several possible choices; the generated Core is written so that it works with all of them. That said, you can always assign an integer tag to each constructor, and that is how I usually implement derived comparisons when I write Haskell compilers.
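A minimal sketch of the tag-based scheme described above, extended to constructors with fields: compare tags first, and inspect fields only when both sides use the same constructor. The type `Shape` and the functions `con2tag`, `eqShape`, and `compareShape` are hypothetical, chosen for illustration; they are not taken from any particular compiler.

```haskell
-- Hypothetical example type mixing constructors with and without fields.
data Shape = Circle Double | Square Double | Empty
  deriving (Eq, Ord, Show)

-- Hypothetical tag function: one Int per constructor, in declaration order.
con2tag :: Shape -> Int
con2tag (Circle _) = 0
con2tag (Square _) = 1
con2tag Empty      = 2

-- Equality: fields are compared only when both sides use the same
-- constructor; every remaining pair is decided by one tag comparison
-- (equal tags at that point can only mean both values are Empty).
eqShape :: Shape -> Shape -> Bool
eqShape (Circle r1) (Circle r2) = r1 == r2
eqShape (Square s1) (Square s2) = s1 == s2
eqShape a b                     = con2tag a == con2tag b

-- Ordering: constructors are ordered by tag, and fields break ties,
-- which matches the order a derived Ord instance uses.
compareShape :: Shape -> Shape -> Ordering
compareShape a b =
  case compare (con2tag a) (con2tag b) of
    EQ -> case (a, b) of
            (Circle r1, Circle r2) -> compare r1 r2
            (Square s1, Square s2) -> compare s1 s2
            _                      -> EQ  -- both Empty
    other -> other
```

The final `eqShape` equation handles every pair of different constructors, plus the `Empty`/`Empty` case, with a single `Int` comparison, so only the constructors that carry fields need their own equations.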

I have no idea why ETypeA is treated differently. It looks like a bug.



Tags: haskell ghc