According to this article,
Enumerations don't count as single-constructor types as far as GHC is concerned, so they don't benefit from unpacking when used as strict constructor fields, or strict function arguments. This is a deficiency in GHC, but it can be worked around.
And instead the use of newtypes is recommended. However, I cannot verify this with the following code:
{-# LANGUAGE MagicHash,BangPatterns #-}
{-# OPTIONS_GHC -O2 -funbox-strict-fields -rtsopts -fllvm -optlc --x86-asm-syntax=intel #-}
module Main(main,f,g)
where
import GHC.Base
import Criterion.Main
data D = A | B | C
newtype E = E Int deriving(Eq)
f :: D -> Int#
f z | z `seq` False = 3422#
f z = case z of
A -> 1234#
B -> 5678#
C -> 9012#
g :: E -> Int#
g z | z `seq` False = 7432#
g z = case z of
(E 0) -> 2345#
(E 1) -> 6789#
(E 2) -> 3535#
f' x = I# (f x)
g' x = I# (g x)
main :: IO ()
main = defaultMain [ bench "f" (whnf f' A)
, bench "g" (whnf g' (E 0))
]
Looking at the assembly, the tags for each constructor of the enumeration D is actually unpacked and directly hard-coded in the instruction. Furthermore, the function f
lacks error-handling code, and more than 10% faster than g
. In a more realistic case I have also experienced a slowdown after converting a enumeration to a newtype. Can anyone give me some insight about this? Thanks.
It depends on the use case. For the functions you have, it's expected that the enumeration performs better. Basically, the three constructors of
D
becomeInt
s resp.Int#
s when the strictness analysis allows that, and GHC knows it's statically checked that the argument can only have one of the three values0#, 1#, 2#
, so it needs not insert error handling code forf
. ForE
, the static guarantee of only one of three values being possible isn't given, so it needs to add error handling code forg
, that slows things down significantly. If you change the definition ofg
so that the last case becomesthe difference vanishes completely or almost completely (I get a 1% - 2% better benchmark for
f
still, but I haven't done enough testing to be sure whether that's a real difference or an artifact of benchmarking).But this is not the use case the wiki page is talking about. What it's talking about is unpacking the constructors into other constructors when the type is a component of other data, e.g.
Then, if compiled with
-funbox-strict-fields
, the threeInt#
s can be unpacked into the constructor ofFooE
, so you'd basically get the equivalent ofwhile the fields of
FooD
have the multi-constructor typeD
and cannot be unpacked into the constructorFD
(1), so that would basically give youThat can obviously have significant impact.
I'm not sure about the case of single-constructor function arguments. That has obvious advantages for types with contained data, like tuples, but I don't see how that would apply to plain enumerations, where you just have a
case
and splitting off a worker and a wrapper makes no sense (to me).Anyway, the worker/wrapper transformation isn't so much a single-constructor thing, constructor specialisation can give the same benefit to types with few constructors. (For how many constructors specialisations would be created depends on the value of
-fspec-constr-count
.)(1) That might have changed, but I doubt it. I haven't checked it though, so it's possible the page is out of date.
I would guess that GHC has changed quite a bit since that page was last updated in 2008. Also, you're using the LLVM backend, so that's likely to have some effect on performance as well. GHC can (and will, since you've used
-O2
) strip any error handling code fromf
, because it knows statically thatf
is total. The same cannot be said forg
. I would guess that it's the LLVM backend that then unpacks the constructor tags inf
, because it can easily see that there is nothing else used by the branching condition. I'm not sure of that, though.