Here's the code:
{-# LANGUAGE FlexibleContexts #-}
import Data.Int
import qualified Data.Vector.Unboxed as U
import qualified Data.Vector.Generic as V
{-# NOINLINE f #-} -- Note the 'NO'
--f :: (Num r, V.Vector v r) => v r -> v r -> v r
--f :: (V.Vector v Int64) => v Int64 -> v Int64 -> v Int64
--f :: (U.Unbox r, Num r) => U.Vector r -> U.Vector r -> U.Vector r
f :: U.Vector Int64 -> U.Vector Int64 -> U.Vector Int64
f = V.zipWith (+) -- or U.zipWith, it doesn't make a difference
main = do
let iters = 100
dim = 221184
y = U.replicate dim 0 :: U.Vector Int64
let ans = iterate ((f y)) y !! iters
putStr $ (show $ U.sum ans)
I compiled with ghc 7.6.2
and -O2
, and it took 1.7 seconds to run.
I tried several different versions of f
:
f x = U.zipWith (+) x
f x = (U.zipWith (+) x) . id
f x y = U.zipWith (+) x y
Version 1 is the same as the original while versions 2 and 3 run in in under 0.09 seconds (and INLINING
f
doesn't change anything).
I also noticed that if I make f
polymorphic (with any of the three signatures above), even with a "fast" definition (i.e. 2 or 3), it slows back down...to exactly 1.7 seconds. This makes me wonder if the original problem is perhaps due to (lack of) type inference, even though I'm explicitly giving the types for the Vector type and element type.
I'm also interested in adding integers modulo q
:
newtype Zq q i = Zq {unZq :: i}
As when adding Int64
s, if I write a function with every type specified,
h :: U.Vector (Zq Q17 Int64) -> U.Vector (Zq Q17 Int64) -> U.Vector (Zq Q17 Int64)
I get an order of magnitude better performance than if I leave any polymorphism
h :: (Modulus q) => U.Vector (Zq q Int64) -> U.Vector (Zq q Int64) -> U.Vector (Zq q Int64)
But I should at least be able to remove the specific phantom type! It should be compiled out, since I'm dealing with a newtype
.
Here are my questions:
- Where is the slowdown coming from?
- What is going on in versions 2 and 3 of
f
that affect performance in any way? It seems like a bug to me that (what amounts to) coding style can affect performance like this. Are there other examples outside of Vector where partially applying a function or other stylistic choices affect performance? - Why does polymorphism slow me down an order of magnitude independent of where the polymorphism is (i.e. in the vector type, in the
Num
type, both, or phantom type)? I know polymorphism makes code slower, but this is ridiculous. Is there a hack around it?
EDIT 1
I filed a issue with the Vector library page. I found a GHC issue relating to this problem.
EDIT2
I rewrote the question after gaining some insight from @kqr's answer. Below is the original for reference.
--------------ORIGINAL QUESTION--------------------
Here's the code:
{-# LANGUAGE FlexibleContexts #-}
import Control.DeepSeq
import Data.Int
import qualified Data.Vector.Unboxed as U
import qualified Data.Vector.Generic as V
{-# NOINLINE f #-} -- Note the 'NO'
--f :: (Num r, V.Vector v r) => v r -> v r -> v r
--f :: (V.Vector v Int64) => v Int64 -> v Int64 -> v Int64
--f :: (U.Unbox r, Num r) => U.Vector r -> U.Vector r -> U.Vector r
f :: U.Vector Int64 -> U.Vector Int64 -> U.Vector Int64
f = V.zipWith (+)
main = do
let iters = 100
dim = 221184
y = U.replicate dim 0 :: U.Vector Int64
let ans = iterate ((f y)) y !! iters
putStr $ (show $ U.sum ans)
I compiled with ghc 7.6.2
and -O2
, and it took 1.7 seconds to run.
I tried several different versions of f
:
f x = U.zipWith (+) x
f x = (U.zipWith (+) x) . U.force
f x = (U.zipWith (+) x) . Control.DeepSeq.force)
f x = (U.zipWith (+) x) . (\z -> z `seq` z)
f x = (U.zipWith (+) x) . id
f x y = U.zipWith (+) x y
Version 1 is the same as the original, version 2 runs in 0.111 seconds, and versions 3-6 run in in under 0.09 seconds (and INLINING
f
doesn't change anything).
So the order-of-magnitude slowdown appears to be due to laziness since force
helped, but I'm not sure where the laziness is coming from. Unboxed types aren't allowed to be lazy, right?
I tried writing a strict version of iterate
, thinking the vector itself must be lazy:
{-# INLINE iterate' #-}
iterate' :: (NFData a) => (a -> a) -> a -> [a]
iterate' f x = x `seq` x : iterate' f (f x)
but with the point-free version of f
, this didn't help at all.
I also noticed something else, which could be just a coincidence and red herring:
If I make f
polymorphic (with any of the three signatures above), even with a "fast" definition, it slows back down...to exactly 1.7 seconds. This makes me wonder if the original problem is perhaps due to (lack of) type inference, even though everything should be inferred nicely.
Here are my questions:
- Where is the slowdown coming from?
- Why does composing with
force
help, but using a strictiterate
doesn't? - Why is
U.force
worse thanDeepSeq.force
? I have no idea whatU.force
is supposed to do, but it sounds a lot likeDeepSeq.force
, and seems to have a similar effect. - Why does polymorphism slow me down an order of magnitude independent of where the polymorphism is (i.e. in the vector type, in the
Num
type, or both)? - Why are versions 5 and 6, neither of which should have any strictness implications at all, just as fast as a strict function?
As @kqr pointed out, the problem doesn't seem to be strictness. So something about the way I write the function is causing the generic zipWith
to be used rather than the Unboxed-specific version. Is this just a fluke between GHC and the Vector library, or is there something more general that can be said here?