I am trying to get amortized O(n) time concatenation of vectors. It seems to be working, but if I need to store boxed values (such as vectors), the result is still very slow.
import qualified Data.Vector as V
import qualified Data.Vector.Generic.Mutable as GM
import Data.Vector (Vector)
import Control.Monad.State.Strict
import Control.Monad.ST

-- The growable buffer together with the number of elements actually in use.
data App = S !(Vector Int) !Int deriving (Show)

main = do
  x <- liftM (map read . words) getContents
  print $ execState (mapM_ (add . V.singleton) x) (S V.empty 0)

add :: Vector Int -> State App ()
add v1 = do
  S v2 n <- get
  let v3 = vectorGrowAdd v1 v2 n
  put (S v3 (n + V.length v1))

-- Append v1 after the first n elements of v2, growing the buffer
-- geometrically when it runs out of room.
vectorGrowAdd v1 v2 n = runST $ do
  m1 <- V.unsafeThaw v1
  m2 <- V.unsafeThaw v2
  m3 <- if GM.length m2 > (GM.length m1 + n)
          then do
            return m2
          else do
            GM.unsafeGrow m2 (GM.length m1 + 2*(GM.length m2))
  let copyTo = GM.unsafeSlice n (GM.length m1) m3
  GM.unsafeCopy copyTo m1
  V.freeze m3
In this example, testVals is a text file with 100,000 integers, Boxed.hs is the code above, and Unboxed.hs is the same as Boxed.hs except that it imports Data.Vector.Unboxed instead of Data.Vector.
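For reference, the only change in Unboxed.hs is to the vector imports, roughly like this (the rest of the code is identical, since the unboxed module exposes the same function names):

import qualified Data.Vector.Unboxed as V
import qualified Data.Vector.Generic.Mutable as GM
import Data.Vector.Unboxed (Vector)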
> ghc -v
Glasgow Haskell Compiler, Version 7.0.3
> ghc --make -O2 Boxed.hs
> time (cat testVals | ./Boxed)
...
real 1m39.855s
user 1m39.282s
sys 0m0.252s
> ghc --make -O2 Unboxed.hs
> time (cat testVals | ./Unboxed)
...
real 0m4.372s
user 0m4.268s
sys 0m0.088s
My question is: Why is there such a drastic difference between Unboxed and Boxed? Is there something I can do to improve the speed if I need to store values that can't be unboxed?
I'm not sure why it has such a dramatic impact on boxed Vectors, but you're wasting a lot of time in V.freeze m3. That creates a copy of m3 each time, so you're copying 100,000 Vectors of increasing length. You don't need the old ones anymore, so they're garbage-collected. Garbage collection of boxed Vectors takes much longer than collection of unboxed Vectors because all the pointers have to be followed to see whether the pointees can be collected too. I'm a bit surprised by how much difference it makes, though.
A few stats:
$ cat ./testVals | ./OldBoxed +RTS -t > Bxd.txt
<<ghc: 72590744976 bytes, 79999 GCs, 5696847/15655472 avg/max bytes residency (16 samples),
802M in use, 0.00 INIT (0.00 elapsed), 36.97 MUT (37.01 elapsed), 52.60 GC (52.67 elapsed) :ghc>>
$ cat ./testVals | ./OldUnboxed +RTS -t > UBxd.txt
<<ghc: 72518297568 bytes, 78256 GCs, 1013955/2294848 avg/max bytes residency (63 samples),
81M in use, 0.00 INIT (0.00 elapsed), 9.14 MUT (9.16 elapsed), 0.60 GC (0.60 elapsed) :ghc>>
So you see that the enormous difference is due to GC, although MUT (the time your programme spends doing actual work) is far lower for unboxed, too.
Now, if we replace the offending freeze by unsafeFreeze, we get
$ cat ./testVals | ./Boxed +RTS -t > Bxd.txt
<<ghc: 1149880088 bytes, 2214 GCs, 5236803/17102432 avg/max bytes residency (11 samples),
39M in use, 0.00 INIT (0.00 elapsed), 0.53 MUT (0.53 elapsed), 0.29 GC (0.29 elapsed) :ghc>>
$ cat ./testVals | ./Unboxed +RTS -t > UBxd.txt
<<ghc: 1152277200 bytes, 2229 GCs, 767477/2267200 avg/max bytes residency (31 samples),
7M in use, 0.00 INIT (0.00 elapsed), 0.61 MUT (0.62 elapsed), 0.04 GC (0.04 elapsed) :ghc>>
which exposes a far smaller difference. In fact, here the boxed Vector needed less mutator time than the unboxed one. The GC time is still much higher, though, so overall unboxed is still faster, but at 0.66s vs 0.82s, it's nothing dramatic.
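For completeness, a sketch of the changed function: the only edit is the final line, where V.unsafeFreeze reuses the mutable buffer instead of copying it. This is only safe as long as nothing else still refers to the frozen Vector's buffer when it gets thawed and mutated again, which is the case in this accumulator pattern.

vectorGrowAdd v1 v2 n = runST $ do
  m1 <- V.unsafeThaw v1
  m2 <- V.unsafeThaw v2
  -- Grow geometrically when the buffer can't hold the extra elements.
  m3 <- if GM.length m2 > (GM.length m1 + n)
          then return m2
          else GM.unsafeGrow m2 (GM.length m1 + 2 * GM.length m2)
  -- Copy the new elements into the slice starting at offset n.
  GM.unsafeCopy (GM.unsafeSlice n (GM.length m1) m3) m1
  -- unsafeFreeze avoids copying the whole buffer on every append.
  V.unsafeFreeze m3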