How can I ensure amortized O(n) concatenation from

2019-07-03 22:29发布

问题:

I have an application where it is efficient to use Vectors for one part of the code. However, during the computation I need to keep track of some of the elements. I have heard that you can get O(n) amortized concatenation from Data.Vectors (by the usual array growing trick) but I think that I am not doing it right. So lets say we have the following setup:

import Data.Vector((++),Vector)
import Prelude hiding ((++))
import Control.Monad.State.Strict

data App = S (Vector Int)

add :: Vector Int -> State App ()
add v1 = modify (\S v2 -> S (v2 ++ v1))

Does add run in amortized O(n) time in this case? If not, how can I make add do that (do I need to store an (forall s. MVector s Int) in the State?). Is there a more efficient way of implementing add?

回答1:

I don't know the vector library well either, so I have to stick to general principles mostly. Benchmark it, run a sequence of adds similar to what you expect in your application and see what performance you get. If it's 'good enough', great, stick with the simple code. If not, before storing a (forall s. MVector s Int) in the state (which you can't directly, tuples can't hold forall-types, so you'd need to wrap it), try improving the add-behaviour by converting to mutable vectors and perform the concatenation there before freezing to get an immutable vector again, roughly

add v1 = do
    S v2 <- get
    let v3 = runST $ do
                m1 <- unsafeThaw v2
                m2 <- unsafeGrow m1 (length v1)
                -- copy contents of v1 behind contents of v2
                unsafeFreeze m2
    put (S v3)

You may need to insert some strictness there. However, if unsafeGrow needs to copy, that will not guarantee amortized O(n) behaviour.

You can get amortized O(n) behaviour by

  1. storing the number of used slots in the state too
  2. if the new vector fits in the free space at the end, thaw, copy, freeze without growing
  3. if it doesn't fit in the free space, grow by at least a factor of 2, that guarantees that each element is copied at most twice on average