Haskell collections with guaranteed worst-case bounds for every single operation?

Asked 2019-04-04 22:53

Such structures are necessary for real-time applications, for example user interfaces. (Users don't care whether clicking a button takes 0.1s or 0.2s, but they do care if the 100th click forces an outstanding lazy computation and takes 10s to complete.)

I was reading Okasaki's thesis, Purely Functional Data Structures, and he describes an interesting general method for converting lazy data structures with amortized bounds into structures with the same worst-case bounds for every operation. The idea is to distribute computations so that at each update some portion of the unevaluated thunks is forced.
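For instance, the classic amortized two-list queue can be converted this way by keeping a "schedule" that points into the lazily built front list and forcing one cell of it per operation. A minimal sketch along the lines of the thesis's real-time queue (the names here are my own):

-- Okasaki-style real-time queue: the schedule points into the
-- lazily built front list, and each operation forces one cell of it,
-- so the suspended reversal is paid off incrementally and every
-- operation is O(1) worst-case.
data Queue a = Queue [a] [a] [a]  -- front, reversed back, schedule

-- Invariant: length schedule == length front - length back.
exec :: [a] -> [a] -> [a] -> Queue a
exec f b (_:s) = Queue f b s                        -- force one thunk
exec f b []    = let f' = rotate f b [] in Queue f' [] f'

-- rotate f b acc == f ++ reverse b ++ acc, built incrementally.
-- Precondition: length b == length f + 1.
rotate :: [a] -> [a] -> [a] -> [a]
rotate []     (y:_)  acc = y : acc
rotate (x:xs) (y:ys) acc = x : rotate xs ys (y : acc)

empty :: Queue a
empty = Queue [] [] []

enqueue :: a -> Queue a -> Queue a
enqueue y (Queue f b s) = exec f (y : b) s

dequeue :: Queue a -> Maybe (a, Queue a)
dequeue (Queue []    _ _) = Nothing
dequeue (Queue (x:f) b s) = Just (x, exec f b s)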

I wonder, is there any such implementation of standard collections (Map, Set, etc.) in Haskell?

The containers package says

The declared cost of each operation is either worst-case or amortized, but remains valid even if structures are shared.

so there is no guarantee of worst-case bounds for a single operation. There are strict variants like Data.Map.Strict, but they are only strict in their keys and values:

Key and value arguments are evaluated to WHNF; Keys and values are evaluated to WHNF before they are stored in the map.

but there is nothing about the (possible) strictness of the structure itself.

1 Answer

Answered 2019-04-04 23:42

there is nothing about (possible) strictness of its structure.

Go look at the source, e.g. for Data.Map's Map type:

-- See Note: Order of constructors
data Map k a  = Bin {-# UNPACK #-} !Size !k a !(Map k a) !(Map k a)
              | Tip

You see that a Map is totally spine-strict (and strict in the keys, even with Data.Map.Lazy): if you evaluate it to WHNF, the complete spine is forced. The same holds for IntMaps, Sets and IntSets.
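You can see this in GHCi; a small sketch (the name demo is just for illustration):

import qualified Data.Map.Lazy as M

-- Even with Data.Map.Lazy, forcing the map to WHNF forces the whole
-- spine and all keys, but not the values, so this completes fine:
demo :: IO ()
demo = do
  let m = M.insert 2 undefined (M.fromList [(1, 'a')])
  m `seq` putStrLn "spine and keys forced, values untouched"
  -- print (M.lookup 2 m)  -- this, however, would hit the undefined value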

So you can easily prevent the construction of large thunks (except for the mapped-to/contained values) by forcing the container to WHNF before every operation. Preventing large thunks in the contained values [a common cause of time (and space) leaks] is automatic with the Data.XYZ.Strict variants, with a caveat: the values are only evaluated to WHNF, so if you need more, you have to do it yourself, e.g. by deepseqing any changed values immediately after the operation. With the Data.XYZ.Lazy variants you have to handle this yourself entirely.
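For instance, a hypothetical insertDeep wrapper that deepseqs the value before inserting (using the deepseq package):

import qualified Data.Map.Strict as M
import Control.DeepSeq (NFData, deepseq)

-- Data.Map.Strict only evaluates the value to WHNF, so fully force it
-- first if the value may itself contain thunks. The same wrapper works
-- for the Lazy variant.
insertDeep :: (Ord k, NFData v) => k -> v -> M.Map k v -> M.Map k v
insertDeep k v m = v `deepseq` M.insert k v m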

Thus

Users don't care if clicking a button takes 0.1s or 0.2s, but they do care if the 100th click forces an outstanding lazy computation and takes 10s to proceed.

is an easily avoided problem with these containers.

However, it could still be that the 100th click takes much longer to process than the average, not due to outstanding lazy computations, but due to the algorithm itself, without changing the amortized cost. Consider the classic queue implementation with two lists: a front list, from which you dequeue via dequeue (Q (x:xs) ys) = (x, Q xs ys) in O(1), and a back list, onto which you enqueue via enqueue y (Q xs ys) = Q xs (y:ys) in O(1). Dequeuing takes O(size) when the front list is empty and the back list has to be reversed first, but it is still O(1) amortized.
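Spelled out, that queue looks roughly like this:

-- The classic two-list queue from the paragraph above.
data Q a = Q [a] [a]  -- front list, reversed back list

enqueue :: a -> Q a -> Q a
enqueue y (Q xs ys) = Q xs (y : ys)                   -- O(1)

dequeue :: Q a -> Maybe (a, Q a)
dequeue (Q (x:xs) ys) = Just (x, Q xs ys)             -- O(1)
dequeue (Q []     []) = Nothing
dequeue (Q []     ys) = dequeue (Q (reverse ys) [])   -- O(size), O(1) amortized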

I don't know if the algorithms used in containers have any such cases, but it's something to be aware of.
