I was reading Okasaki's thesis, Purely Functional Data Structures, where he describes an interesting general method for converting lazy data structures with amortized bounds into structures with the same worst-case bounds for every operation. The idea is to distribute computations so that at each update some portion of the unevaluated thunks is forced.

Such structures are needed for real-time applications, for example user interfaces. (Users don't care whether clicking a button takes 0.1s or 0.2s, but they do care if the 100th click forces an outstanding lazy computation and takes 10s to proceed.)
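For concreteness, here is a minimal sketch of the kind of structure I mean, modelled on the real-time queue from the thesis (the names and details here are my own, not taken from any library): a third list acts as a schedule, and forcing one cell of it per operation spreads the cost of the rotation over the preceding updates, so every operation is O(1) in the worst case rather than only amortized.

```haskell
-- Sketch of Okasaki's real-time queue: front list, rear list, schedule.
data RTQueue a = RTQueue [a] [a] [a]

emptyQ :: RTQueue a
emptyQ = RTQueue [] [] []

-- rotate f r s produces f ++ reverse r, one cell at a time.
rotate :: [a] -> [a] -> [a] -> [a]
rotate []     (y:_)  s = y : s
rotate (x:xs) (y:ys) s = x : rotate xs ys (y : s)
rotate _      []     s = s   -- unreachable under the invariant |rear| = |front| + 1

-- exec maintains the invariant |schedule| = |front| - |rear| and forces
-- exactly one suspension of the schedule on every call.
exec :: [a] -> [a] -> [a] -> RTQueue a
exec f r (_:s) = RTQueue f r s
exec f r []    = let f' = rotate f r [] in RTQueue f' [] f'

snocQ :: RTQueue a -> a -> RTQueue a
snocQ (RTQueue f r s) x = exec f (x:r) s

unconsQ :: RTQueue a -> Maybe (a, RTQueue a)
unconsQ (RTQueue []    _ _) = Nothing
unconsQ (RTQueue (x:f) r s) = Just (x, exec f r s)
```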
I wonder, is there any such implementation of standard collections (Map, Set, etc.) in Haskell?
The containers package says:

"The declared cost of each operation is either worst-case or amortized, but remains valid even if structures are shared."

so there is no guarantee of worst-case bounds for a single operation. There are strict variants like Data.Map.Strict, but they're strict in their keys and values:

"Key and value arguments are evaluated to WHNF; Keys and values are evaluated to WHNF before they are stored in the map."

and there is nothing about the (possible) strictness of the structure itself.
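To illustrate what I mean (my own toy example, not from the docs): the documentation pins down what happens to keys and values, e.g. with Data.Map.Lazy the values may stay as unevaluated thunks while the map is still usable, but it says nothing about whether the tree structure itself can accumulate unevaluated work.

```haskell
import qualified Data.Map.Lazy as ML

-- The values are never forced, yet the map can be built and queried;
-- whether the internal tree stays lazy is what the docs leave unstated.
lazyValuesOk :: Int
lazyValuesOk = ML.size (ML.fromList [(1 :: Int, undefined), (2, undefined :: Integer)])
```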
Go looking at the source, e.g. for Data.Map.Map, and you see that a Map is totally spine-strict (and strict in the keys, even with Data.Map.Lazy): if you evaluate it to WHNF, the complete spine is forced. The same holds for IntMaps, Sets and IntSets.

So you can easily prevent the construction of large thunks (except for the mapped-to/contained values) by forcing the container to WHNF before every operation. Preventing large thunks for the contained values [a common cause of time (and space) leaks] is automatic with the Data.XYZ.Strict variants (caveat: the values are only evaluated to WHNF; if you need more, you have to do it yourself, e.g. by deepseqing any changed values immediately after the operation), whereas with the Data.XYZ.Lazy variants you need to handle it yourself.

Thus the scenario from the question, where "the 100th click forces an outstanding lazy computation and takes 10s to proceed", is an easily avoided problem with these containers.
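As a rough illustration of the difference (the word-count example and names are mine, not from containers), consider building a frequency map with a strict left fold: the fold forces each intermediate map to WHNF, which, because of the spine-strictness above, keeps the spine and the keys fully evaluated with either variant; only the treatment of the values differs.

```haskell
import Data.List (foldl')
import qualified Data.Map.Lazy   as ML
import qualified Data.Map.Strict as MS

-- foldl' forces each intermediate map to WHNF, and since Map is
-- spine-strict, no spine thunks can accumulate with either variant.
countLazy :: [String] -> ML.Map String Int
countLazy = foldl' (\m w -> ML.insertWith (+) w 1 m) ML.empty
-- ...but with Data.Map.Lazy each value grows into a chain of (+) thunks.

countStrict :: [String] -> MS.Map String Int
countStrict = foldl' (\m w -> MS.insertWith (+) w 1 m) MS.empty
-- ...while Data.Map.Strict evaluates the updated value to WHNF on every
-- insert, which for Int means fully evaluated, so no value thunks build up.
```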
However, it could still be that the 100th click takes much longer to process than the average, not because of outstanding lazy computations, but because of the algorithm itself, without the amortized cost changing. Consider the classic queue implementation with two lists: the front, from which you dequeue elements via dequeue (Q (x:xs) ys) = (x, Q xs ys) in O(1), and the back, onto which you enqueue y (Q xs ys) = Q xs (y:ys) in O(1). Well, except that dequeuing takes O(size) when the front list is empty and the back has to be reversed first; it's still O(1) amortized.

I don't know whether the algorithms used in containers have any such cases, but it's something to be aware of.
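A minimal, self-contained version of that two-list queue (the Q type above spelled out; the extra names are mine) makes the worst case visible: every call is O(1) except the one dequeue that has to reverse the rear list.

```haskell
-- The classic two-list, amortized-O(1) queue sketched above.
data Q a = Q [a] [a]   -- front list, reversed back list

emptyQueue :: Q a
emptyQueue = Q [] []

enqueue :: a -> Q a -> Q a
enqueue y (Q xs ys) = Q xs (y:ys)                    -- always O(1)

dequeue :: Q a -> Maybe (a, Q a)
dequeue (Q (x:xs) ys) = Just (x, Q xs ys)            -- O(1)
dequeue (Q []     []) = Nothing
dequeue (Q []     ys) = dequeue (Q (reverse ys) [])  -- O(size) for this call
```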