I'm looking for a monad-free, constant access query O(1) associative array.
Consider the hypothetical type:
data HT k v = ???
I want to construct an immutable structure once:
fromList :: Foldable t, Hashable k => t (k,v) -> HT k v
I want to subsequently query it repeatedly with constant time access::
lookup :: Hashable k => HT k v -> k -> Maybe v
There appears to be two candidate libraries which fall short:
unordered-containers
hashtables
unordered-containers
unordered-containers
contains both strict and lazy variants of the type HashMap
. Both HashMap
s have O(log n) queries as documented by the lookup
function. This query access time appears to be due to the construction of the HashMap
types, which have an internal tree structure allowing for O(log n) insert
functionality. An understandable design trade off for many use-cases, but since I don't need a mutable HashMap
this tradeoff hampers my use-case.
hashtables
hashtables
contains a HashTable
type-class and three instance types with varying table constructions strategies. This library's type-class defines a constant time O(1) lookup
function definition, but it is eternally embedded in the ST
monad. There is no way to "freeze" the stateful HashTable
implementations and have a lookup
function that is not embedded of a stateful monad. The library's type-class interface is well designed when the entire computation is wrapped in a state monad, but this design is unsuitable for my use-case.
Does there exist some other library which defines types and functions which can construct an immutable constant access query O(1) associative array that is not embedded in a stateful monad?
Does there exist some way to wrap or modify these existing hashing-based libraries to produce an immutable constant access query O(1) associative array that is not embedded in a stateful monad?
The library you want is… unordered-containers
. Or just plain old Data.Map
from containers
, if you’d prefer.
The note in the unordered-containers
documentation explains why you shouldn’t worry about the O(log n) time complexity for lookups:
Many operations have a average-case complexity of O(log n). The implementation uses a large base (i.e. 16) so in practice these operations are constant time.
This is a common practice with certain kinds of functional data structures because it allows good sharing properties while also having good time complexities. log16 still produces very small numbers even for very large n, so you can almost always treat those complexities as “effectively constant time”.
If this is ever a bottleneck for your application, sure, go with something else, but I find that highly unlikely. After all, log16(1,000,000) is a little under 5, so your lookup time is not going to grow very quickly. Processing all that data is going to take up much more time than the overhead of the lookup.
As always: profile first. If you somehow have a problem that absolutely needs the fastest possible hash map in the world, you might need an imperative hash map, but for every case I’ve ever had, the functional ones work just fine.
You should follow Alexis' suggestion and use unordered-containers
. If you really want something that is guaranteed to have Θ(1)
lookup
s, you can define your own frozen version of any of the hash table types from hashtables
using unsafePerformIO
, but this is not very elegant. For example:
module HT
( HT
, fromList
, lookup
) where
import qualified Data.HashTable.IO as H
import Data.Hashable (Hashable)
import Data.Foldable (toList)
import System.IO.Unsafe (unsafePerformIO)
import Prelude hiding (lookup)
newtype HT k v = HT (H.BasicHashTable k v)
fromList :: (Foldable t, Eq k, Hashable k) => t (k, v) -> HT k v
fromList = HT . unsafePerformIO . H.fromList . toList
lookup :: (Eq k, Hashable k) => HT k v -> k -> Maybe v
lookup (HT h) k = unsafePerformIO $ H.lookup h k
Both uses of unsafePerformIO
above should be safe. For that is crucial that the HT
is exported as an abstract type.