Curious about the HashTable performance issues

2019-01-21 06:30发布

问题:

I read that hash tables in Haskell had performance issues (on the Haskell-Cafe in 2006 and Flying Frog Consultancy's blog in 2009), and since I like Haskell it worried me.

That was a year ago, what is the status now (June 2010)? Has the "hash table problem" been fixed in GHC?

回答1:

The problem was that the garbage collector is required to traverse mutable arrays of pointers ("boxed arrays") looking for pointers to data that might be ready to deallocate. Boxed, mutable arrays are the main mechanism for implementing a hashtable, so that particular structure showed up the GC traversal issue. This is common to many languages. The symptom is excessive garbage collection (up to 95% of time spent in GC).

The fix was to implement "card marking" in the GC for mutable arrays of pointers, which occured in late 2009. You shouldn't see excessive GC when using mutable arrays of pointers in Haskell now. On the simple benchmarks, hashtable insertion for large hashes improved by 10x.

Note that the GC walking issue doesn't affect purely functional structures, nor unboxed arrays (like most data parallel arrays, or vector-like arrays, in Haskell. Nor does it affect hashtables stored on the C heap (like judy). Meaning that it didn't affect day-to-day Haskellers not using imperative hash tables.

If you are using hashtables in Haskell, you shouldn't observe any issue now. Here, for example, is a simple hashtable program that inserts 10 million ints into a hash. I'll do the benchmarking, since the original citation doesn't present any code or benchmarks.

import Control.Monad
import qualified Data.HashTable as H
import System.Environment

main = do
  [size] <- fmap (fmap read) getArgs
  m <- H.new (==) H.hashInt
  forM_ [1..size] $ \n -> H.insert m n n
  v <- H.lookup m 100
  print v

With GHC 6.10.2, before the fix, inserting 10M ints:

$ time ./A 10000000 +RTS -s
...
47s.

With GHC 6.13, after the fix:

./A 10000000 +RTS -s 
...
8s

Increasing the default heap area:

./A +RTS -s -A2G
...
2.3s

Avoiding hashtables and using an IntMap:

import Control.Monad
import Data.List
import qualified Data.IntMap as I
import System.Environment

main = do
  [size] <- fmap (fmap read) getArgs
  let k = foldl' (\m n -> I.insert n n m) I.empty [1..size]
  print $ I.lookup 100 k

And we get:

$ time ./A 10000000 +RTS -s        
./A 10000000 +RTS -s
6s

Or, alternatively, using a judy array (which is a Haskell wrapper calling C code through the foreign-function interface):

import Control.Monad
import Data.List
import System.Environment
import qualified Data.Judy as J

main = do
  [size] <- fmap (fmap read) getArgs
  j <- J.new :: IO (J.JudyL Int)
  forM_ [1..size] $ \n -> J.insert (fromIntegral n) n j
  print =<< J.lookup 100 j

Running this,

$ time ./A 10000000 +RTS -s
...
2.1s

So, as you can see, the GC issue with hashtables is fixed, and there have always been other libraries and data structures which were perfectly suitable. In summary, this is a non-issue.

Note: as of 2013, you should probably just use the hashtables package, which supports a range of mutable hashtables natively.



回答2:

A question like this can really be settled only by experiment. But if you don't have the time or money to do experiments, you have to ask other people what they think. When you do so, you might want to consider the source and consider whether the information given has been reviewed or vetted in any way.

Jon Harrop has advanced some interesting claims about Haskell. Let me suggest that you search on Google Groups and elsewhere for evidence of Harrop's expertise in Haskell, Lisp, and other functional languages. You could also read the work by Chris Okasaki and Andy Gill on Patricia trees in Haskell, see how their expertise is regarded. You can also find whose claims, if any, have been checked by a third party. Then you can make up your own mind how seriously to take different people's claims about the performance of different functional languages.

Oh, and don't feed the troll.


P.S. It would be quite reasonable for you to do your own experiments, but perhaps not necessary, since the trusty Don Stewart presents some nice microbenchmarks in his fine answer. Here's an addendum to Don's answer:


Addendum: Using Don Stewart's code on an AMD Phenom 9850 Black Edition clocked at 2.5GHz with 4GB RAM, in 32-bit mode, with ghc -O,

  • With the default heap, the IntMap is 40% faster than the hash table.
  • With the 2G heap, the hash table is 40% faster than the IntMap.
  • If I go to ten million elements with the default heap, the IntMap is four times faster than the hash table (CPU time) or twice as fast by wall-clock time.

I'm a little surprised by this result, but reassured that functional data structures perform pretty well. And confirmed in my belief that it really pays to benchmark your code under the actual conditions in which it's going to be used.



回答3:

In short, even with the fix in the latest GHC, Haskell is still incapable of providing a dictionary (mutable or immutable) that is competitively efficient.

Haskell's hash tables were 32× slower than alternatives like C++ and .NET with GHC 6.10. That was partly due to a performance bug in the GHC garbage collector that was fixed for GHC 6.12.2. But Simon Marlow's results there show only a 5× performance improvement which still leaves Haskell's hash tables many times slower than most alternatives.

Purely functional alternatives are also much slower than a decent hash table. For example, Haskell's IntMap is 10× slower than .NET's hash table.

Using F# 2010 and the latest Haskell Platform 2010.2.0.0 (released yesterday!) with GHC 6.12.3 on this 2.0GHz E5405 Xeon running 32-bit Windows Vista to insert 20M int->int bindings into an empty hash table we find that Haskell is still 29× slower than F# in real time and over 200× slower in terms of CPU time because the Haskell burns all cores:

GHC 6.12.3 Data.HashTable: 42.8s (new!)
.NET hash table:            1.47s

Provided you run only short-lived microbenchmarks you can disable the GHC garbage collector as Don Stewart suggests above. By asking for a nursery generation so large that this particular program will never fill it, he brought the time for the Haskell hash table down to only 1.5s here. However, this completely undermines the whole point of having a nursery generation and will massively degrade the performance of other code because freshly allocated values will now always be cold in the cache (which is why the nursery generation is typically the size of the L2 cache, orders of magnitude smaller than this).