Another microbenchmark: Why is this "loop" (compiled with ghc -O2 -fllvm
, 7.4.1, Linux 64bit 3.2 kernel, redirected to /dev/null
)
mapM_ print [1..100000000]
about 5x slower than a simple for-cycle in plain C
with write(2)
non-buffered syscall? I am trying to gather Haskell gotchas.
Even this slow C solution is much faster than Haskell
int i;
char buf[16];
for (i=0; i<=100000000; i++) {
sprintf(buf, "%d\n", i);
write(1, buf, strlen(buf));
}
Okay, on my box the C code, compiled per
gcc -O3
takes about 21.5 seconds to run, the original Haskell code about 56 seconds. So not a factor of 5, a bit above 2.5.The first nontrivial difference is that
uses
Integer
s, that's a bit slower because it involves a check upfront, and then works with boxedInt
s, while theShow
instance ofInt
does the conversion work on unboxedInt#
s.Adding a type signature, so that the Haskell code works on
Int
s,brings the time down to 47 seconds, a bit above twice the time the C code takes.
Now, another big difference is that
show
produces a linked list ofChar
and doesn't just fill a contiguous buffer of bytes. That is slower too.Then that linked list of
Char
s is used to fill a byte buffer that then is written to thestdout
handle.So, the Haskell code does more, and more complicated things than the C code, thus it's not surprising that it takes longer.
Admittedly, it would be desirable to have an easy way to output such things more directly (and hence faster). However, the proper way to handle it is to use a more suitable algorithm (that applies to C too). A simple change to
almost halves the time taken, and if one wants it really fast, one uses the faster
ByteString
I/O and builds the output efficiently as exemplified in applicative's answer.On my (rather slow and outdated) machine the results are:
(I have fixed the list so that both C and Haskell start with 0).
Yes you read this right. Haskell is several times faster than C. Or rather, normally buffered Haskell is faster than C with write(2) non-buffered syscall.
(When measuring output to /dev/null instead of a real disk file, C is about 1.5 times faster, but who cares about /dev/null performance?)
Technical data: Intel E2140 CPU, 2 cores, 1.6 GHz, 1M cache, Gentoo Linux, gcc4.6.1, ghc7.6.1.
The standard Haskell way to hand giant bytestrings over to the operating system is to use a builder monoid.
which gives me:
in contrast to Haskell approach you used
That is, it's child's play to do nine times better than
mapM_ print
, contra Daniel Fischer's suprising defeatism. Everything you need to know is here: http://hackage.haskell.org/packages/archive/bytestring/0.10.2.0/doc/html/Data-ByteString-Builder.html I won't compare it with your C since my results were much slower than Daniel's and n.m. so I figure something was going wrong.Edit: Made the imports consistent with all versions of
bytestring-0.10.x
It occurred to me the following might be clearer -- theBuilder
equivalent ofunlines . map show
: