I've just been really surprised by how slow printf from F# is. I have a number of C# programs that process large data files and write out a number of CSV files. I originally started by using fprintf writer "%s,%d,%f,%f,%f,%s"
thinking that that would be simple and reasonably efficient.
However, after a while I was getting a bit fed up with waiting for the files to process. (I've got 4 GB XML files to go through and write out entries from them.)
When I ran my applications through a profiler, I was amazed to see printf as being one of the really slow methods.
I changed the code to not use printf and now performance is so much better. Printf performance was killing my overall application performance.
To give an example, my original code is:
fprintf sectorWriter "\"%s\",%f,%f,%d,%d,\"%s\",\"%s\",\"%s\",%d,%d,%d,%d,\"%s\",%d,%d,%d,%d,%s,%d"
    sector.Label sector.Longitude sector.Latitude sector.RNCId sector.CellId
    siteName sector.Switch sector.Technology (int sector.Azimuth) sector.PrimaryScramblingCode
    (int sector.FrequencyBand) (int sector.Height) sector.PatternName (int sector.Beamwidth)
    (int sector.ElectricalTilt) (int sector.MechanicalTilt) (int (sector.ElectricalTilt + sector.MechanicalTilt))
    sector.SectorType (int sector.Radius)
And I've changed it to the following:
seq {
    yield sector.Label; yield string sector.Longitude; yield string sector.Latitude; yield string sector.RNCId; yield string sector.CellId;
    yield siteName; yield sector.Switch; yield sector.Technology; yield string (int sector.Azimuth); yield string sector.PrimaryScramblingCode;
    yield string (int sector.FrequencyBand); yield string (int sector.Height); yield sector.PatternName; yield string (int sector.Beamwidth);
    yield string (int sector.ElectricalTilt); yield string (int sector.MechanicalTilt);
    yield string (int (sector.ElectricalTilt + sector.MechanicalTilt));
    yield sector.SectorType; yield string (int sector.Radius)
}
|> writeCSV sectorWriter
Helper functions:
let writeDelimited delimiter (writer:TextWriter) (values:seq<string>) =
    values
    |> Seq.fold (fun (s:string) v -> if s.Length = 0 then v else s + delimiter + v) ""
    |> writer.WriteLine
let writeCSV (writer:TextWriter) (values:seq<string>) = writeDelimited "," writer values
I'm writing out files with about 30,000 rows. Nothing special.
Now that F# 3.1 has been preview released, the performance of printf is claimed to have increased by 40x. You might want to have a look at this:
TextWriter already buffers its output. I recommend using Write to output each value, one at a time, instead of formatting an entire line and passing it to WriteLine. On my laptop, writing 100,000 lines takes nearly a minute using your function, while, using the following function, it runs in half a second.
I am not sure how much it matters, but...
Inspecting the code for printf:
https://github.com/fsharp/fsharp/blob/master/src/fsharp/FSharp.Core/printf.fs
I see
and I think the word 'reflection' probably answers the question.
printf is great for writing simple type-safe output, but if you want good perf in an inner loop, you might want to use a lower-level .NET API to write output. I haven't done my own benchmarking to see.
EDIT: This answer is only valid for simple format strings, like "%s" or "%d". See comments below.
It is also interesting to note that if you can make a curried function and reuse that, the reflection will only be carried out once. Sample:
print1 takes 48 ms on my machine while print2 takes 1158 ms.
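For reference, here is a minimal sketch of that kind of comparison. It assumes print1 is the partially applied version and print2 the fully applied one; timeIt and the iteration count are hypothetical, and the numbers will vary with the F# version and the machine.

```fsharp
open System.IO
open System.Diagnostics

// Hypothetical benchmark helper: calls `f` for 0 .. n-1 and returns elapsed ms.
let timeIt (n: int) (f: int -> unit) =
    let sw = Stopwatch.StartNew()
    for i in 0 .. n - 1 do
        f i
    sw.ElapsedMilliseconds

let w = new StringWriter() :> TextWriter

// Partially applied once: the format string is processed a single time
// and the resulting function value is reused on every call.
let print1 : int -> unit = fprintf w "%d"

// Fully applied on every call: the format string is processed each time.
let print2 (d: int) = fprintf w "%d" d

printfn "print1: %d ms" (timeIt 100000 print1)
printfn "print2: %d ms" (timeIt 100000 print2)
```

The only difference between the two is where the format string gets applied, so any gap in the timings comes from processing the format repeatedly.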
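As for the earlier suggestion to call Write once per value instead of building each line first, here is a rough sketch of what such a helper might look like. writeValues is a hypothetical name, not the function from that answer, and it does no CSV quoting or escaping.

```fsharp
open System.IO

// Hypothetical helper: writes each value straight to the writer, separated by
// `delimiter`, and relies on the writer's own buffering instead of
// concatenating the whole line into one string first.
let writeValues (delimiter: string) (writer: TextWriter) (values: seq<string>) =
    let mutable first = true
    for v in values do
        if first then first <- false else writer.Write(delimiter)
        writer.Write(v)
    writer.WriteLine()

// Usage: a StringWriter stands in for the file writer here.
let sw = new StringWriter()
writeValues "," sw [ "a"; "b"; "c" ]
// sw.ToString() is "a,b,c" followed by the writer's newline
```

It takes the same arguments as writeDelimited above, so it could be swapped in for the fold-based version without changing the call sites.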