I want to read the last line of my file and make sure it has the same number of fields as the first line; I don't care about anything in the middle. I'm using mmap because it's fast for random access on large files, but I'm running into problems, probably because I don't fully understand Haskell or laziness.
λ> import qualified Data.ByteString.Lazy.Char8 as LB
λ> import System.IO.MMap
λ> outh <- mmapFileByteStringLazy fname Nothing
λ> LB.length outh
87094896
λ> LB.takeWhile (`notElem` "\n") outh
"\"Field1\",\"Field2\",
Great.
From here, I know that
takeWhileR p xs is equivalent to reverse (takeWhileL p (reverse xs)).
So let's build this. That is, let's get the last line by reversing my lazy bytestring, taking characters while they are not "\n" just as before, then reversing the result. Laziness makes me think the compiler will let me do this easily.
So trying this out:
LB.reverse (LB.takeWhile (`notElem` "\n") (LB.reverse outh))
What I expect to see is:
"\"val1\",\"val2\",
Instead, this crashes my session.
Segmentation fault (core dumped)
Questions:
- What am I doing wrong with laziness, or bytestrings, or the mmap library, or Haskell?
- How can I get this line correctly and memory-efficiently? (Possibly by using foreign pointers instead of lazy bytestrings?)
For other readers, if you're looking to get the last line, you may find a very fast and suitable method described in the answer here: hSeek and SeekFromEnd in Haskell
In this thread, I'm looking specifically for a solution using mmap.
I would prefer the use of bytestring-mmap, made by the same author as bytestring. In either case, all you need is a breakEnd on the mapped bytestring. This runs instantly too, with no extra allocations. As before, there is the caveat that many files end in newlines, so one may want to have
BS.breakEnd (== '\n') (init bs)
to ignore the last \n character.
Also, I would not recommend reversing the bytestring: that will require at least some allocations, which are in this case completely avoidable. Even if you use a lazy bytestring, you still pay the cost of going through all the chunks of the bytestring (which hopefully shouldn't even have been constructed at this point). That said, your reversing code should work. I reckon something is off with mmap (probably the package, as doing the same thing with a strict bytestring works just fine).
Previous answer, from before OP's edit
I'm not sure what the problem is with the functions in System.IO. The following runs instantly on my laptop, the file file.txt being almost 4GB. It isn't elegant, but it is certainly efficient.
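A minimal sketch of what a System.IO approach along these lines might look like (not necessarily the exact code referred to above). It assumes the last line fits within a fixed-size tail of the file. The names lastLineOf and lastLineOfFile and the chunkSize parameter are made up for illustration. The idea is to seek close to the end with hSeek and SeekFromEnd, read only that tail as a strict bytestring, and split on the last newline with breakEnd.

```haskell
import qualified Data.ByteString.Char8 as BS
import System.IO

-- | Last line of a strict bytestring, ignoring one trailing newline if present.
-- (Hypothetical helper name, for illustration only.)
lastLineOf :: BS.ByteString -> BS.ByteString
lastLineOf bs = snd (BS.breakEnd (== '\n') body)
  where
    -- Many files end in "\n"; drop it so we do not return an empty last line.
    body
      | not (BS.null bs) && BS.last bs == '\n' = BS.init bs
      | otherwise = bs

-- | Read at most chunkSize bytes from the end of the file and take the
-- last line of that tail. ASSUMPTION: the last line is shorter than
-- chunkSize; otherwise only its final chunkSize bytes are returned.
lastLineOfFile :: Int -> FilePath -> IO BS.ByteString
lastLineOfFile chunkSize path =
  withBinaryFile path ReadMode $ \h -> do
    size <- hFileSize h
    -- Never seek back further than the start of the file.
    let n = min size (fromIntegral chunkSize)
    hSeek h SeekFromEnd (negate n)
    lastLineOf <$> BS.hGet h (fromIntegral n)
```

For example, lastLineOfFile 65536 "file.txt" would read only the final 64 KiB, whatever the file size. The pure lastLineOf helper could equally be applied to a strict bytestring obtained from an mmap library, since it never reverses or copies the input.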