I would appreciate a complete working example that does the following in Haskell:
Read a very large sequence (more than 1 billion elements) of 32-bit int values from a binary file into an appropriate container (certainly not a list, for performance reasons), double each number that is less than 1000 (decimal), and write the resulting 32-bit int values to another binary file. I do not want to read the entire contents of the binary file into memory at once; I want to read it one chunk after another.
I am confused because I could find very little documentation about this. Data.Binary, ByteString, Word8 and so on only add to the confusion. There is a pretty straightforward solution to such problems in C/C++: take an array (e.g. of unsigned int) of the desired size, use the read/write library calls, and be done with it. In Haskell it didn't seem so easy, at least to me.
I'd appreciate it if your solution used the best standard packages available with mainstream Haskell (> GHC 7.10) and not obscure or obsolete ones.
I read from these pages
If you're doing binary I/O, you almost certainly want ByteString for the actual input/output part. Have a look at the hGet and hPut functions it provides. (Or, if you only need strictly linear access, you can try using lazy I/O, but it's easy to get that wrong.)
Of course, a byte string is just an array of bytes; your next problem is interpreting those bytes as characters / integers / doubles / whatever else they're supposed to be. There are a couple of packages for that, but Data.Binary seems to be the most mainstream one.
The documentation for binary seems to want to steer you towards using the Binary class, where you write code to serialise and deserialise whole objects. But you can use the functions in Data.Binary.Get and Data.Binary.Put to deal with individual items. There you will find functions such as getWord32be (get Word32 big-endian) and so forth.
I don't have time to write a working code example right now, but basically look at the functions I mention above and ignore everything else, and you should get some idea.
Now with working code:
This, I believe, does what you asked for. It reads 1000 chunks of chunk_size bytes, converts each one into a list of Word32 (so it only ever has chunk_size / 4 integers in memory at once), does the calculation you specified, and writes the result back out again.
Obviously if you did this "for real" you'd want EOF checking and such.
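A minimal sketch of the chunked approach described above, using hGet from Data.ByteString together with Data.Binary.Get and Data.Binary.Put. It is not the answer's original listing: the names chunkSize, transform, decodeChunk, encodeChunk, processChunk and copyLoop are hypothetical, the values are assumed to be unsigned and little-endian, and the loop runs until EOF rather than for a fixed 1000 chunks.

```haskell
import Control.Monad (replicateM, unless)
import qualified Data.ByteString as BS
import qualified Data.ByteString.Lazy as BL
import Data.Binary.Get (getWord32le, runGet)
import Data.Binary.Put (putWord32le, runPut)
import Data.Word (Word32)
import System.Environment (getArgs)
import System.IO (Handle, IOMode (..), hPutStrLn, stderr, withBinaryFile)

-- Bytes requested per read; a multiple of 4 so that full chunks
-- always hold whole Word32 values. (Hypothetical value.)
chunkSize :: Int
chunkSize = 64 * 1024

-- The calculation from the question: double every value below 1000.
transform :: Word32 -> Word32
transform x = if x < 1000 then 2 * x else x

-- Decode one strict chunk into Word32 values.
-- Assumption: little-endian. Trailing bytes that do not fill a
-- whole value are ignored.
decodeChunk :: BS.ByteString -> [Word32]
decodeChunk bs =
  runGet (replicateM (BS.length bs `div` 4) getWord32le) (BL.fromStrict bs)

-- Encode Word32 values back into bytes (little-endian).
encodeChunk :: [Word32] -> BL.ByteString
encodeChunk = runPut . mapM_ putWord32le

processChunk :: BS.ByteString -> BL.ByteString
processChunk = encodeChunk . map transform . decodeChunk

-- Read, transform and write one chunk at a time until hGet
-- returns an empty ByteString, which signals EOF.
copyLoop :: Handle -> Handle -> IO ()
copyLoop hIn hOut = do
  chunk <- BS.hGet hIn chunkSize
  unless (BS.null chunk) $ do
    BL.hPut hOut (processChunk chunk)
    copyLoop hIn hOut

main :: IO ()
main = do
  args <- getArgs
  case args of
    [inPath, outPath] ->
      withBinaryFile inPath ReadMode $ \hIn ->
        withBinaryFile outPath WriteMode $ \hOut ->
          copyLoop hIn hOut
    _ -> hPutStrLn stderr "usage: PROGRAM INPUT OUTPUT"
```

Only one chunk (at most chunkSize / 4 values) is held in memory at a time. If the file is big-endian, getWord32be and putWord32be would be the functions to swap in.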
The best way to work with binary I/O in Haskell is to use bytestrings. Lazy bytestrings provide buffered I/O, so you don't even need to handle the buffering yourself.
The code below assumes that the chunk size is a multiple of 32 bits (which it is).
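As a rough sketch of the lazy-bytestring approach, assuming unsigned little-endian values: the input is read lazily, split into 4-byte groups, transformed, and re-encoded with Data.ByteString.Builder. The names transform, fromLE, toWords and processLazy are hypothetical. Unlike the assumption stated above, this version splits with splitAt, so it does not depend on the chunk size being a multiple of 4.

```haskell
import Data.Bits (shiftL, (.|.))
import Data.ByteString.Builder (toLazyByteString, word32LE)
import qualified Data.ByteString.Lazy as BL
import Data.Word (Word32, Word8)
import System.Environment (getArgs)
import System.IO (hPutStrLn, stderr)

-- The calculation from the question: double every value below 1000.
transform :: Word32 -> Word32
transform x = if x < 1000 then 2 * x else x

-- Combine bytes into one Word32, least significant byte first
-- (little-endian is an assumption).
fromLE :: [Word8] -> Word32
fromLE = foldr (\b acc -> (acc `shiftL` 8) .|. fromIntegral b) 0

-- Lazily split the input into Word32 values, 4 bytes at a time.
-- Trailing bytes that do not fill a whole value are ignored.
toWords :: BL.ByteString -> [Word32]
toWords bs
  | BL.length w < 4 = []
  | otherwise = fromLE (BL.unpack w) : toWords rest
  where
    (w, rest) = BL.splitAt 4 bs

-- Decode, transform and re-encode; the result is itself lazy.
processLazy :: BL.ByteString -> BL.ByteString
processLazy = toLazyByteString . foldMap (word32LE . transform) . toWords

main :: IO ()
main = do
  args <- getArgs
  case args of
    [inPath, outPath] -> BL.readFile inPath >>= BL.writeFile outPath . processLazy
    _ -> hPutStrLn stderr "usage: PROGRAM INPUT OUTPUT"
```

Because BL.readFile reads on demand, the input and output paths should be different files. The intent is that memory use stays roughly constant, though that depends on nothing else holding on to the input.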
Here is a loop to process one line at a time from stdin:
Now just replace hGetLine with something that reads 4 bytes, etc.
Here is the I/O section for Data.ByteString: https://hackage.haskell.org/package/bytestring-0.10.6.0/docs/Data-ByteString.html#g:29
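One way the suggested substitution might look, as a sketch only: the loop below reads 4 bytes at a time from stdin with hGet instead of a line with hGetLine, and writes the transformed value to stdout. The names transform, decodeLE and loop are hypothetical, and the values are assumed to be unsigned and little-endian.

```haskell
import Data.Bits (shiftL, (.|.))
import qualified Data.ByteString as BS
import Data.ByteString.Builder (hPutBuilder, word32LE)
import Data.Word (Word32)
import System.IO (Handle, hSetBinaryMode, stdin, stdout)

-- The calculation from the question: double every value below 1000.
transform :: Word32 -> Word32
transform x = if x < 1000 then 2 * x else x

-- Interpret the bytes as one little-endian Word32 (an assumption).
decodeLE :: BS.ByteString -> Word32
decodeLE = BS.foldr (\b acc -> (acc `shiftL` 8) .|. fromIntegral b) 0

-- Same shape as a line-by-line loop, but each step reads 4 bytes.
loop :: Handle -> Handle -> IO ()
loop hIn hOut = do
  bytes <- BS.hGet hIn 4
  if BS.length bytes < 4
    then return () -- EOF, or a truncated final value: stop
    else do
      hPutBuilder hOut (word32LE (transform (decodeLE bytes)))
      loop hIn hOut

main :: IO ()
main = do
  hSetBinaryMode stdin True
  hSetBinaryMode stdout True
  loop stdin stdout
```

It could be run as `PROGRAM < input.bin > output.bin`. Reading one value per call keeps the code close to the line-based loop, but for a billion values, reading larger chunks per hGet call is likely to be considerably faster.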