I'm writing a programme where an input file is split into multiple files (Shamir's Secret Sharing Scheme).
Here's the pipeline I'm imagining:
- source: use Conduit.Binary.sourceFile to read from the input
- conduit: Takes a ByteString, produces [ByteString]
- sink: Takes [ByteString] from the conduit, and write each ByteString (in [ByteString]) to their corresponding file. (say if our input [ByteString] is called bsl, then
bsl !! 0
will be written to file 0,bsl !! 1
to file 1 and so on)
I found a question regarding multiple input files here, but in their case the whole pipeline is run once for each input file, whereas for my programme I'm writing to multiple output files within the pipeline.
I'm also looking through the Conduit source code here to see if I can implement a multiSinkFile myself, but I'm slightly confused by the Consumer type of sinkFile, and more so if I try to dig deeper... (I'm still a beginner)
So, the question is, how should I go about implementing a function like multiSinkFile which allows multiple files to be written as part of a sink?
Any tips is appreciated!
Clarification
Let's say we want to do Shamir's Secret sharing on the file containing binary value of "ABCDEF" (into 3 parts).
(So we have our input file srcFile
and our output files outFile0
,outFile1
and outFile2
)
We first read "ABC" from the file, and do the processing which will give us a list of, say, ["133", "426", "765"]
. so "133"
will be written to outFile0
, "426"
to outFile1
and "765"
to outFile2
. And then we read "DEF" from srcFile
, do processing on it, and write the corresponding outputs to each output file.
EDIT:
Thank you for your answers. I took sometime to understand what's going with ZipSinks etc, and I've written a simple test program which takes the source file's input and simply write it to 3 output files. Hopefully this will help others in the future.
{-# LANGUAGE NoImplicitPrelude #-}
{-# LANGUAGE RankNTypes #-}
{-# LANGUAGE OverloadedStrings #-}
import ClassyPrelude.Conduit
import Safe (atMay)
import Text.Printf
import Filesystem.Path.CurrentOS (decodeString, encodeString)
import Control.Monad.Trans.Resource (runResourceT, ResourceT(..))
-- get the output file name given the base (file) path and the split number
getFileName :: FilePath -> Int -> FilePath
getFileName basePath splitNumber = decodeString $ encodeString basePath ++ "." ++ printf "%03d" splitNumber
-- Get the sink file, given a filepath generator (that takes an Int) and the split number
idxSinkFile :: MonadResource m
=> (Int -> FilePath)
-> Int
-> Consumer [ByteString] m ()
idxSinkFile mkFP splitNumber =
concatMapC (flip atMay splitNumber) =$= sinkFile (mkFP splitNumber)
sinkMultiFiles :: MonadResource m
=> (Int -> FilePath)
-> [Int]
-> Sink [ByteString] m ()
sinkMultiFiles mkFP splitNumbers = getZipSink $ otraverse_ (ZipSink . idxSinkFile mkFP) splitNumbers
simpleConduit :: Int -> Conduit ByteString (ResourceT IO) [ByteString]
simpleConduit num = mapC (replicate num)
main :: IO ()
main = do
let mkFP = getFileName "test.txt"
splitNumbers = [0..2]
runResourceT $ sourceFile "test.txt" $$ simpleConduit (length splitNumbers) =$ sinkMultiFiles mkFP splitNumbers