I am trying to write a simple `cat` program in Haskell. I would like to take multiple filenames as arguments and write each file sequentially to STDOUT, but my program only prints one file and exits.
What do I need to do to make my code print every file, not just the first one passed in?

    import Control.Monad as Monad
    import System.Exit
    import System.IO as IO
    import System.Environment as Env

    main :: IO ()
    main = do
        -- Get the command line arguments
        args <- Env.getArgs

        -- If we have arguments, read them as files and output them
        if (length args > 0) then catFileArray args

        -- Otherwise, output stdin to stdout
        else catHandle stdin

    catFileArray :: [FilePath] -> IO ()
    catFileArray files = do
        putStrLn $ "==> Number of files: " ++ (show $ length files)
        -- run `catFile` for each file passed in
        Monad.forM_ files catFile

    catFile :: FilePath -> IO ()
    catFile f = do
        putStrLn ("==> " ++ f)
        handle <- openFile f ReadMode
        catHandle handle

    catHandle :: Handle -> IO ()
    catHandle h = Monad.forever $ do
        eof <- IO.hIsEOF h
        if eof then do
            hClose h
            exitWith ExitSuccess
        else
            hGetLine h >>= putStrLn

I am running the code like this:

    runghc cat.hs file1 file2

My first idea is this:
It doesn't really fail in a Unix-y way, and it handles neither stdin nor multibyte input, but it is "way more Haskell", so I just wanted to share it. Hope it helps.
On the other hand, I guess it should handle large files easily without filling up memory, because `putStr` can already consume the string while the file is still being read.
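
A minimal sketch of such a program, assuming it simply reads each named file lazily and hands the contents to `putStr`. The helper name `catFiles` is made up for illustration, and this is not necessarily the exact code the answer had in mind:

```haskell
import System.Environment (getArgs)

-- Hypothetical helper: print each named file to stdout, in order.
-- readFile is lazy, so putStr consumes the contents as they are read.
catFiles :: [FilePath] -> IO ()
catFiles = mapM_ (\name -> readFile name >>= putStr)

main :: IO ()
main = getArgs >>= catFiles
```

As the answer says, this ignores stdin entirely, and a missing file simply raises an IO exception.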

Your problem is that `exitWith` terminates the whole program. So you cannot really use `forever` to loop through the file, because obviously you don't want to run the function "forever", just until the end of the file. You can rewrite `catHandle` like this:

I.e. if we haven't reached EOF, we read another line and recurse.
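
A sketch of the recursive `catHandle` described here, keeping the question's line-by-line copying. The small `main` is only an assumed driver so that the block runs on its own:

```haskell
import Control.Monad (forM_)
import System.Environment (getArgs)
import System.IO

-- Copy a handle to stdout line by line. At EOF, close the handle and
-- return normally instead of calling exitWith, so the caller can go on
-- to the next file.
catHandle :: Handle -> IO ()
catHandle h = do
    eof <- hIsEOF h
    if eof
        then hClose h
        else do
            hGetLine h >>= putStrLn  -- copy one line
            catHandle h              -- recurse for the rest of the file

-- Assumed driver: cat every argument, or stdin when there are none.
main :: IO ()
main = do
    args <- getArgs
    if null args
        then catHandle stdin
        else forM_ args $ \f -> openFile f ReadMode >>= catHandle
```

Run it the same way as in the question, for example `runghc cat.hs file1 file2`.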
However, this whole approach is overly complicated. You can write `cat` simply as:

Because of lazy I/O, the whole file contents are not actually loaded into memory, but streamed to stdout.
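
A sketch of what such a lazy I/O `cat` might look like. The function names `catFiles` and `catFiles'` are made up here; the second is a shorter variant for readers comfortable with the Kleisli composition operator `>=>` from `Control.Monad`:

```haskell
import Control.Monad (forM_, (>=>))
import System.Environment (getArgs)

-- Explicit version: read each file lazily and stream it to stdout.
catFiles :: [FilePath] -> IO ()
catFiles files = forM_ files $ \filename -> do
    contents <- readFile filename
    putStr contents

-- Shorter variant: readFile >=> putStr is \f -> readFile f >>= putStr.
catFiles' :: [FilePath] -> IO ()
catFiles' = mapM_ (readFile >=> putStr)

main :: IO ()
main = getArgs >>= catFiles
```

Both definitions behave the same; which one reads better is mostly a matter of taste.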
If you are comfortable with the operators from `Control.Monad`, the whole program can be shortened even further.

`catHandle`, which is indirectly called from `catFileArray`, calls `exitWith` when it reaches the end of the first file. This terminates the program, and further files aren't read anymore.

You should instead just return normally from the `catHandle` function when the end of the file has been reached. This probably means you shouldn't do the reading `forever`.

If you install the very helpful `conduit` package, you can do it this way:

This looks similar to shang's suggested simple solution, but using conduits and `ByteString` instead of lazy I/O and `String`. Both of those are good things to learn to avoid: lazy I/O frees resources at unpredictable times; `String` has a lot of memory overhead.

Note that `ByteString` is intended to represent binary data, not text. In this case we're just treating the files as uninterpreted sequences of bytes, so `ByteString` is fine to use. If, on the other hand, we were processing the file as text (counting characters, parsing, etc.), we'd want to use `Data.Text`.

EDIT: You can also write it like this:
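
A rough sketch of the per-file loop and of the single concatenated source. It assumes a recent conduit (1.3 or later) and its `Conduit` module, where `.|` and `runConduitRes` take the place of the older `$$` and `runResourceT` combination, and `stdoutC` serves as the stdout sink. The names `catEach` and `catAll` are invented for illustration:

```haskell
import Conduit
import Control.Monad (forM_)
import System.Environment (getArgs)

-- Per-file form: one resource-managed pipeline per filename.
catEach :: [FilePath] -> IO ()
catEach files = forM_ files $ \filename ->
    runConduitRes $ sourceFile filename .| stdoutC

-- Concatenated form: mapM_ sequences the sources with >>, giving one
-- source that yields each file's bytes in order.
catAll :: [FilePath] -> IO ()
catAll files = runConduitRes $ mapM_ sourceFile files .| stdoutC

main :: IO ()
main = getArgs >>= catAll
```

Either way the files are copied as raw `ByteString` chunks, and each file is closed as soon as its source finishes.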

In the original, `sourceFile filename` creates a `Source` that reads from the named file, and we use `forM_` on the outside to loop over each argument and run the `ResourceT` computation over each filename.

However, in Conduit you can use monadic `>>` to concatenate sources; `source1 >> source2` is a source that produces the elements of `source1` until it's done, then produces those of `source2`. So in this second example, `mapM_ sourceFile files` is equivalent to `sourceFile file0 >> ... >> sourceFile filen`: a `Source` that concatenates all of the sources.

EDIT 2: And following Dan Burton's suggestion in the comment to this answer:

In English, `sourceArgs $= readFileConduit` is a source that produces the contents of the files named by the command line arguments.
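
One way such a pipeline could be put together, as a sketch only. The definitions of `sourceArgs` and `readFileConduit` below are assumptions (the names come from the text above, the bodies do not), and the code uses conduit 1.3 or later, where `.|` fuses stages in place of `$=` and `$$`:

```haskell
import Conduit
import Control.Monad.IO.Class (MonadIO, liftIO)
import Control.Monad.Trans.Resource (MonadResource)
import Data.ByteString (ByteString)
import System.Environment (getArgs)

-- Assumed definition: a source that yields each command-line argument.
sourceArgs :: MonadIO m => ConduitT i FilePath m ()
sourceArgs = liftIO getArgs >>= yieldMany

-- Assumed definition: for every incoming filename, stream that file's bytes.
readFileConduit :: MonadResource m => ConduitT FilePath ByteString m ()
readFileConduit = awaitForever sourceFile

main :: IO ()
main = runConduitRes $ sourceArgs .| readFileConduit .| stdoutC
```

Here `sourceArgs .| readFileConduit` plays the role of `sourceArgs $= readFileConduit`: a source of file contents that is then drained into stdout.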