Streaming recursive descent of a directory in Haskell

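I am trying to do a recursive descent of a directory structure using Haskell. I would like to retrieve the child directories and files only as they are needed (lazily).

I wrote the following code, but when I run it, the trace shows that all directories are visited before the first file is printed: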

module Main where

import Control.Monad ( forM, forM_, liftM )
import Debug.Trace ( trace )
import System.Directory ( doesDirectoryExist, getDirectoryContents )
import System.Environment ( getArgs )
import System.FilePath ( (</>) )

-- From Real World Haskell, p. 214
getRecursiveContents :: FilePath -> IO [FilePath]
getRecursiveContents topPath = do
  names <- getDirectoryContents topPath
  let
    properNames =
      filter (`notElem` [".", ".."]) $
      trace ("Processing " ++ topPath) names
  paths <- forM properNames $ \name -> do
    let path = topPath </> name
    isDirectory <- doesDirectoryExist path
    if isDirectory
      then getRecursiveContents path
      else return [path]
  return (concat paths)

main :: IO ()
main = do
  [path] <- getArgs
  files <- getRecursiveContents path
  forM_ files $ \file -> putStrLn $ "Found file " ++ file

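How can I interleave the file processing with the descent? Is the problem that the files <- getRecursiveContents path action is performed in full before the forM_ that follows it in main?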

Answer 1:

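This is exactly the kind of problem that iteratees/coroutines were designed to solve.

You can easily do this with pipes. The only change I made to your getRecursiveContents was to turn it into a Producer of FilePaths and to respond with each file name instead of returning it. This lets the downstream stage handle the file name immediately, instead of waiting for getRecursiveContents to complete.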

module Main where

import Control.Monad ( forM_, liftM )
import Control.Proxy
import System.Directory ( doesDirectoryExist, getDirectoryContents )
import System.Environment ( getArgs )
import System.FilePath ( (</>) )

getRecursiveContents :: (Proxy p) => FilePath -> () -> Producer p FilePath IO ()
getRecursiveContents topPath () = runIdentityP $ do
  names <- lift $ getDirectoryContents topPath
  let properNames = filter (`notElem` [".", ".."]) names
  forM_ properNames $ \name -> do
    let path = topPath </> name
    isDirectory <- lift $ doesDirectoryExist path
    if isDirectory
      then getRecursiveContents path ()
      else respond path

main :: IO ()
main = do
    [path] <- getArgs
    runProxy $
            getRecursiveContents path
        >-> useD (\file -> putStrLn $ "Found file " ++ file)

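This prints each file immediately as it traverses the tree, and it does not require lazy IO. It is also very easy to change what you do with the file names, since all you have to do is swap out the useD stage for your actual file-handling logic.

To learn more about pipes, I highly recommend reading Control.Proxy.Tutorial.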
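The Control.Proxy module used above belongs to the older pipes 3.x series. As a rough sketch of how the same producer might look with the pipes 4.x API (an assumption about the installed version, not part of the original answer), respond becomes yield, the Proxy constraint and runIdentityP disappear, and the consumer is attached with for. It also uses listDirectory from the directory package, which already omits "." and "..".

```haskell
module Main where

import Control.Monad (forM_)
import Pipes (Producer, for, lift, runEffect, yield)
import System.Directory (doesDirectoryExist, listDirectory)
import System.Environment (getArgs)
import System.FilePath ((</>))

-- Yield every regular file below topPath as soon as it is discovered.
-- Assumes pipes >= 4.0 and directory >= 1.2.5 (for listDirectory).
getRecursiveContents :: FilePath -> Producer FilePath IO ()
getRecursiveContents topPath = do
  names <- lift $ listDirectory topPath
  forM_ names $ \name -> do
    let path = topPath </> name
    isDirectory <- lift $ doesDirectoryExist path
    if isDirectory
      then getRecursiveContents path -- descend before visiting later siblings
      else yield path                -- hand the file to the consumer right away

main :: IO ()
main = do
  [path] <- getArgs
  -- The body of 'for' runs once per yielded path, interleaved with the descent.
  runEffect $ for (getRecursiveContents path) $ \file ->
    lift $ putStrLn $ "Found file " ++ file
```

Replacing the body of for with other file-handling logic plays the same role as swapping out the useD stage in the version above.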

Answer 2:

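Using lazy IO / unsafe... is not a good way to go. Lazy IO causes many problems, including unclosed resources and impure actions being executed inside pure code. (See also "The problem with lazy I/O" on the Haskell Wiki.)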

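A safe way is to use an iteratee/enumerator library. (Replacing problematic lazy IO was the motivation for developing these concepts.) Your getRecursiveContents would become a source of data (a.k.a. an enumerator), and the data would be consumed by an iteratee. (See also "Enumerator and iteratee" on the Haskell Wiki.)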

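The tutorial for the enumerator library gives exactly such an example: it traverses and filters a directory tree to implement a simple find utility. It implements the function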

enumDir :: FilePath -> Enumerator FilePath IO b

which is basically what you need. I believe you will find it interesting.

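There is also a good article explaining iteratees in The Monad Reader, Issue 16: "Iteratee: Teaching an Old Fold New Tricks" by John W. Lato, the author of the iteratee library.

Today many people prefer newer libraries such as pipes. You may be interested in this comparison: "What are the pros and cons of Enumerators vs. Conduits vs. Pipes?".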
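To make the source/consumer split concrete without pulling in any streaming library, here is a minimal sketch of the same push-style idea using only base, directory, and filepath. The names foldFiles and printAndCount are hypothetical helpers invented for this illustration; they are not the enumerator library's API, and the sketch offers none of that library's resource-handling guarantees.

```haskell
import Control.Monad (foldM)
import System.Directory (doesDirectoryExist, listDirectory)
import System.FilePath ((</>))

-- Hypothetical helper: walk the tree depth-first and pass every file path to
-- the step function as soon as it is found, threading an accumulator through.
foldFiles :: (acc -> FilePath -> IO acc) -> acc -> FilePath -> IO acc
foldFiles step acc0 topPath = do
  names <- listDirectory topPath -- already excludes "." and ".."
  foldM visit acc0 names
  where
    visit acc name = do
      let path = topPath </> name
      isDirectory <- doesDirectoryExist path
      if isDirectory
        then foldFiles step acc path -- the consumer keeps running during descent
        else step acc path

-- Example consumer: print each file immediately and count the files seen.
printAndCount :: Int -> FilePath -> IO Int
printAndCount n file = do
  putStrLn ("Found file " ++ file)
  return (n + 1)
```

For example, foldFiles printAndCount 0 "." prints each file under the current directory as it is reached and returns the total count. The traversal plays the role of the data source and the step function plays the role of the consumer.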

Answer 3:

Thanks to the comment by Niklas B., here is the solution I came up with:

module Main where

import Control.Monad ( forM, forM_, liftM )
import Debug.Trace ( trace )
import System.Directory ( doesDirectoryExist, getDirectoryContents )
import System.Environment ( getArgs )
import System.FilePath ( (</>) )
import System.IO.Unsafe ( unsafeInterleaveIO )

-- From Real World Haskell, p. 214
getRecursiveContents :: FilePath -> IO [FilePath]
getRecursiveContents topPath = do
  names <- unsafeInterleaveIO $ getDirectoryContents topPath
  let
    properNames =
      filter (`notElem` [".", ".."]) $
      trace ("Processing " ++ topPath) names
  paths <- forM properNames $ \name -> do
    let path = topPath </> name
    isDirectory <- doesDirectoryExist path
    if isDirectory
      then unsafeInterleaveIO $ getRecursiveContents path
      else return [path]
  return (concat paths)

main :: IO ()
main = do
  [path] <- getArgs
  files <- unsafeInterleaveIO $ getRecursiveContents path
  forM_ files $ \file -> putStrLn $ "Found file " ++ file

Is there a better way?

Answer 4:

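I was recently looking at a very similar problem, where I was trying to do a somewhat complicated search in the IO monad and stop as soon as I found the file I was interested in. The solutions using libraries such as enumerator, conduit, etc. seem to have been the best you could do at the time those answers were posted, but I just learned that IO became an instance of Alternative in GHC's base library about a year ago, which opens up some new possibilities. Here is the code I wrote to try it out: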

import Control.Applicative (empty)
import Data.Foldable (asum)
import Data.List (isSuffixOf)
import System.Directory (doesDirectoryExist, listDirectory)
import System.FilePath ((</>))

searchFiles :: (FilePath -> IO a) -> FilePath -> IO a
searchFiles f fp = do
    isDir <- doesDirectoryExist fp
    if isDir
        then do
            entries <- listDirectory fp
            asum $ map (searchFiles f . (fp </>)) entries
        else f fp

matchFile :: String -> FilePath -> IO ()
matchFile name fp
    | name `isSuffixOf` fp = putStrLn $ "Found " ++ fp
    | otherwise = empty

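The searchFiles function does a depth-first search of a directory tree, stopping when it finds what you are looking for, as determined by the function passed as the first argument. The matchFile function is only there to show how to construct a suitable first argument for searchFiles; in real life you would probably do something more complicated.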

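The interesting thing here is that you can now use empty to make an IO computation "give up" without returning a result, and you can chain computations together with asum (which is just foldr (<|>) empty) to keep trying computations until one of them succeeds.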
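A small, self-contained sketch of that behaviour, independent of the file system: the helper names evenOrGiveUp and firstEven are made up for this illustration. It relies on the Alternative instance for IO in base (4.9 or later), where empty throws an IOError and <|> falls through to its right-hand action when the left one raises an IOError.

```haskell
import Control.Applicative (empty)
import Data.Foldable (asum)

-- Hypothetical helper: succeed with the number if it is even, otherwise
-- "give up" via 'empty' (which throws an IOError in the IO monad).
evenOrGiveUp :: Int -> IO Int
evenOrGiveUp n
  | even n    = putStrLn ("accepting " ++ show n) >> pure n
  | otherwise = putStrLn ("rejecting " ++ show n) >> empty

-- Try the candidates from left to right and stop at the first success.
-- Candidates after the first success are never run.
firstEven :: [Int] -> IO Int
firstEven = asum . map evenOrGiveUp
```

Running firstEven [1, 3, 4, 6] rejects 1 and 3, accepts 4, and never looks at 6. One consequence worth keeping in mind: because <|> recovers from any IOError, a genuine I/O failure inside one branch of searchFiles (for example an unreadable directory) would also be treated as "not found here" and the search would simply move on.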

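I find it a little unnerving that the type signature of an IO action no longer reflects the fact that it may deliberately not produce a result, but it certainly simplifies the code. I previously tried using types like IO (Maybe a), but that made it very hard to compose actions.

IMHO there is no longer much reason to use a type like IO (Maybe a), but if you need to interface with code that uses such a type, it is easy to convert between the two. To convert IO a to IO (Maybe a), you can just use Control.Applicative.optional, and to go the other way you can use something like this: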

maybeEmpty :: IO (Maybe a) -> IO a
maybeEmpty m = m >>= maybe empty pure
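As a quick illustration of the two conversions, the following sketch repeats maybeEmpty so that it stands alone and adds two hypothetical names, findPositive and demo, purely for demonstration.

```haskell
import Control.Applicative (empty, optional)

-- Same definition as above, repeated so this block compiles on its own.
maybeEmpty :: IO (Maybe a) -> IO a
maybeEmpty m = m >>= maybe empty pure

-- Hypothetical action that "gives up" on non-positive input.
findPositive :: Int -> IO Int
findPositive n
  | n > 0     = pure n
  | otherwise = empty

-- Convert in both directions and collect the results.
demo :: IO (Maybe Int, Maybe Int, Int)
demo = do
  a <- optional (findPositive 5)    -- success becomes Just 5
  b <- optional (findPositive (-1)) -- giving up becomes Nothing
  c <- maybeEmpty (pure (Just 7))   -- Just 7 becomes a plain 7
  pure (a, b, c)
```

Evaluating demo yields (Just 5, Nothing, 7). Passing an action that returns Nothing to maybeEmpty makes the whole action give up, which a surrounding asum or <|> can then recover from.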

Source: Streaming recursive descent of a directory in Haskell