What's the benefit of conduit's leftovers?

Posted 2019-03-15 03:26

Question:

I'm trying to understand the differences between conduit and pipes. Unlike pipes, conduit has the concept of leftovers. What are leftovers useful for? I'd like to see some examples where leftovers are essential.

And since pipes don't have the concept of leftovers, is there any way to achieve a similar behavior with them?

Answer 1:

Gabriel's point that leftovers are always part of parsing is interesting. I'm not sure I would agree, but that may just depend on the definition of parsing.

There is a large category of use cases that require leftovers. Parsing is certainly one: any time a parse requires some kind of lookahead, you'll need leftovers. One example of this is the markdown package's getIndented function, which isolates all of the upcoming lines with a certain indentation level, leaving the rest of the lines to be processed later.
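A rough sketch of that kind of lookahead, assuming conduit 1.3 and plain String lines. takeIndented and demoIndented are hypothetical names written for illustration; this is not the actual getIndented from the markdown package.

```haskell
import Conduit
import Data.List (isPrefixOf)

-- Hypothetical helper (not the markdown package's getIndented):
-- collect the leading run of lines indented by at least n spaces,
-- with that indentation stripped.
takeIndented :: Monad m => Int -> ConduitT String o m [String]
takeIndented n = loop id
  where
    indent = replicate n ' '
    loop acc = do
      mline <- await
      case mline of
        -- Upstream is exhausted: return what was collected.
        Nothing -> return (acc [])
        Just line
          | indent `isPrefixOf` line -> loop (acc . (drop n line :))
          | otherwise -> do
              -- We had to read this line to know the block ended,
              -- so hand it back for whoever consumes the stream next.
              leftover line
              return (acc [])

-- Run takeIndented, then collect whatever remains in the stream.
demoIndented :: ([String], [String])
demoIndented = runConduitPure $
  yieldMany ["  a", "  b", "c", "  d"]
    .| ((,) <$> takeIndented 2 <*> sinkList)
```

Here demoIndented evaluates to (["a","b"],["c","  d"]). Without the leftover call, the line "c" would be consumed by takeIndented and never reach sinkList.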

But a much more mundane set of examples lives in conduit itself. Any time you're dealing with packed data (like ByteString or Text), you'll need to read a chunk, analyze it somehow, use leftover to push back the extra, and then do something with the original content. Perhaps the simplest example of this is dropWhile.
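To make the chunked case concrete, here is a minimal sketch of a dropWhile over ByteString chunks, assuming the conduit 1.3 API (ConduitT, await and leftover as exported by the Conduit module). The names dropWhileChunks and demo are made up for illustration, and this is not the library's own implementation.

```haskell
import Conduit
import qualified Data.ByteString.Char8 as B

-- Illustrative sketch: drop leading characters matching p, where the
-- matching prefix may span several chunks.
dropWhileChunks :: Monad m => (Char -> Bool) -> ConduitT B.ByteString o m ()
dropWhileChunks p = loop
  where
    loop = do
      mchunk <- await
      case mchunk of
        -- End of input: nothing left to drop.
        Nothing -> return ()
        Just chunk ->
          let rest = B.dropWhile p chunk
          in if B.null rest
               -- The whole chunk matched, so keep dropping.
               then loop
               -- Only a prefix matched: push the remainder back upstream.
               else leftover rest

-- Drop leading spaces, then collect the remaining chunks.
demo :: [B.ByteString]
demo = runConduitPure $
  yieldMany (map B.pack ["   ", "  hel", "lo "])
    .| (dropWhileChunks (== ' ') >> sinkList)
```

demo evaluates to the two chunks "hel" and "lo ". The second input chunk is only partly consumed, and leftover is what lets the unconsumed "hel" reach the next consumer.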

In fact, I consider leftover to be such a core, basic feature of a streaming library that the new 1.0 interface for conduit doesn't even give users the option of disabling leftovers. I know of very few real-world use cases that don't need it in one way or another.



Answer 2:

I'll answer for pipes. The short answer to your question is that the upcoming pipes-parse library will have support for leftovers as part of a more general parsing framework. I find that in almost every case where people want leftovers, they actually want a parser, which is why I frame the leftovers problem as a subset of parsing. You can find the current draft of the library here.

However, if you want to understand how pipes-parse gets it to work, the simplest possible way to implement leftovers is to just use StateP to store the pushback buffer. This requires only defining the following two functions:

import Control.Proxy
import Control.Proxy.Trans.State

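-- Take the next element: pop the pushback buffer if it is non-empty,
-- otherwise request a fresh element from upstream.
draw :: (Monad m, Proxy p) => StateP [a] p () a b' b m a
draw = do
    s <- get
    case s of
        []   -> request ()
        a:as -> do
            put as
            return a

-- Push an element back onto the buffer so that the next draw returns it.
unDraw :: (Monad m, Proxy p) => a -> StateP [a] p () a b' b m ()
unDraw a = do
    as <- get
    put (a:as)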

draw first consults the pushback buffer to see if there are any stored elements, popping one element off the stack if available. If the buffer is empty, it instead requests a new element from upstream. Of course, there's no point having a buffer if we can't push anything back, so we also define unDraw to push an element onto the stack to save for later.

Edit: Oops, I forgot to include an example of when leftovers are useful. Like Michael says, takeWhile and dropWhile are good examples. Here's the drawWhile function (analogous to what Michael calls takeWhile):

drawWhile :: (Monad m, Proxy p) => (a -> Bool) -> StateP [a] p () a b' b m [a]
drawWhile pred = go
  where
    go = do
        a <- draw
        if pred a
        then do
            as <- go
            return (a:as)
        else do
            unDraw a
            return []

Now imagine that your producer was:

producer () = do
    respond 1
    respond 3
    respond 4
    respond 6

... and you hooked that up to a consumer that used:

consumer () = do
    odds  <- drawWhile odd
    evens <- drawWhile even
    return (odds, evens)

If the first drawWhile odd didn't push back the final element it drew, then you would drop the 4, which wouldn't get correctly passed on to the second drawWhile even statement.
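For anyone trying this against current releases, here is a sketch of the same experiment using the released pipes-parse API rather than the StateP draft above. It assumes Pipes.Parse's draw (which returns a Maybe so it can signal end of input) and unDraw, plus each from Pipes. The drawWhile below and the name oddsThenEvens are illustrative and not part of the library.

```haskell
import Data.Functor.Identity (runIdentity)
import Pipes (each)
import Pipes.Parse (Producer, StateT, draw, evalStateT, unDraw)

-- Illustrative drawWhile on top of pipes-parse: collect elements while
-- the predicate holds, pushing back the first element that fails it.
drawWhile :: Monad m => (a -> Bool) -> StateT (Producer a m x) m [a]
drawWhile p = go
  where
    go = do
      ma <- draw
      case ma of
        -- The producer is exhausted.
        Nothing -> return []
        Just a
          | p a -> fmap (a :) go
          | otherwise -> do
              -- Save the element for the next parser.
              unDraw a
              return []

-- Same input as above: 1, 3, 4, 6.
oddsThenEvens :: ([Int], [Int])
oddsThenEvens = runIdentity (evalStateT parser (each [1, 3, 4, 6]))
  where
    parser = do
      odds  <- drawWhile odd
      evens <- drawWhile even
      return (odds, evens)
```

oddsThenEvens evaluates to ([1,3],[4,6]). If you delete the unDraw a line, the 4 is lost and the second component becomes [6].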