I'm trying to understand the differences between conduit and pipes. Unlike pipes, conduit has the concept of leftovers. What are leftovers useful for? I'd like to see some examples where leftovers are essential.
And since pipes don't have the concept of leftovers, is there any way to achieve a similar behavior with them?
Gabriel's point that leftovers are always part of parsing is interesting. I'm not sure I would agree, but that may just depend on the definition of parsing.
There is a large category of use cases which require leftovers. Parsing is certainly one: any time a parse requires some kind of lookahead, you'll need leftovers. One example of this is in the markdown package's getIndented function, which isolates all of the upcoming lines with a certain indentation level, leaving the rest of the lines to be processed later.
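As a rough illustration of that kind of lookahead, here is a minimal sketch written against the current conduit API (ConduitT, await, leftover, which postdates the 1.0 interface). takeIndented and indentedDemo are hypothetical names made up for this example, and this is not the markdown package's actual getIndented:

```haskell
import Data.Conduit
import qualified Data.Conduit.List as CL

-- Hypothetical helper: collect the leading run of lines that are indented
-- by at least n spaces. The first line that does not qualify is pushed
-- back with leftover so that the next consumer still sees it.
takeIndented :: Monad m => Int -> ConduitT String o m [String]
takeIndented n = go id
  where
    indented l = length (takeWhile (== ' ') l) >= n
    go acc = do
      ml <- await
      case ml of
        Nothing -> return (acc [])
        Just l
          | indented l -> go (acc . (l :))
          | otherwise  -> leftover l >> return (acc [])

-- Split a small input into the indented prefix and everything after it.
indentedDemo :: ([String], [String])
indentedDemo = runConduitPure $
  CL.sourceList ["  a", "  b", "c", "  d"]
    .| ((,) <$> takeIndented 2 <*> CL.consume)
```

Without the leftover call, the line "c" would have been consumed by the lookahead and lost to whatever runs next.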
But a much more mundane set of examples lives in conduit itself. Any time you're dealing with packed data (like ByteString or Text), you'll need to read a chunk, analyze it somehow, use leftover to push back the extra, and then do something with the original content. Perhaps the simplest example of this is dropWhile.
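To make the chunked case concrete, here is a sketch of a byte-level dropWhile over a stream of ByteString chunks, again assuming the current conduit API. dropWhileBS and chunkDemo are illustrative names, not functions exported by conduit:

```haskell
{-# LANGUAGE OverloadedStrings #-}
import           Data.ByteString (ByteString)
import qualified Data.ByteString as B
import           Data.Conduit
import qualified Data.Conduit.List as CL
import           Data.Word (Word8)

-- Drop leading bytes that satisfy the predicate, even when the run of
-- matching bytes spans several chunks.
dropWhileBS :: Monad m => (Word8 -> Bool) -> ConduitT ByteString o m ()
dropWhileBS p = loop
  where
    loop = do
      mchunk <- await
      case mchunk of
        Nothing -> return ()
        Just chunk ->
          let rest = B.dropWhile p chunk
          in if B.null rest
               then loop           -- the whole chunk matched, keep reading
               else leftover rest  -- push the unconsumed tail back

-- Drop leading spaces (byte 32), then collect whatever remains.
chunkDemo :: [ByteString]
chunkDemo = runConduitPure $
  CL.sourceList ["   ", "  hello", " world"]
    .| (dropWhileBS (== 32) >> CL.consume)
```

The chunk boundary rarely lines up with the point where the predicate stops matching, so the tail of the chunk has to go back into the stream for the next consumer.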
In fact, I consider leftover to be such a core, basic feature of a streaming library that the new 1.0 interface for conduit doesn't even expose the option to users of disabling leftovers. I know of very few real-world use cases that don't need it in one way or another.
I'll answer for pipes. The short answer to your question is that the upcoming pipes-parse library will have support for leftovers as part of a more general parsing framework. I find that in almost every case where people want leftovers, they actually want a parser, which is why I frame the leftovers problem as a subset of parsing. You can find the current draft of the library here.
However, if you want to understand how pipes-parse gets it to work, the simplest possible way to implement leftovers is to just use StateP to store the pushback buffer. This requires only defining the following two functions:
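import Control.Proxy
import Control.Proxy.Trans.State

-- Take the next element from the pushback buffer, or from upstream if
-- the buffer is empty.
draw :: (Monad m, Proxy p) => StateP [a] p () a b' b m a
draw = do
    s <- get
    case s of
        []   -> request ()
        a:as -> do
            put as
            return a

-- Push an element back onto the buffer so the next draw returns it.
unDraw :: (Monad m, Proxy p) => a -> StateP [a] p () a b' b m ()
unDraw a = do
    as <- get
    put (a:as)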
draw first consults the pushback buffer to see if there are any stored elements, popping one element off the stack if available. If the buffer is empty, it instead requests a new element from upstream. Of course, there's no point having a buffer if we can't push anything back, so we also define unDraw to push an element onto the stack to save for later.
Edit: Oops, I forgot to include an example of when leftovers are useful. Like Michael says, takeWhile and dropWhile are useful cases of leftovers. Here's the drawWhile function (analogous to what Michael calls takeWhile):
drawWhile :: (Monad m, Proxy p) => (a -> Bool) -> StateP [a] p () a b' b m [a]
drawWhile pred = go
  where
    go = do
        a <- draw
        if pred a
            then do
                as <- go
                return (a:as)
            else do
                unDraw a
                return []
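For comparison, here is roughly how the same function might look against the released pipes-parse package. This is a sketch that assumes Pipes.Parse exports Parser, draw (returning a Maybe) and unDraw, as it does in the versions I am aware of. drawWhileP and demoP are names made up for this example:

```haskell
{-# LANGUAGE RankNTypes #-}
import Control.Monad.Trans.State.Strict (evalStateT)
import Data.Functor.Identity (runIdentity)
import Pipes (each)
import Pipes.Parse (Parser, draw, unDraw)

-- Collect elements while the predicate holds. The first failing element
-- is pushed back with unDraw so the next parser sees it.
drawWhileP :: Monad m => (a -> Bool) -> Parser a m [a]
drawWhileP p = go
  where
    go = do
      ma <- draw
      case ma of
        Nothing -> return []   -- upstream is exhausted
        Just a
          | p a       -> fmap (a :) go
          | otherwise -> unDraw a >> return []

-- Run two parsers back to back over the same producer.
demoP :: ([Int], [Int])
demoP = runIdentity (evalStateT parser (each [1, 3, 4, 6]))
  where
    parser = do
      odds  <- drawWhileP odd
      evens <- drawWhileP even
      return (odds, evens)
```

In this style, draw reports end of input as Nothing rather than terminating the pipeline, so the parser can return what it has collected so far.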
Now imagine that your producer was:
producer () = do
    respond 1
    respond 3
    respond 4
    respond 6
... and you hooked that up to a consumer that used:
consumer () = do
    odds  <- drawWhile odd   -- draws 1 and 3, then pushes back 4
    evens <- drawWhile even  -- starts from the pushed-back 4
    return (odds, evens)
If the first drawWhile odd didn't push back the final element it drew, then you would drop the 4, which wouldn't get correctly passed on to the second drawWhile even statement.
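To see the difference without installing any streaming library, here is a small self-contained model that uses only State from transformers. It is a simplification, not how pipes or conduit are implemented: the "upstream" is a plain list, and drawS, unDrawS, drawWhileS and run are names invented for this sketch. The Bool flag switches the pushback on and off:

```haskell
import Control.Monad.Trans.State.Strict (State, evalState, get, modify, put)

-- State: (pushback buffer, remaining upstream input).
type Source a = ([a], [a])

-- Prefer the pushback buffer, then fall back to the upstream input.
drawS :: State (Source a) (Maybe a)
drawS = do
  (buf, input) <- get
  case (buf, input) of
    (a : as, _)    -> put (as, input) >> return (Just a)
    ([], a : rest) -> put ([], rest) >> return (Just a)
    ([], [])       -> return Nothing

-- Push one element back onto the buffer.
unDrawS :: a -> State (Source a) ()
unDrawS a = modify (\(buf, input) -> (a : buf, input))

-- With pushBack = False the first failing element is silently discarded.
drawWhileS :: Bool -> (a -> Bool) -> State (Source a) [a]
drawWhileS pushBack p = go
  where
    go = do
      ma <- drawS
      case ma of
        Nothing -> return []
        Just a
          | p a       -> (a :) <$> go
          | pushBack  -> unDrawS a >> return []
          | otherwise -> return []

-- Run "drawWhile odd" followed by "drawWhile even" over 1, 3, 4, 6.
run :: Bool -> ([Int], [Int])
run pushBack = evalState both ([], [1, 3, 4, 6])
  where
    both = (,) <$> drawWhileS pushBack odd <*> drawWhileS pushBack even
```

Here run True evaluates to ([1,3],[4,6]), while run False evaluates to ([1,3],[6]): the 4 is consumed by the first loop and never reaches the second.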