I'm trying to scrape a webpage using Haskell and compile the results into an object.
If, for whatever reason, I can't get all the items from the pages, I want to stop trying to process the page and return early.
For example:
scrapePage :: String -> IO ()
scrapePage url = do
    doc <- fromUrl url
    title <- liftM headMay $ runX $ doc >>> css "head.title" >>> getText
    when (isNothing title) (return ())
    date <- liftM headMay $ runX $ doc >>> css "span.dateTime" ! "data-utc"
    when (isNothing date) (return ())
    -- etc
    -- make page object and send it to db
    return ()
The problem is that when doesn't stop the do block or keep the other parts from being executed.
What is the right way to do this?
Use a monad transformer!
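A minimal sketch of what that could look like with MaybeT from the transformers package. The Page record, the Query type, and the firstOf helper are made up for illustration, and the two Query arguments stand in for the real runX queries so the example compiles without hxt:

```haskell
import Control.Monad (guard)
import Control.Monad.Trans.Maybe (MaybeT (..))
import Data.Maybe (listToMaybe)

-- Hypothetical page record; the real one would match whatever goes to the db.
data Page = Page { pageTitle :: String, pageDate :: String }
  deriving (Eq, Show)

-- Stand-in for a query such as: runX $ doc >>> css "..." >>> getText
-- It returns every match found, possibly none.
type Query = IO [String]

-- Take the first result of a query, or abort the whole MaybeT block.
firstOf :: Query -> MaybeT IO String
firstOf q = MaybeT (listToMaybe <$> q)

scrapePage :: Query -> Query -> IO (Maybe Page)
scrapePage titleQ dateQ = runMaybeT $ do
  title <- firstOf titleQ    -- the block stops here if there is no title
  date  <- firstOf dateQ     -- the block stops here if there is no date
  guard (not (null title))   -- any other condition can abort the same way
  return (Page title date)
```

The caller gets Nothing as soon as one step fails, and none of the later steps run. Pattern matching on the result then decides whether there is a page to send to the db.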
For more flexibility in the return value when you return early, use throwError/eitherT/EitherT instead of mzero/maybeT/MaybeT. (Although then you can't use guard.)
(Probably also use headZ instead of headMay and ditch the explicit guard.)

I have never worked with Haskell, but it seems quite easy. Try
when (isNothing date) $ exit ()
If this also isn't working, then make sure your statement is correct. Also see this website for more info: Breaking From loop.

return in Haskell does not do the same thing as return in other languages. Instead, what return does is inject a value into a monad (in this case IO). You have a couple of options. The simplest is to use if.
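Something along these lines, as a rough sketch. To keep it compilable without hxt, the two Query arguments stand in for the real runX queries and save stands in for "make page object and send it to db"; all three names are assumptions, not part of the question's code:

```haskell
import Data.Maybe (fromJust, isNothing, listToMaybe)

-- Stand-in for a query such as: runX $ doc >>> css "..." >>> getText
type Query = IO [String]

scrapePage :: Query -> Query -> ((String, String) -> IO ()) -> IO ()
scrapePage titleQ dateQ save = do
  title <- listToMaybe <$> titleQ
  if isNothing title
    then return ()   -- nothing else runs, because the rest lives in the else branch
    else do
      date <- listToMaybe <$> dateQ
      if isNothing date
        then return ()
        else save (fromJust title, fromJust date)  -- safe: both were checked above
```

The point is that the rest of the work has to be nested inside the else branch. A case expression on the Maybe would avoid fromJust, at the cost of the same nesting.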
Another option is to use unless.
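A sketch of the unless version, under the same assumptions as before: the Query arguments and the save callback are hypothetical stand-ins for the real runX queries and the db call.

```haskell
import Control.Monad (unless)
import Data.Maybe (fromJust, isNothing, listToMaybe)

-- Stand-in for a query such as: runX $ doc >>> css "..." >>> getText
type Query = IO [String]

scrapePage :: Query -> Query -> ((String, String) -> IO ()) -> IO ()
scrapePage titleQ dateQ save = do
  title <- listToMaybe <$> titleQ
  -- Everything that depends on the title goes inside the unless body.
  unless (isNothing title) $ do
    date <- listToMaybe <$> dateQ
    unless (isNothing date) $
      save (fromJust title, fromJust date)  -- safe: both were checked above
```

This only works because the guarded block returns (), which is the type unless expects. It is less noisy than if/else but nests just as deeply.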
The general problem here is that the IO monad doesn't have control effects (except for exceptions). On the other hand, you could use the maybe monad transformer, MaybeT. If you really want full-blown control effects, you need to use ContT.
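A sketch of early exit with ContT and callCC from the transformers package. The Query arguments are again hypothetical stand-ins for the real runX queries, and the function returns the scraped pair instead of writing to a db so the result is easy to inspect:

```haskell
import Control.Monad.Trans.Class (lift)
import Control.Monad.Trans.Cont (callCC, evalContT)
import Data.Maybe (listToMaybe)

-- Stand-in for a query such as: runX $ doc >>> css "..." >>> getText
type Query = IO [String]

scrapePage :: Query -> Query -> IO (Maybe (String, String))
scrapePage titleQ dateQ = evalContT $ callCC $ \exit -> do
  mTitle <- lift (listToMaybe <$> titleQ)
  -- Calling exit jumps straight out of the callCC block with the given result.
  title <- maybe (exit Nothing) return mTitle
  mDate <- lift (listToMaybe <$> dateQ)
  date <- maybe (exit Nothing) return mDate
  return (Just (title, date))
```

Here exit behaves much like return does in an imperative language: nothing after the call runs. For this particular problem MaybeT is probably enough, and ContT is more machinery than the task needs.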
WARNING: none of the above code has been tested, or even type checked!