How do I make a do block return early?

Posted 2019-03-15 10:46

Question:

I'm trying to scrape a webpage using Haskell and compile the results into an object.

If, for whatever reason, I can't get all the items from the pages, I want to stop trying to process the page and return early.

For example:

scrapePage :: String -> IO ()
scrapePage url = do
  doc <- fromUrl url
  title <- liftM headMay $ runX $ doc >>> css "head.title" >>> getText
  when (isNothing title) (return ())
  date <- liftM headMay $ runX $ doc >>> css "span.dateTime" ! "data-utc"
  when (isNothing date) (return ())
  -- etc
  -- make page object and send it to db
  return ()

The problem is the when doesn't stop the do block or keep the other parts from being executed.

What is the right way to do this?

Answer 1:

return in Haskell does not do the same thing as return in other languages. Instead, it injects a value into a monad (in this case IO), so when (isNothing title) (return ()) simply produces () and execution continues with the next line. You have a few options.
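
Before looking at the options, here is a minimal sketch of that point. It uses only base; the name stepsReached and the IORef log are illustrative assumptions, not part of the question's code. It records each step that actually runs, so you can see that return () inside when does not cut the block short.

```haskell
import Control.Monad (when)
import Data.IORef (modifyIORef, newIORef, readIORef)

-- Records every step that is reached, in order.
stepsReached :: IO [String]
stepsReached = do
  logRef <- newIORef []
  let step s = modifyIORef logRef (++ [s])
  step "before"
  when True (return ())  -- a no-op: yields () and moves on to the next line
  step "after"           -- still runs
  readIORef logRef
```

Running stepsReached yields both "before" and "after", which is the behaviour the question describes.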

The simplest is to use if:

scrapePage :: String -> IO ()
scrapePage url = do
  doc <- fromUrl url
  title <- liftM headMay $ runX $ doc >>> css "head.title" >>> getText
  if (isNothing title) then return () else do
   date <- liftM headMay $ runX $ doc >>> css "span.dateTime" ! "data-utc"
   if (isNothing date) then return () else do
     -- etc
     -- make page object and send it to db
     return ()

Another option is to use unless:

scrapePage url = do
  doc <- fromUrl url
  title <- liftM headMay $ runX $ doc >>> css "head.title" >>> getText
  unless (isNothing title) $ do
    date <- liftM headMay $ runX $ doc >>> css "span.dateTime" ! "data-utc"
    unless (isNothing date) $ do
      -- etc
      -- make page object and send it to db
      return ()

The general problem here is that the IO monad has no control effects (other than exceptions). You could instead use the MaybeT monad transformer, where a failed guard aborts the rest of the block:

scrapePage url = liftM (maybe () id) . runMaybeT $ do
  doc <- liftIO $ fromUrl url
  title <- liftIO $ liftM headMay $ runX $ doc >>> css "head.title" >>> getText
  guard (isJust title)
  date <- liftIO $ liftM headMay $ runX $ doc >>> css "span.dateTime" ! "data-utc"
  guard (isJust date)
  -- etc
  -- make page object and send it to db
  return ()
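
A self-contained sketch of the same idea, assuming only the transformers package. The names lookupField and scrapeFields and the in-memory "page" are hypothetical stand-ins for the real scraping calls. Wrapping each IO (Maybe a) step in the MaybeT constructor means a Nothing aborts the block, so no explicit guard is needed.

```haskell
import Control.Monad.Trans.Maybe (MaybeT (..))

-- Hypothetical stand-in for a scraping step: looks a field up in an
-- in-memory association list instead of fetching and parsing HTML.
lookupField :: String -> [(String, String)] -> IO (Maybe String)
lookupField key page = return (lookup key page)

-- Yields Just (title, date) when both fields exist, Nothing otherwise.
-- The first Nothing short-circuits the rest of the do block.
scrapeFields :: [(String, String)] -> IO (Maybe (String, String))
scrapeFields page = runMaybeT $ do
  title <- MaybeT (lookupField "title" page)
  date  <- MaybeT (lookupField "date" page)
  return (title, date)
```

For example, scrapeFields [("title", "Hello")] stops at the missing date and yields Nothing without reaching the final return.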

If you really want full-blown control effects, you need to use ContT. The escape continuation has to be captured by a callCC that wraps the whole block; calling it then skips everything that follows:

scrapePage :: String -> IO ()
scrapePage url = flip runContT return $ callCC $ \exit -> do
  doc <- lift $ fromUrl url
  title <- lift $ liftM headMay $ runX $ doc >>> css "head.title" >>> getText
  when (isNothing title) $ exit ()
  date <- lift $ liftM headMay $ runX $ doc >>> css "span.dateTime" ! "data-utc"
  when (isNothing date) $ exit ()
  -- etc
  -- make page object and send it to db
  return ()
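
A small runnable sketch of early exit with ContT, assuming only base and transformers. The function runSteps and its step log are made up for illustration. The important detail is that callCC wraps the whole block, so exit is the continuation of the block itself rather than of a single statement.

```haskell
import Control.Monad (when)
import Control.Monad.Trans.Class (lift)
import Control.Monad.Trans.Cont (callCC, evalContT)
import Data.IORef (modifyIORef, newIORef, readIORef)

-- Runs up to two steps and returns the names of the steps reached.
-- evalContT runs the block with return as the final continuation.
runSteps :: Bool -> IO [String]
runSteps bail = do
  logRef <- newIORef []
  let step s = lift (modifyIORef logRef (++ [s]))
  evalContT $ callCC $ \exit -> do
    step "first"
    when bail (exit ())  -- jumps past everything left in the block
    step "second"
  readIORef logRef
```

With bail set to True only "first" is recorded; with False both steps run. Note that writing callCC ($ ()) at the point of exit would not work, because it captures only the continuation of that one statement and so behaves like return ().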

WARNING: none of the above code has been tested, or even type checked!



Answer 2:

Use a monad transformer!

import Control.Monad.Trans.Class -- from transformers package
import Control.Error.Util        -- from errors package

scrapePage :: String -> IO ()
scrapePage url = maybeT (return ()) return $ do
  doc <- lift $ fromUrl url
  title <- liftM headMay $ lift . runX $ doc >>> css "head.title" >>> getText
  guard . not $ isNothing title
  date <- liftM headMay $ lift . runX $ doc >>> css "span.dateTime" ! "data-utc"
  guard . not $ isNothing date
  -- etc
  -- make page object and send it to db
  return ()

For more flexibility in the value you return when exiting early, use throwError/eitherT/EitherT instead of mzero/maybeT/MaybeT (although then you can't use guard).

(Probably also use headZ instead of headMay and ditch the explicit guard.)
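
As a rough sketch of the "early return with a reason" variant, the following uses ExceptT and throwE from the transformers package in place of the EitherT/throwError named above. That substitution is an assumption about what your library versions provide. The ScrapeError type, the require helper and the in-memory page are hypothetical.

```haskell
import Control.Monad.Trans.Except (ExceptT, runExceptT, throwE)

-- Hypothetical error type naming which field was missing.
data ScrapeError = MissingField String
  deriving (Eq, Show)

-- Turns a missing value into an early exit that carries a reason.
require :: Monad m => String -> Maybe a -> ExceptT ScrapeError m a
require name = maybe (throwE (MissingField name)) return

-- Yields Right (title, date), or Left with the first missing field.
scrapeFieldsE :: [(String, String)] -> IO (Either ScrapeError (String, String))
scrapeFieldsE page = runExceptT $ do
  title <- require "title" (lookup "title" page)
  date  <- require "date" (lookup "date" page)
  return (title, date)
```

The caller can then pattern match on the Either to decide whether to log the failure or store the page.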



Answer 3:

I have never worked with Haskell, but it seems quite easy. Try when (isNothing date) $ exit (). If that doesn't work either, make sure your statement is correct. Also see this website for more info: Breaking From loop.