HXT: Can an input change with the arrow syntax?

2019-07-28 11:17发布

问题:

With the following code

{-# LANGUAGE Arrows #-}
{-# LANGUAGE NoMonomorphismRestriction #-}
import Text.XML.HXT.Core

parseXml :: IOSArrow XmlTree XmlTree
parseXml = getChildren >>> getChildren >>>
  proc x -> do
    y <- x >- hasName "item"
    returnA -< x

main :: IO ()
main = do
    person <- runX (readString [withValidate no]
                    "<xml><item>John</item><item2>Smith</item2></xml>"
                    >>> parseXml)
    putStrLn $ show person
    return ()

I get the output

[NTree (XTag "item" []) [NTree (XText "John") []]]

So it seems that hasName "item" was applied to x which I did not expect. Using arrowp I get for parseXml:

parseXml
   = getChildren >>> getChildren >>>
      (arr (\ x -> (x, x)) >>>
         (first (hasName "item") >>> arr (\ (y, x) -> x)))

So I have the arrow diagram

                                                       y
                                   /-- hasName "item" ---
                               x  /                       
-- getChildren -- getChildren ---\x->(x,x)              \(y,x)->x --- final result
                                  \                       / 
                                   \---------------------/  

Why is hasName "item" also applied to second place of the tuple? I thought there is no state in haskell and hasName "item" x returns a new object instead of changing the internal state of x.

Related question: Is factoring an arrow out of arrow do notation a valid transformation?

My original problem

I have the following code:

{-# LANGUAGE Arrows #-}
import Text.XML.HXT.Core

data Person = Person { forname :: String, surname :: String } deriving (Show)

parseXml :: IOSArrow XmlTree Person
parseXml = proc x -> do
    forname <- x >- this /> this /> hasName "fn" /> getText
    surname <- x >- this /> this /> hasName "sn" /> getText
    returnA -< Person forname surname

main :: IO ()
main = do
    person <- runX (readString [withValidate no]
                               "<p><fn>John</fn><sn>Smith</sn></p>"
                    >>> parseXml)
    putStrLn $ show person
    return ()

If I run it everything works fine and I get the output

[Person {forname = "John", surname = "Smith"}]

But if I change the parseXml to avoid the this statements

parseXml :: IOSArrow XmlTree Person
parseXml = (getChildren >>> getChildren) >>> proc x -> do
    forname <- x >- hasName "fn" /> getText
    surname <- x >- hasName "sn" /> getText
    returnA -< Person forname surname

no person can be parsed anymore (output is []). Investigating the problem with

parseXml :: IOSArrow XmlTree Person
parseXml = (getChildren >>> getChildren) >>>
  proc x -> do
    forname <- x >- withTraceLevel 5 traceTree >>> hasName "fn" /> getText
    surname <- x >- hasName "sn" /> getText
    returnA -< Person forname surname

I got the output

content of: 
============

---XTag "fn"
   |
   +---XText "John"



content of: 
============

---XTag "sn"
   |
   +---XText "Smith"


[]

So everything seems fine, but with the code

parseXml :: IOSArrow XmlTree Person
parseXml = (getChildren >>> getChildren) >>>
  proc x -> do
    forname <- x >- hasName "fn" /> getText
    surname <- x >- withTraceLevel 5 traceTree >>> hasName "sn" /> getText
    returnA -< Person forname surname

I got

content of: 
============

---XTag "fn"
   |
   +---XText "John"


[]

So it seems to me, that the value of the input x changes between the two statements. It looks like the hasName "fn" was applied to x before it was attached to the surname arrow. But shall x not remain the same between the two lines?

回答1:

No, the input can't change and it doesn't.

What you've programmed in the lines

proc x -> do
  y <- x >- hasName "item"
  returnA -< x

is just a filter removing all nodes not named item. His is equivalent to the arrow

hasName "item" `guards` this

You can test this with

{-# LANGUAGE Arrows #-}
{-# LANGUAGE NoMonomorphismRestriction #-}

module Main where

import Text.XML.HXT.Core

parseXml0 :: IOSArrow XmlTree XmlTree
parseXml0 = getChildren >>> getChildren >>>
  proc x -> do
    _ <- hasName "item" -< x
    returnA -< x

parseXml1 :: IOSArrow XmlTree XmlTree
parseXml1 = getChildren >>> getChildren >>>
            (hasName "item" `guards` this)

main1 :: Show c => IOSArrow XmlTree c -> IO ()
main1 parseXml = do
    person <- runX (readString [withValidate no]
                    "<xml><item>John</item><item2>Smith</item2></xml>"
                    >>> parseXml)
    putStrLn $ show person
    return ()

main :: IO ()
main = main1 parseXml0 >> main1 parseXml1


回答2:

EDIT: OK, well now you've complete changed your question!

The working example should be interpreted as follows:

For the top level tag x

  • iterate over all the texts (getText) of grandchildren (this /> this) where the name is "fn" (hasName "fn"), using forname to hold these values
  • iterate over all the texts (getText) of grandchildren (this /> this) where the name is "sn" (hasName "sn"), using surname to hold these values
  • yield Person forname surname for each such pair

This looks like it works, but is perhaps not doing what you think it's doing. Try running the code on the input "<p><fn>John</fn><sn>Smith</sn><fn>Anne</fn><sn>Jones</sn></p>" for example. Four names are printed.

The broken example should be interpreted as follows:

For every grandchild x

  • if x has the name "fn" then store the text in forname (otherwise skip to the next x)
  • if x has the name "sn" then store the text in surname (otherwise skip to the next x)

A tag can't have the name "fn" and the name "sn"! Thus every tag is skipped.

Your investigation is just showing the point of the computation at which the tags are skipped. In the first case both the tags are present, as nothing has been filtered yet. In the second case only the "fn" tag is present because the first command has filtered everything else out.

EDIT: You may find this example (done in terms of the list monad) instructive.

import Control.Monad ((>=>))

data XML = Text String | Tag String [XML] deriving Show

this :: a -> [a]
this = return

(/>) :: (a -> [XML]) -> (XML -> [c]) -> a -> [c]
f /> g = f >=> getChildren >=> g

(>--) :: a -> (a -> b) -> b
x >-- f = f x

getChildren :: XML -> [XML]
getChildren (Text _) = []
getChildren (Tag _ c) = c

hasName :: String -> XML -> [XML]
hasName _ (Text _) = []
hasName n i@(Tag n' _) = if n == n' then [i] else []

getText :: XML -> [String]
getText (Text t) = [t]
getText (Tag _ _) = []

parseXML :: XML -> [(String, String)]
parseXML = \x -> do
  forname <- x >-- (this /> this /> hasName "fn" /> getText)
  surname <- x >-- (this /> this /> hasName "sn" /> getText)
  return (forname, surname)

parseXMLBroken :: XML -> [(String, String)]
parseXMLBroken = getChildren >=> getChildren >=> \x -> do
  forname <- x >-- (hasName "fn" /> getText)
  surname <- x >-- (hasName "sn" /> getText)
  return (forname, surname)

runX :: (XML -> a) -> XML -> a
runX f xml = f (Tag "/" [xml])

xml :: XML
xml = (Tag "p" [ Tag "fn" [Text "John"]
               , Tag "sn" [Text "Smith"] ])

example1 = runX parseXML xml

example2 = runX parseXMLBroken xml

*Main> example1
[("John","Smith")]
*Main> example2
[]