Scala: XML Attribute parsing

Posted 2019-02-08 17:02

I'm trying to parse an RSS feed that looks like this, and I want the value of the "date" attribute:

<rss version="2.0">
<channel>
    <item>
        <y:c date="AA"></y:c>
    </item>
</channel>
</rss>

I tried several variations of this (rssFeed contains the RSS data):

println(((rssFeed \\ "channel" \\ "item" \ "y:c" \"date").toString))

But nothing seems to work. What am I missing?

Any help would really be appreciated!

4 answers
兄弟一词,经得起流年.
#2 · 2019-02-08 17:23

Think about using sequence comprehensions, too. They're useful for dealing with XML, particularly if you need complicated conditions.

For the simple case:

for {
  c <- rssFeed \\ "@date"
} yield c

This gives you every date attribute found anywhere in rssFeed.

But if you want something more complex:

val rssFeed = <rss version="2.0">
                <channel>
                  <item>
                    <y:c date="AA"></y:c>
                    <y:c date="AB"></y:c>
                    <y:c date="AC"></y:c>
                  </item>
                </channel>
              </rss>

val sep = "\n----\n"

for {
  channel <- rssFeed \ "channel"
  item <- channel \ "item"
  y <- item \ "c"
  date <- y \ "@date" if date.text == "AA" // plain method call; postfix `date text` needs a language import on newer Scala versions
} yield {
  val s = List(channel, item, y, date).mkString(sep)
  println(s)
}

Gives you:

    <channel>
                        <item>
                          <y:c date="AA"></y:c>
                          <y:c date="AB"></y:c>
                          <y:c date="AC"></y:c>
                        </item>
                      </channel>
    ----
    <item>
                          <y:c date="AA"></y:c>
                          <y:c date="AB"></y:c>
                          <y:c date="AC"></y:c>
                        </item>
    ----
    <y:c date="AA"></y:c>
    ----
    AA
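
If you would rather collect the matching values than print them, the same comprehension can yield strings. This is only a sketch: the object name and the filter condition are made up for illustration, and it assumes the scala-xml module is on the classpath so that XML literals compile.

```scala
object DatesByComprehension {
  // Same sample feed as above, written as an XML literal.
  val rssFeed =
    <rss version="2.0">
      <channel>
        <item>
          <y:c date="AA"></y:c>
          <y:c date="AB"></y:c>
          <y:c date="AC"></y:c>
        </item>
      </channel>
    </rss>

  // Each y is a single node, so the attribute is looked up one element at a time.
  val dates: Seq[String] =
    for {
      item <- rssFeed \ "channel" \ "item"
      y    <- item \ "c"
      date <- y \ "@date"
      if date.text != "AB" // illustrative condition only
    } yield date.text

  def main(args: Array[String]): Unit =
    println(dates.mkString(", ")) // AA, AC
}
```

Yielding values keeps the comprehension free of side effects, so the result can be filtered or joined later.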
叛逆
#3 · 2019-02-08 17:26

Also, think about the difference between \ and \\. \ matches only direct children, while \\ matches any descendant, like this (note that it jumps from channel to c, skipping item):

scala> (rssFeed \\ "channel" \\ "c" \ "@date").text
res20: String = AA

Or this sort of thing if you just want all the <c> elements and don't care about their parents:

scala> (rssFeed \\ "c" \ "@date").text            
res24: String = AA

And this specifies an exact path:

scala> (rssFeed \ "channel" \ "item" \ "c" \ "@date").text
res25: String = AA
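
To see the two operators side by side, here is a small sketch that applies both to the root element. The object name is made up, and it assumes the scala-xml module is available for the XML literal.

```scala
object ChildVsDescendant {
  // Same shape as the feed in the question.
  val rssFeed =
    <rss version="2.0">
      <channel>
        <item>
          <y:c date="AA"></y:c>
        </item>
      </channel>
    </rss>

  // \ only inspects the direct children of <rss>, and <y:c> sits two levels deeper.
  val direct = rssFeed \ "c"

  // \\ walks every descendant, so it finds <y:c> at any depth.
  val deep = rssFeed \\ "c"

  // deep holds exactly one element here, so the attribute selector applies to it directly.
  val date = (deep \ "@date").text

  def main(args: Array[String]): Unit = {
    println(direct.length) // 0
    println(deep.length)   // 1
    println(date)          // AA
  }
}
```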
等我变得足够好
#4 · 2019-02-08 17:27

Attributes are retrieved using the "@attrName" selector. Thus, your selector should actually be something like the following:

println((rssFeed \\ "channel" \\ "item" \ "c" \ "@date").text)
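
One thing to watch with the selector: when the attribute is absent, .text just returns an empty string. If you need to tell "missing" from "empty", Node.attribute returns an Option instead. A minimal sketch follows; the object and the dateOf helper are hypothetical names, and it assumes scala-xml is on the classpath.

```scala
import scala.xml.Node

object AttributeLookup {
  val withDate = <c date="AA"></c>
  val withoutDate = <c></c>

  // The "@date" selector yields an empty NodeSeq when the attribute is absent,
  // so .text quietly becomes "".
  val viaSelector: String = (withoutDate \ "@date").text

  // Node.attribute returns an Option, which keeps "missing" distinct from "empty".
  // dateOf is a hypothetical helper, not part of scala-xml.
  def dateOf(node: Node): Option[String] =
    node.attribute("date").map(_.map(_.text).mkString)

  def main(args: Array[String]): Unit = {
    println(viaSelector.isEmpty) // true
    println(dateOf(withDate))    // Some(AA)
    println(dateOf(withoutDate)) // None
  }
}
```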
看我几分像从前
#5 · 2019-02-08 17:31

The "y" in <y:c> is a namespace prefix, not part of the element name, so select the element as "c". Also, attributes are selected with a leading "@". Try this:

println(((rssFeed \\ "channel" \\ "item" \ "c" \ "@date").toString))
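
Because the selector matches on the label alone, elements that share a label but carry different prefixes all come back together. If that matters, you can narrow by prefix yourself. In this sketch the z: element and the object name are made up for illustration, and scala-xml is assumed to be on the classpath.

```scala
object PrefixVsLabel {
  // Two elements with the same label "c" but different prefixes (z: is invented here).
  val item =
    <item>
      <y:c date="AA"></y:c>
      <z:c date="ZZ"></z:c>
    </item>

  // \ "c" compares labels only, so both elements are returned.
  val allC = item \ "c"

  // Keep only the elements whose prefix is "y".
  val onlyY = allC.filter(_.prefix == "y")

  // Look the attribute up per node, one element at a time.
  val yDates: Seq[String] = onlyY.map(n => (n \ "@date").text)

  def main(args: Array[String]): Unit = {
    println(allC.length)           // 2
    println(yDates.mkString(", ")) // AA
  }
}
```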