Scala convert XML to key value map

Posted 2019-08-19 10:10

Related to this topic.

The problem is as follows. Imagine an XML document without any particular schema:

<persons>
  <total>2</total>
  <someguy>
     <firstname>john</firstname>
     <name>Snow</name>
  </someguy>
  <otherperson>
     <sex>female</sex>
  </otherperson>
</persons>

For processing, I want to have this in a key-value map:

"persons/total" -> "2"
"persons/someguy/firstname" -> "john"
"persons/someguy/name" -> "Snow"
"persons/otherperson/sex" -> "female"

Preferably I would have a nice recursive function that traverses the XML depth-first and simply stacks up all labels until it finds a value, then returns that value together with the stack of labels. Unfortunately, I am struggling to connect the return type with the input type, because I end up returning a sequence of my input. Here is what I have so far. The foreach is clearly a problem because it returns Unit, but map would not work either, because it returns a Seq.

def dfs(n: NodeSeq, keyStack: String, map: Map[String,String])
 :(NodeSeq, String, Map[String,String]) = {
  n.foreach(x => {
    if (x.child.isEmpty) {
      dfs(x.child, keyStack, map + (keyStack+ x.label + " " -> x.text))
    }
    else {
      dfs(x.child, keyStack+ x.label + "/", map)
    }
  }
  )
}

Would greatly appreciate the help!
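One way to make the accumulator idea above type-check is to replace foreach with foldLeft, so that each recursive call returns the updated map instead of discarding it. The following is a minimal sketch under that assumption; it keeps the name dfs from the question but drops the tuple return type, and it assumes that "leaf" means an element with no element children.

```scala
import scala.xml._

// Sketch: thread the map through foldLeft instead of discarding results in foreach.
def dfs(nodes: NodeSeq, keyStack: String, acc: Map[String, String]): Map[String, String] =
  nodes.foldLeft(acc) { (m, node) =>
    node match {
      case e: Elem if e.child.forall(!_.isInstanceOf[Elem]) =>
        // leaf element: record its text under the full path
        m + (keyStack + e.label -> e.text.trim)
      case e: Elem =>
        // inner element: push its label and recurse into its children
        dfs(e.child, keyStack + e.label + "/", m)
      case _ =>
        // whitespace text, comments, etc. carry no value here
        m
    }
  }

val xml = XML.loadString(
  "<persons><total>2</total><someguy><firstname>john</firstname><name>Snow</name></someguy>" +
  "<otherperson><sex>female</sex></otherperson></persons>")

val result = dfs(xml, "", Map.empty)
```

Because an Elem is itself a NodeSeq, the root can be passed directly, and its label ends up as the first path segment ("persons/total", and so on).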

3 answers
淡お忘
Reply #2 · 2019-08-19 10:36

One scenario to consider is when an element has a namespace prefix:

val xml = <a>
  <b>
    <c>1</c>
    <d>2</d>
    <e>
      <z:f>3</z:f>
    </e>
  </b>
</a>
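The prefix matters because Elem.label does not include it: <z:f> and <f> both have the label "f", so keys built from labels alone would collide. A small sketch of this, assuming the prefix is declared with a hypothetical namespace URI urn:example:z:

```scala
import scala.xml._

val e = XML.loadString("""<e xmlns:z="urn:example:z"><z:f>3</z:f><f>4</f></e>""")
val kids = e.child.collect { case el: Elem => el }

// label drops the prefix, so both children would map to the same key "e/f"
val labels = kids.map(_.label)

// prefix is "z" for <z:f> and null for <f>; Option(...) turns the null into None
val qnames = kids.map(el => Option(el.prefix).fold(el.label)(p => s"$p:${el.label}"))
```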

There are other scenarios to consider (including entities, comments, declarations), but this is a good place to start:

import scala.xml._

def nodeToMap(xml: Elem): Map[String, String] = {

  def nodeToMapWithPrefix(prefix: String, xml: Node): Map[String, String] = {
    val pathAndText = for {
      child <- xml.child
    } yield {
      child match {
        case e: Elem if e.prefix == null =>
          nodeToMapWithPrefix(s"$prefix/${e.label}", e)
        case e: Elem =>
          nodeToMapWithPrefix(s"$prefix/${e.prefix}:${e.label}", e)
        // skip the indentation whitespace between elements
        case t: Text if t.text.trim.isEmpty => Map.empty[String, String]
        case t: Text => Map(prefix -> t.text.trim)
        case er: EntityRef => Map(prefix -> er.text)
        // comments, processing instructions, etc. carry no value
        case _ => Map.empty[String, String]
      }
    }
    pathAndText.foldLeft(Map.empty[String, String])(_ ++ _)
  }

  nodeToMapWithPrefix(xml.label, xml)
}

Another scenario to consider is when text isn't at a leaf element:

val xml = <a>
  <b>text
    <c>1</c>
    <d>2</d>
  </b>
</a>
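In that case, calling .text on <b> is misleading, because Node.text concatenates the text of all descendants. A minimal sketch that keeps only the text belonging directly to an element; directText is a hypothetical helper name:

```scala
import scala.xml._

// Hypothetical helper: text that belongs directly to an element,
// ignoring any text nested inside child elements
def directText(e: Elem): String =
  e.child.collect { case t: Text => t.text }.mkString.trim

val b = XML.loadString("<b>text<c>1</c><d>2</d></b>")
val all = b.text        // descendant text run together
val own = directText(b) // only the text directly under <b>
```

Here all is "text12" while own is "text", so a mixed-content element can be given its own entry without swallowing its children's values.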
Viruses.
Reply #3 · 2019-08-19 10:38

After some playing around, this is the most elegant way I could find. What I don't like:

  • It goes depth-first for every child, so you need to flatten the result afterwards. This is also why the root node label is missing from the keys.
  • It carries a lot of XML nodes along in the result tuples, so it might be too memory-intensive for large documents.

Please improve if you have ideas!

import scala.xml._

val xml = "<persons><total>2</total><someguy><firstname>john</firstname><name>Snow</name></someguy><otherperson><sex>female</sex></otherperson></persons>"
val result: Elem = scala.xml.XML.loadString(xml)

def linearize(node: Node, stack: String, map: Map[String,String])
: List[(Node, String, Map[String,String])] = {
  (node, stack, map) :: node.child.flatMap {
    case e: Elem => {
      if (e.descendant.size == 1) linearize(e, stack, map ++ Map(stack + "/" + e.label -> e.text))
      else linearize(e, stack + "/" + e.label, map)
    }
    case _ => Nil
  }.toList
}

linearize(result, "", Map[String,String]()).flatMap(_._3).toMap

We still need to flatten the maps afterwards, but at least the recursive part is rather short. The code above should work in a Scala worksheet.
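Both drawbacks listed above can be sidestepped by emitting only (path, text) pairs instead of (Node, String, Map) tuples, and by starting the path at the root label. A sketch under those assumptions; paths is a hypothetical name, and leaves are taken to be elements without element children:

```scala
import scala.xml._

// Emits only (path, text) pairs: no nodes or intermediate maps are carried along.
def paths(node: Node, prefix: String): Seq[(String, String)] = node match {
  case e: Elem =>
    val path = if (prefix.isEmpty) e.label else s"$prefix/${e.label}"
    val elemChildren = e.child.collect { case c: Elem => c }
    if (elemChildren.isEmpty) Seq(path -> e.text)
    else elemChildren.flatMap(paths(_, path))
  case _ => Seq.empty
}

val xml = XML.loadString(
  "<persons><total>2</total><someguy><firstname>john</firstname><name>Snow</name></someguy>" +
  "<otherperson><sex>female</sex></otherperson></persons>")

val result = paths(xml, "").toMap
```

The only flattening left is the single toMap at the end, and the root label is part of every key.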

一夜七次
Reply #4 · 2019-08-19 10:47

Inspired by Sparky's answer, but suitable for a more general case:

import scala.xml._

val emptyMap = Map.empty[String, List[String]]

def xml2map(xml: String): Map[String, List[String]] = add2map(XML.loadString(xml), "", emptyMap)

private def add2map(node: Node, xPath: String, oldMap: Map[String, List[String]]): Map[String, List[String]] = {

  val elems = node.child.filter(_.isInstanceOf[Elem])
  val xCurr = xPath + "/" + node.label

  // leaves (no element children) become entries; the rest are recursed into
  val (currElems, nextElems) = elems.partition(_.child.count(_.isInstanceOf[Elem]) == 0)

  val currMap = currElems.foldLeft(oldMap)((map, elem) => map + {
    val key = xCurr + "/" + elem.label

    // repeated elements with the same path are appended to the list
    val oldValue = map.getOrElse(key, List.empty[String])
    val newValue = oldValue ::: List(elem.text)

    key -> newValue
  })

  // thread the accumulated map through the recursion, so that repeated parent
  // elements (e.g. two <someguy>) append to each other's lists instead of overwriting them
  nextElems.foldLeft(currMap)((map, elem) => add2map(elem, xCurr, map))
}

For XML like

<persons>
  <total>2</total>
  <someguy>
    <firstname>john</firstname>
    <name>Snow</name>
    <alive>in 1st season</alive>
    <alive>in 2nd season</alive>
    <alive>...</alive>
    <alive>even in last season</alive>
    <alive>how long more?</alive>
  </someguy>
  <otherperson>
    <sex>female</sex>
  </otherperson>
</persons>

it generates the Map[String, List[String]] below (as printed by toString):

Map(
  /persons/total -> List(2),
  /persons/someguy/firstname -> List(john),
  /persons/someguy/alive -> List(in 1st season, in 2nd season, ..., even in last season, how long more?),
  /persons/otherperson/sex -> List(female),
  /persons/someguy/name -> List(Snow)
)
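On Scala 2.13 or later, a similar multi-valued map can also be built by collecting one (path, text) pair per leaf and grouping them with groupMap, which keeps every value for a repeated path in document order. A sketch; leaves is a hypothetical helper, and the key format follows this answer's "/persons/..." style:

```scala
import scala.xml._

// Hypothetical helper: one (path, text) pair per leaf element, in document order
def leaves(e: Elem, path: String): Seq[(String, String)] = {
  val here = s"$path/${e.label}"
  val kids = e.child.collect { case c: Elem => c }
  if (kids.isEmpty) Seq(here -> e.text) else kids.flatMap(leaves(_, here))
}

val xml = XML.loadString(
  "<persons><total>2</total><someguy><name>Snow</name>" +
  "<alive>in 1st season</alive><alive>in 2nd season</alive></someguy></persons>")

// groupMap (Scala 2.13+) groups by path and keeps all texts for each path
val multi = leaves(xml, "").groupMap(_._1)(_._2)
```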