Scala convert XML to key value map

Posted 2019-08-19 09:55


Related to This topic

The problem is as follows. Imagine an XML document without any particular schema.


For processing I want to have this in a Key Value Map:

"Persons/total" -> 2
"Persons/someguy/firstname" -> john
"Persons/someguy/name" -> Snow
"Persons/otherperson/sex" -> female

Preferably I would have a nice recursive function that traverses the XML depth-first, stacking labels until it finds a value, and then returns that value together with the stack of labels. Unfortunately I am struggling to connect the return type with the input type, because I return a sequence of my input. Let me show you what I have so far. Clearly the foreach is a problem, as it returns Unit, but map would not work either, as it returns a Seq.

// Does not compile: foreach returns Unit, not the declared tuple
def dfs(n: NodeSeq, keyStack: String, map: Map[String,String])
 :(NodeSeq, String, Map[String,String]) = {
  n.foreach(x => {
    if (x.child.isEmpty) {
      dfs(x.child, keyStack, map + (keyStack+ x.label + " " -> x.text))
    } else {
      dfs(x.child, keyStack+ x.label + "/", map)
    }
  })
}

Would greatly appreciate the help!


After some playing around, this is the most elegant way in which I could do it. What I don't like is:

  • It goes depth-first for every child, so you need to flatten the result afterwards. This is also why I miss the root node label.
  • It drags a lot of XML along the way, so it might be too memory intensive?

Please improve if you have ideas!

import scala.xml._

val xml = "<persons><total>2</total><someguy><firstname>john</firstname><name>Snow</name></someguy><otherperson><sex>female</sex></otherperson></persons>"
val result: Elem = scala.xml.XML.loadString(xml)

def linearize(node: Node, stack: String, map: Map[String,String])
: List[(Node, String, Map[String,String])] = {
  (node, stack, map) :: node.child.flatMap {
    case e: Elem => {
      // A single descendant means e only wraps a text node, so record its value
      if (e.descendant.size == 1) linearize(e, stack, map ++ Map(stack + "/" + e.label -> e.text))
      // Otherwise push the label onto the path and keep descending
      else linearize(e, stack + "/" + e.label, map)
    }
    case _ => Nil
  }.toList
}

linearize(result, "", Map[String,String]()).flatMap(_._3).toMap

We still need to flatten the Map afterwards, but at least the recursive part is rather short. The code above should work in a Scala worksheet.
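
As one possible improvement on the two drawbacks listed above, here is a minimal sketch that returns the (path, text) pairs directly, so nothing needs flattening afterwards and the root label is kept. The name leafPaths and its parent parameter are made up for this example, and it assumes every value sits in an element that has no element children. Like the code above, it is meant for a worksheet or the REPL.

```scala
import scala.xml._

// Hypothetical helper (not part of the answer above): collects one
// (path, text) pair for every element that has no element children.
def leafPaths(node: Node, parent: String = ""): List[(String, String)] = {
  val path = if (parent.isEmpty) node.label else parent + "/" + node.label
  val childElems = node.child.collect { case e: Elem => e }.toList
  if (childElems.isEmpty) List(path -> node.text)      // leaf: emit its text
  else childElems.flatMap(e => leafPaths(e, path))     // branch: descend
}

val doc = XML.loadString(
  "<persons><total>2</total><someguy><firstname>john</firstname><name>Snow</name></someguy><otherperson><sex>female</sex></otherperson></persons>")

val flat: Map[String, String] = leafPaths(doc).toMap
// flat("persons/someguy/firstname") == "john"
```

Only strings are carried through the recursion, so no Node references are kept in the result. Repeated paths overwrite each other in toMap, which is the case the next answer deals with.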


Inspired by Sparky's answer, but suitable for a more general case:

val emptyMap = Map.empty[String,List[String]]

def xml2map(xml: String): Map[String,List[String]] = add2map(XML.loadString(xml), "", emptyMap)

private def add2map(node: Node, xPath: String, oldMap: Map[String,List[String]]): Map[String,List[String]] = {

  val elems = node.child.filter(_.isInstanceOf[Elem])
  val xCurr = xPath + "/" + node.label

  val currElems = elems.filter(_.child.count(_.isInstanceOf[Elem]) == 0)
  val nextElems = elems.diff(currElems)

  val currMap = currElems.foldLeft(oldMap)((map, elem) => map + {
    val key = xCurr + "/" + elem.label

    val oldValue = map.getOrElse(key, List.empty[String])
    val newValue = oldValue ::: List(elem.text)

    key -> newValue
  })

  nextElems.foldLeft(currMap)((map, elem) => map ++ add2map(elem, xCurr, emptyMap))
}

For XML like

    <alive>in 1st season</alive>
    <alive>in 2nd season</alive>
    <alive>even in last season</alive>
    <alive>how long more?</alive>

it generates the following Map[String,List[String]] (shown after .toString()):

  /persons/total -> List(2),
  /persons/someguy/firstname -> List(john),
  /persons/someguy/alive -> List(in 1st season, in 2nd season, ..., even in last season, how long more?),
  /persons/otherperson/sex -> List(female),
  /persons/someguy/name -> List(Snow)
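
One thing to watch with the code above: the results of nested elements are combined with ++, and ++ on a Map lets the right-hand value replace the left-hand one for a shared key. So if two sibling branches with the same label (say, two someguy elements) both contain a name, only the last list would survive. A minimal sketch of a merge that concatenates instead is below; mergeMaps is a made-up name and the sample values are invented for illustration.

```scala
// Hypothetical helper: merge two path -> values maps, concatenating the
// lists for keys present in both instead of letting the right side win.
def mergeMaps(a: Map[String, List[String]],
              b: Map[String, List[String]]): Map[String, List[String]] =
  b.foldLeft(a) { case (acc, (key, values)) =>
    acc + (key -> (acc.getOrElse(key, Nil) ::: values))
  }

// Invented sample data, standing in for the maps of two sibling branches
val first  = Map("/persons/someguy/name" -> List("Snow"))
val second = Map("/persons/someguy/name" -> List("Stark"),
                 "/persons/total"        -> List("2"))

val overwritten = first ++ second           // name -> List(Stark)
val merged      = mergeMaps(first, second)  // name -> List(Snow, Stark)
```

Under that assumption, replacing map ++ add2map(...) with mergeMaps(map, add2map(...)) in the last fold would keep values from all branches.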


One scenario to consider is when an element has a prefix:

val xml = <a>

There are other scenarios to consider (including entities, comments, declarations), but this is a good place to start:

def nodeToMap(xml: Elem): Map[String, String] = {

  def nodeToMapWithPrefix(prefix: String, xml: Node): Map[String, String] = {
    val pathAndText = for {
      child <- xml.child
    } yield {
      child match {
        case e: Elem if e.prefix == null =>
          nodeToMapWithPrefix(s"$prefix/${e.label}", e)
        case e: Elem => 
          nodeToMapWithPrefix(s"$prefix/${e.prefix}:${e.label}", e)
        case t: Text => Map(prefix -> t.text)
        case er: EntityRef => Map(prefix -> er.text)
      }
    }
    pathAndText.foldLeft(Map.empty[String, String]){_ ++ _}
  }

  nodeToMapWithPrefix(xml.label, xml)
}
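
To see the prefix handling in isolation, here is a small sketch. It relies on Elem.prefix being null for unprefixed elements, which is what the two Elem cases above distinguish. The helper name qualifiedName, the sample document and the namespace URI are assumptions made up for this example.

```scala
import scala.xml._

// Hypothetical helper: "prefix:label" for prefixed elements,
// the plain label otherwise (prefix is null when there is none).
def qualifiedName(e: Elem): String =
  Option(e.prefix).fold(e.label)(p => s"$p:${e.label}")

// Invented sample document with one prefixed and one plain child
val doc = XML.loadString(
  """<a xmlns:x="http://example.com/ns"><x:b>1</x:b><c>2</c></a>""")

val names = doc.child.collect { case e: Elem => qualifiedName(e) }.toList
// names == List("x:b", "c")
```

Wrapping the null in Option avoids the separate guard, at the cost of being slightly less explicit than the pattern match used above.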

Another scenario to consider is when text isn't at a leaf element:

val xml = <a>