Is complexity of scala.xml.RuleTransformer really

2020-03-23 18:36发布

问题:

This is a follow-up to one of my previous posts.

I tried to understand why the RuleTransformer performance is so poor. Now I believe that it is so slow because its complexity is O(2n), where n is the height of the input XML tree.

Suppose I need to rename all labels of all elements to label "b":

import scala.xml._, scala.xml.transform._

val rule: RewriteRule = new RewriteRule() {
  override def transform(node: Node): Seq[Node] = node match {
    case e: Elem => e.copy(label = "b")
    case other => other
  }
}

def trans(node: Node): Node = new RuleTransformer(rule).apply(node)

Let's count how many times the transform visits each node in input <a3><a2><a1/></a2></a3>.
In order to count the visits we add a buffer visited, init it in the beginning, store visited nodes, and print it in the end.

import scala.collection.mutable.ListBuffer

// buffer to store visited nodes
var visited: ListBuffer[Node] = ListBuffer[Node]()

val rule: RewriteRule = new RewriteRule() {
  override def transform(n: Node): Seq[Node] = {
    visited append (n) // count this visit
    n match {
      case e: Elem => e.copy(label = "b")
      case other => other
    }
  }
}

def trans(node: Node): Node = {
  visited = ListBuffer[Node]() // init the buffer
  val r = new RuleTransformer(rule).apply(node)
  // print visited nodes and numbers of visits 
  println(visited.groupBy(identity).mapValues(_.size).toSeq.sortBy(_._2))
  r
}

Now let's run it in REPL and see the visited

scala> val a3 = <a3><a2><a1/></a2></a3>
a3: scala.xml.Elem = <a3><a2><a1/></a2></a3>

scala> trans(a3)
ArrayBuffer((<a3><b><b/></b></a3>,2), (<a2><b/></a2>,4), (<a1/>,8))
res1: scala.xml.Node = <b><b><b/></b></b>

So a1 is visited eight times.

If we transform <a4><a3><a2><a1/></a2></a3></a4> then a1 will be visited 16 times, for <a5><a4><a3><a2><a1/></a2></a3></a4></a5> -- 32, etc. So the complexity looks exponential.

Does it make sense ? How would you prove it by analysis of the code ?

回答1:

This will be not a very rigours analysis, but the problem seems to be with the BasicTransformer's transform(Seq[Node]) method[1].

The child transform method will be called twice for a node which is changed. Specifically in your example all the nodes will be called twice for this reason. And you are right in that each node at height h will be called 2^(h-1) times. Also note that at most one child of a node will be called twice because use of span and code, in the specific example all the child of the nodes will be called twice.

Just to verify this wrote these code samples for modified RulesTransformer. ( I could have overriden RulesTransformer too. But anyway)

// This is same as library RuleTransformer added with print statement
class CopiedRuleTransformer(rules: RewriteRule*) extends BasicTransformer {
    override def transform(n: Node): Seq[Node] = {
        println(n)
        rules.foldLeft(super.transform(n)) { (res, rule) => rule transform res }
    }
}

My modified RuleTransformer

class MyRuleTransformer(rules: RewriteRule*) extends BasicTransformer {
    override def transform(n: Node): Seq[Node] = {
        println(n)
        rules.foldLeft(super.transform(n)) { (res, rule) => rule transform res }
    }  
    override def transform(ns:Seq[Node]): Seq[Node] = {
        ns.flatMap(n => transform(n))
    } 
}

These codes are only for demonstrating purpose. You can call these as

new CopiedRuleTransformer(rule).apply(node)

or

new MyRuleTransformer(rule).apply(node)

[1] line : 34 https://github.com/scala/scala-xml/blob/master/src/main/scala/scala/xml/transform/BasicTransformer.scala