This is a follow-up to one of my previous posts.
I tried to understand why RuleTransformer performs so poorly. I now believe it is slow because its complexity is O(2^n), where n is the height of the input XML tree.
Suppose I need to rename the label of every element to "b":
import scala.xml._, scala.xml.transform._
val rule: RewriteRule = new RewriteRule() {
  override def transform(node: Node): Seq[Node] = node match {
    case e: Elem => e.copy(label = "b")
    case other   => other
  }
}
def trans(node: Node): Node = new RuleTransformer(rule).apply(node)
Let's count how many times transform visits each node in the input <a3><a2><a1/></a2></a3>.
To count the visits, we add a buffer visited, initialize it at the beginning, store each visited node in it, and print it at the end.
import scala.collection.mutable.ListBuffer
// buffer to store visited nodes
var visited: ListBuffer[Node] = ListBuffer[Node]()
val rule: RewriteRule = new RewriteRule() {
  override def transform(n: Node): Seq[Node] = {
    visited.append(n) // count this visit
    n match {
      case e: Elem => e.copy(label = "b")
      case other   => other
    }
  }
}
def trans(node: Node): Node = {
  visited = ListBuffer[Node]() // init the buffer
  val r = new RuleTransformer(rule).apply(node)
  // print visited nodes and numbers of visits
  println(visited.groupBy(identity).mapValues(_.size).toSeq.sortBy(_._2))
  r
}
Now let's run it in the REPL and look at visited:
scala> val a3 = <a3><a2><a1/></a2></a3>
a3: scala.xml.Elem = <a3><a2><a1/></a2></a3>
scala> trans(a3)
ArrayBuffer((<a3><b><b/></b></a3>,2), (<a2><b/></a2>,4), (<a1/>,8))
res1: scala.xml.Node = <b><b><b/></b></b>
So a1 is visited eight times.
If we transform <a4><a3><a2><a1/></a2></a3></a4>, then a1 will be visited 16 times; for <a5><a4><a3><a2><a1/></a2></a3></a4></a5>, 32 times; and so on. So the complexity looks exponential.
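To make the pattern explicit, here is the recurrence that the observed counts suggest. This is extrapolated from the runs above, not derived from the library source. The symbol v(d) is my own notation for the number of times the rule visits the node at depth d, with the root at depth 1.

```latex
% Observed: v(1) = 2, v(2) = 4, v(3) = 8
v(1) = 2, \qquad v(d) = 2\,v(d-1) \;\Longrightarrow\; v(d) = 2^{d}

% Total rule invocations for a single chain of height n:
\sum_{d=1}^{n} v(d) \;=\; \sum_{d=1}^{n} 2^{d} \;=\; 2^{n+1} - 2
```

For n = 3 this gives 2 + 4 + 8 = 14 calls in total, which matches the counts printed above.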
Does this make sense? How would you prove it by analyzing the code?
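As a starting point for such an analysis, here is a toy model of my hypothesis. It is not scala-xml code: the Tree type and countVisits are hypothetical names, and the model simply assumes that every level processes each of its children twice (and that the root itself is processed twice). If the real transformer behaves this way, the counts should match the ones I measured.

```scala
import scala.collection.mutable

// Hypothetical stand-in for an XML element; not part of scala-xml.
final case class Tree(label: String, children: List[Tree])

// Counts visits per label under the ASSUMPTION that each node
// is processed twice by whoever owns it (its parent, or the caller for the root).
def countVisits(root: Tree): Map[String, Int] = {
  val counts = mutable.Map.empty[String, Int].withDefaultValue(0)
  def visit(t: Tree): Unit = {
    counts(t.label) += 1
    // assumed doubling: each child is traversed twice per visit of its parent
    t.children.foreach { c => visit(c); visit(c) }
  }
  visit(root); visit(root) // assumed: the root is also processed twice
  counts.toMap
}

// Same shape as <a3><a2><a1/></a2></a3>
val chain3 = Tree("a3", List(Tree("a2", List(Tree("a1", Nil)))))
println(countVisits(chain3).toSeq.sortBy(_._2))
```

For the three-level chain the model yields 2, 4 and 8 visits for a3, a2 and a1, the same numbers as the real run. That only shows the hypothesis is consistent with the measurements, not that the library actually works this way.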