When editing rich text content, our CMS generates XML files with duplicate <br/> tags. I'd like to remove them so that the output can be read by another application that cannot handle those duplicates.
Example input:
<p>
Lorem ipsum...<br />
<br />
..dolor sit
</p>
This should produce something like:
<p>
Lorem ipsum...<br />
..dolor sit
</p>
I am already using XSLT to manipulate the output in some other ways, and I have found some examples using regexps and PHP that do the same thing. I just think it would be better to do this with XSLT, given the speed of the XSLT engine in our CMS (Roxen).
Thanks in advance!
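Building off @Nic's answer, you could use a match pattern that tests the immediately preceding sibling node rather than the immediately preceding sibling element.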
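As a rough sketch, assuming @Nic's original predicate was preceding-sibling::*[1][self::br], the changed template might look like the following. It is meant to sit inside an existing stylesheet that binds the xsl prefix to the XSLT namespace and already contains an identity template:

```xml
<!-- Assumed original form (element axis test): a br is suppressed if the
     nearest preceding sibling ELEMENT is a br, even when text sits between:
     <xsl:template match="br[preceding-sibling::*[1][self::br]]"/>
-->

<!-- Changed form: suppress a br only when the nearest preceding sibling NODE
     of any kind (element, text, comment, ...) is itself a br. -->
<xsl:template match="br[preceding-sibling::node()[1][self::br]]"/>
```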
I've just changed * to node(). This would solve the problem of conflating two <br/>s that have text in between. However, it would stop removing duplicate <br/>s even if there is only a whitespace node in between. To solve that...
Deprecated
At first I had suggested you could strip whitespace-only nodes from p elements in the input doc by putting <xsl:strip-space elements="p"/> at the top level of your XSLT. But @Alejandro pointed out that this could easily cause you to lose important spaces, as in <p><em>bar</em> <em>baz</em></p>.

So instead, use a modified match pattern that skips whitespace-only text nodes when looking at the preceding sibling.
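A minimal sketch of such a pattern, shown here as a complete XSLT 1.0 stylesheet so it can be tried on its own. The identity template is included as an assumption about the rest of the stylesheet; only the second template is the point:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity transform: copy every node and attribute unchanged. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Among the preceding siblings, ignore whitespace-only text nodes,
       take the nearest remaining one, and suppress this br if that
       node is also a br. The empty template body produces no output. -->
  <xsl:template match="br[preceding-sibling::node()
                            [not(self::text() and normalize-space(.) = '')][1]
                            [self::br]]"/>

</xsl:stylesheet>
```

Applied to the example input in the question, the second <br /> is dropped while the whitespace text nodes around it are left in place, so the output may still contain an extra line break in the source text.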
Kind of ugly, but it should work. This will match and suppress "any br whose nearest preceding sibling node that is not a whitespace-only text node is also a br." :-)
Given that the match pattern is so complex, you may prefer to move some of that logic into the template body, as follows. I guess this is more a matter of personal taste and style:
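One way this could look, as a sketch: the match pattern is reduced to plain br, and the same preceding-sibling test moves into an xsl:if. It assumes the surrounding stylesheet binds the xsl prefix and contains an identity template matching @*|node(), which has a lower default priority than this template:

```xml
<xsl:template match="br">
  <!-- Copy the br only if the nearest preceding sibling that is not a
       whitespace-only text node is NOT a br. Otherwise output nothing. -->
  <xsl:if test="not(preceding-sibling::node()
                      [not(self::text() and normalize-space(.) = '')][1]
                      [self::br])">
    <xsl:copy>
      <!-- Same body as the identity template, in case the br carries
           attributes or (unexpectedly) children. -->
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:if>
</xsl:template>
```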
Here we use a copy of the identity transform when the <br /> is not one we want to suppress. I don't think <br /> can take child elements or text, but it doesn't hurt to be safe.

(Updated the above. I had forgotten to finish that sample code last time I saved edits.)
Using an identity transform to leave everything else alone, you could simply suppress every <br/> that is directly preceded by another one. Obviously, you can then just fit the template into your existing XSLT. The empty template will simply suppress that <br/>.

Update: As @LarsH points out, that template is too liberal in its matching and should probably test the immediately preceding sibling node rather than the immediately preceding sibling element.
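Putting the pieces of this answer together, a sketch of the whole stylesheet with the stricter node() test might look like this. The exact predicate is an assumption based on the description above:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity transform: leaves everything else alone. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Empty template: a br whose immediately preceding sibling node
       is also a br is matched here and produces no output. -->
  <xsl:template match="br[preceding-sibling::node()[1][self::br]]"/>

</xsl:stylesheet>
```

Note that with this version, two <br/>s separated only by a line break or other whitespace are not treated as duplicates, because the whitespace text node counts as the immediately preceding sibling.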