I need to wrap each word in a tag (e.g. span) in an HTML document, like this:
<html>
<head>
<title>It doesnt matter</title>
</head>
<body>
<div> Text in a div </div>
<div>
Text in a div
<p>
Text inside a p
</p>
</div>
</body>
</html>
The result should look something like this:
<html>
<head>
<title>It doesnt matter</title>
</head>
<body>
<div> <span>Text </span> <span> in </span> <span> a </span> <span> div </span> </div>
<div>
<span>Text </span> <span> in </span> <span> a </span> <span> div </span>
<p>
<span>Text </span> <span> inside </span> <span> a </span> <span> p </span>
</p>
</div>
</body>
</html>
It's important to keep the structure of the body intact.
Any help?
All three solutions below use the XSLT design pattern of overriding the identity rule: the identity rule preserves the structure and contents of the XML document in general, and more specific templates modify only the selected nodes.
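In ElementTree terms, the identity rule amounts to a recursive copy of every element, attribute and piece of text, with a hook that lets a more specific rule take over for chosen nodes. The Python sketch below is only an analogue of that XSLT pattern (it is not XSLT and not code from either answer); the names `identity_copy`, `override` and `shout_title` are hypothetical.

```python
import xml.etree.ElementTree as ET

def identity_copy(elem, override=None):
    """Recursively copy elem: tag, attributes, text, tail and children.

    `override` stands in for a more specific XSLT template: if it returns an
    element for a given node, that element is used instead of the default copy.
    """
    if override is not None:
        replacement = override(elem)
        if replacement is not None:
            return replacement
    copy = ET.Element(elem.tag, dict(elem.attrib))
    copy.text = elem.text
    copy.tail = elem.tail
    for child in elem:
        copy.append(identity_copy(child, override))
    return copy


def shout_title(elem):
    """Example override: upper-case <title>, leave everything else to the identity rule."""
    if elem.tag != "title":
        return None
    new = ET.Element("title", dict(elem.attrib))
    new.text = (elem.text or "").upper()
    new.tail = elem.tail
    return new
```

Anything the override does not claim is reproduced unchanged, which is the property all three answers rely on.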
I. XSLT 1.0 solution:
This short and simple transformation (no <xsl:choose> used anywhere), when applied to the provided XML document, produces the wanted, correct result:
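A rough Python analogue of the whole transformation (not the XSLT stylesheet itself) can be written with the standard library's `xml.etree.ElementTree`: the head is left alone and, inside body, every run of text is replaced by one span per whitespace-separated word. Python's `str.split()` with no arguments trims and collapses whitespace, much like XPath's `normalize-space()`. The names `make_spans` and `wrap_words` are hypothetical, and the sketch assumes the input is well-formed XML.

```python
import xml.etree.ElementTree as ET

def make_spans(text):
    """One <span> per whitespace-separated word; str.split() with no
    arguments trims and collapses whitespace, much like normalize-space()."""
    spans = []
    for word in (text or "").split():
        span = ET.Element("span")
        span.text = word
        spans.append(span)
    return spans

def wrap_words(elem):
    """Wrap every word below elem in a <span>, keeping the element nesting."""
    new_children = make_spans(elem.text)   # text before the first child
    elem.text = None
    for child in list(elem):
        wrap_words(child)                    # recurse into nested elements
        new_children.append(child)
        new_children.extend(make_spans(child.tail))  # text after this child
        child.tail = None
    for child in list(elem):
        elem.remove(child)
    elem.extend(new_children)

SAMPLE = """<html><head><title>It doesnt matter</title></head><body>
<div> Text in a div </div>
<div>
Text in a div
<p>
Text inside a p
</p>
</div>
</body></html>"""

root = ET.fromstring(SAMPLE)
wrap_words(root.find("body"))   # only <body> is rewritten; <head> is left untouched
print(ET.tostring(root, encoding="unicode"))
```

This keeps the div/p nesting intact but, unlike the question's example, does not keep any spaces around the words.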
II. XSLT 2.0 solution:
When this transformation is applied to the same XML document (above), again the correct, wanted result is produced:
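An XSLT 2.0 solution can typically use XPath 2.0's regex-based `tokenize()` function instead of recursive templates. Its behaviour can be mimicked in Python with `re.split`: an empty input gives no tokens, and a match at the very start of the string yields a leading zero-length token, which is why whitespace is usually normalized first. The helpers below are illustrative analogues, not the 2.0 stylesheet, and Python's regex dialect differs from XPath's in some details.

```python
import re

def normalize_space(s):
    """Analogue of XPath normalize-space(): trim, then collapse whitespace runs to one space."""
    return " ".join(s.split())

def tokenize(s, pattern):
    """Analogue of XPath 2.0 tokenize($s, $pattern) for simple patterns.

    As in XPath, an empty input gives an empty result, and a match at the
    start of the input gives a leading zero-length token.
    """
    if s == "":
        return []
    return re.split(pattern, s)
```

For example, `tokenize(normalize_space(text), " ")` yields exactly the words of a text node, ready to be wrapped one by one.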
III. Solution using FXSL:
Using the str-split-to-words template/function of FXSL, one can easily implement much more complicated tokenization, in any version of XSLT. Let's take a more complicated XML document and tokenization rules:
Here more than one delimiter can mark the start or end of a word. In this particular example the delimiters are " ", ";", ".", ":", "-", "[" and "]". The following transformation uses FXSL for this more complicated tokenization and produces the wanted, correct result:
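As a rough Python analogue of tokenizing on several delimiters at once (this does not reproduce FXSL's `str-split-to-words`), a regular-expression character class can be built from the delimiter list above. Treating newlines and tabs as delimiters too is an assumption made here, since XML text nodes usually contain them; `split_words` is a hypothetical name.

```python
import re

# Delimiters listed in the FXSL example; "\s" (any whitespace) is an added assumption.
DELIMITERS = " ;.:-[]"
_DELIM_RE = re.compile("[" + re.escape(DELIMITERS) + r"\s]+")

def split_words(text, pattern=_DELIM_RE):
    """Split text on runs of delimiter characters and drop empty pieces."""
    return [w for w in pattern.split(text) if w]
```

Each word returned can then be wrapped in its own span, exactly as in the whitespace-only case.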
You could achieve this by extending the identity transform with a recursive template that checks whether a piece of text contains a space and, if so, puts a span tag around the first word. It then calls itself recursively on the remaining portion of the text.
Here it is in action:
When called on your sample HTML, the output is as follows:
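A rough Python analogue of the recursion described above (wrap the first word in a span, then recurse on the rest) might look like this; it is not the answer's XSLT, and `spans_for` is a hypothetical name. Deep recursion is costly in CPython (the default recursion limit is 1000 frames), so an iterative split is preferable for very long text.

```python
import re
import xml.etree.ElementTree as ET

def spans_for(text):
    """Return one <span> per word of text, built recursively:
    wrap the first word, then recurse on whatever follows it."""
    text = text.strip()
    if not text:                      # base case: nothing left to wrap
        return []
    parts = re.split(r"\s+", text, maxsplit=1)
    first = ET.Element("span")
    first.text = parts[0]
    rest = parts[1] if len(parts) > 1 else ""
    return [first] + spans_for(rest)
```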
I wasn't 100% sure how important the spaces inside the span elements are to you, though.
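If the spaces do matter, one option is to keep each run of whitespace as text between the spans instead of discarding it. In ElementTree that means storing the whitespace before the first word in the element's text and the whitespace after each word in that span's tail. The sketch below (hypothetical name `wrap_preserving_space`) handles only an element's own leading text, not the tails of its child elements, and puts the spaces between the spans rather than inside them as in the question's example.

```python
import re
import xml.etree.ElementTree as ET

def wrap_preserving_space(elem):
    """Wrap each word of elem.text in a <span>, keeping the original whitespace.

    Leading whitespace stays in elem.text; the whitespace after each word
    becomes that span's tail. Child elements and their tails are untouched.
    """
    text = elem.text or ""
    leading = re.match(r"\s*", text).group()
    elem.text = leading or None
    for i, (word, gap) in enumerate(re.findall(r"(\S+)(\s*)", text)):
        span = ET.Element("span")
        span.text = word
        span.tail = gap or None
        elem.insert(i, span)   # spans go before any existing children
```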