UPDATE: I think I have answered most of this question now, except the handling of <pgBreak>. You can see my updates and current XSLT at the end of this post under the EDIT.
I asked a similar question yesterday, and received good answers. However, I have since realized this didn't cover all my bases so I am asking a more detailed question today.
XML IN
<?xml version="1.0" encoding="UTF-8"?>
<root>
<pgBreak pgId="i"/>
<p xml:id="a-01">
<highlight rend="italic">Bacon ipsum dolor sit amet</highlight> bacon chuck pastrami swine pork rump, shoulder beef ribs doner tri-tip
tongue. Tri-tip ground round short ribs capicola meatloaf shank drumstick short loin pastrami t-
bone. Sirloin turducken short ribs t-bone andouille strip steak pork loin corned beef hamburger
bacon filet mignon pork chop tail.
<note.ref id="0001"><super>1</super></note.ref>
<note id="0001">
<p>
You may need to consult a <highlight rend="italic">latin</highlight> butcher. Good Luck.
</p>
</note>
Pork loin <pgBreak pgId="01"/> ribeye bacon pastrami drumstick sirloin, shoulder pig jowl. Salami brisket rump ham, tail
hamburger strip steak pig ham hock short ribs jerky shank beef spare ribs. Capicola short ribs swine
beef meatball jowl pork belly. Doner leberkas short ribs, flank chuck pancetta bresaola bacon ham
hock pork hamburger fatback.
</p>
<p xml:id="a-02">
Bacon ipsum dolor sit amet bacon chuck pastrami swine pork rump, shoulder beef ribs doner tri-tip
tongue. Tri-tip ground round short ribs capicola meatloaf shank drumstick short loin pastrami t-
bone. Sirloin turducken short ribs t-bone andouille strip steak pork loin corned beef hamburger
bacon filet mignon pork chop tail.
</p>
<p xml:id="a-03">
Bacon ipsum dolor sit amet bacon chuck pastrami swine pork rump, shoulder beef ribs doner tri-tip
tongue.
<quote>
<p> 1.
Tri-tip ground round short ribs capicola meatloaf shank drumstick short loin pastrami t-
bone. Sirloin turducken short ribs t-bone andouille strip steak pork loin corned beef hamburger
bacon filet mignon pork chop tail.
</p>
<p> 2.
Tri-tip ground round short ribs capicola meatloaf shank drumstick short loin pastrami t-
bone. Sirloin <pgBreak pgId="02"/>turducken short ribs t-bone andouille strip steak pork loin corned beef hamburger
bacon filet mignon pork chop tail.
</p>
<p> 3.
Tri-tip ground round short ribs capicola meatloaf shank drumstick short loin pastrami t-
bone. Sirloin turducken short ribs t-bone andouille strip steak pork loin corned beef hamburger
bacon filet mignon pork chop tail.
</p>
</quote>
</p>
</root>
HTML OUT
<!DOCTYPE HTML>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
<title>Test</title>
</head>
<body>
<div id="pg-i">
Page i
</div>
<p data-chunkid="a-01">
<span class="highlight-italic">Bacon ipsum dolor sit amet</span>bacon chuck pastrami swine pork rump, shoulder beef ribs doner tri-tip
tongue. Tri-tip ground round short ribs capicola meatloaf shank drumstick short loin
pastrami t-
bone. Sirloin turducken short ribs t-bone andouille strip steak pork loin corned beef
hamburger
bacon filet mignon pork chop tail.
<span class="noteRef" id="0001"><sup>1</sup></span></p>
<div id="note-0001" data-chunkid="a-01">
<p>
You may need to consult a <span class="highlight-italic">latin</span> butcher. Good Luck.
</p>
</div>
<p data-chunkid="a-01">
Pork loin
</p>
<div id="pg-01">
Page 01
</div>
<p data-chunkId="a-01">
ribeye bacon pastrami drumstick sirloin, shoulder pig jowl. Salami brisket
rump ham, tail
hamburger strip steak pig ham hock short ribs jerky shank beef spare ribs. Capicola
short ribs swine
beef meatball jowl pork belly. Doner leberkas short ribs, flank chuck pancetta bresaola
bacon ham
hock pork hamburger fatback.
</p>
<p data-chunkid="a-02"><span class="highlight-italic">Bacon ipsum dolor sit</span> amet bacon chuck pastrami swine pork rump, shoulder beef ribs doner tri-tip
tongue. Tri-tip ground round short ribs capicola meatloaf shank drumstick short loin
pastrami t-
bone. Sirloin turducken short ribs <span class="highlight-bold">t-bone</span> andouille strip steak pork loin corned beef hamburger
bacon filet mignon pork chop tail.
</p>
<p data-chunkid="a-03">
Bacon ipsum dolor sit amet bacon chuck pastrami swine pork rump, shoulder beef ribs
doner tri-tip
tongue.
</p>
<blockquote data-chunkid="a-03">
<p> 1.
Tri-tip ground round short ribs capicola meatloaf shank drumstick short loin pastrami t-
bone. Sirloin turducken short ribs t-bone andouille strip steak pork loin corned beef hamburger
bacon filet mignon pork chop tail.
</p>
<p>2.
Tri-tip ground round <span class="highlight-italic">short ribs</span> capicola meatloaf shank drumstick short loin pastrami t-
bone. Sirloin
</p>
</blockquote>
<div id="pg-02">
Page: 02
</div>
<blockquote data-chunkid="a-03">
<p>
turducken short ribs t-bone andouille strip steak pork loin corned beef
hamburger bacon filet mignon pork chop tail.
</p>
<p> 3.
Tri-tip ground round short ribs capicola meatloaf shank drumstick short loin pastrami t-
bone. Sirloin turducken short ribs t-bone andouille strip steak pork loin corned beef hamburger
bacon filet mignon pork chop tail.
</p>
</blockquote>
<p data-chunkid="a-03">
Bacon ipsum dolor sit amet bacon chuck pastrami swine pork rump, shoulder beef ribs
doner tri-tip
tongue.
</p>
</body>
</html>
I would like to transform the XML to HTML5 but keep each chunk (xml:id) together. I want to avoid divitis (overuse of divs), so wrapping each p in a div is out, but I am also trying to avoid invalid HTML. For example, it would be easy to take the parent p (xml:id=a-01) and wrap it around its descendants; however, a block-level <div> and another <p> inside a <p> would be invalid, and the browser would interpret everything after the end of the text as orphaned text.
I have tried various modified XSLTs from my question from yesterday. However, I find myself in somewhat unfamiliar territory. I would also benefit from a concise explanation of the solution so I can start to better understand XSLT, as it looks like I will be spending more time with it in the upcoming months. I should probably pick up a book by Michael Kay or something.
EDIT: current version of the XSLT I am working with
Note: I haven't attempted the page breaks yet. Also, I cannot get the <meta> tag to close; oXygen 14 keeps complaining about that.
<xsl:template match="/">
  <html>
    <body>
      <xsl:apply-templates/>
    </body>
  </html>
</xsl:template>

<xsl:template match="p[not((parent::note,.//p, .//div))]">
  <p data-chunkID="{@xml:id}">
    <xsl:apply-templates/>
  </p>
</xsl:template>

<xsl:template match="p[.//p, .//div]">
  <xsl:for-each-group select="node()" group-adjacent="boolean((self::text(), self::note.ref,self::highlight))">
    <xsl:choose>
      <xsl:when test="current-grouping-key()">
        <p data-chunkID="{../@xml:id}">
          <xsl:apply-templates select="current-group()"/>
        </p>
      </xsl:when>
      <xsl:when test="self::p">
        <p>
          <xsl:apply-templates/>
        </p>
      </xsl:when>
      <xsl:otherwise>
        <xsl:apply-templates select="current-group()"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:for-each-group>
</xsl:template>

<xsl:template match="note.ref">
  <span class="noteRef" id="{@id}">
    <xsl:apply-templates/>
  </span>
</xsl:template>

<xsl:template match="super">
  <sup>
    <xsl:apply-templates/>
  </sup>
</xsl:template>

<xsl:template match="note">
  <div id="note-{@id}" data-chunkID="{../@xml:id}">
    <p>
      <xsl:apply-templates/>
    </p>
  </div>
</xsl:template>

<xsl:template match="quote">
  <blockquote data-chunkID="{../@xml:id}">
    <p>
      <xsl:apply-templates/>
    </p>
  </blockquote>
</xsl:template>

<xsl:template match="highlight">
  <xsl:variable name="class" select="concat(name(.),'-',string(@rend))"/>
  <xsl:choose>
    <xsl:when test="@rend[.= 'italic']">
      <span class="{$class}">
        <xsl:apply-templates/>
      </span>
    </xsl:when>
    <xsl:when test="@rend[.= 'bold']">
      <span class="{$class}">
        <xsl:apply-templates/>
      </span>
    </xsl:when>
    <xsl:otherwise>
      <span class="{$class}">
        <xsl:apply-templates/>
      </span>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>
It looks like your input is a little bit inconsistent with your output. (Is that the expected output, or the output you're getting now?) Chunks a-02 and a-03 have no <highlight> elements in the input, yet the output has <span class="highlight..."> elements. Also, chunk a-03 has text duplicated after the blockquote.

I believe I've produced a working solution that does everything in your example. Could you give this a try?
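
To illustrate only the page-break part of the problem, here is a simplified sketch, not the full stylesheet referred to above. It assumes XSLT 2.0 and a <p> whose <pgBreak> elements are direct children surrounded by inline content only. The output markup (div id="pg-..." and data-chunkid) is taken from the expected HTML in the question; everything else is an assumption. It does not handle a <note> inside the paragraph or a <pgBreak> nested inside <quote>/<p>, both of which occur in the sample and need extra splitting logic.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="html" indent="yes"/>

  <!-- Split the direct children of a paragraph at each pgBreak. -->
  <xsl:template match="p[pgBreak]">
    <xsl:variable name="chunk" select="@xml:id"/>
    <xsl:for-each-group select="node()" group-starting-with="pgBreak">
      <!-- Every group except possibly the first starts with a pgBreak:
           emit the page marker before the paragraph fragment. -->
      <xsl:apply-templates select="current-group()[self::pgBreak]"/>
      <xsl:variable name="rest" select="current-group()[not(self::pgBreak)]"/>
      <!-- Skip fragments that contain nothing but whitespace. -->
      <xsl:if test="$rest[self::* or normalize-space()]">
        <p data-chunkid="{$chunk}">
          <xsl:apply-templates select="$rest"/>
        </p>
      </xsl:if>
    </xsl:for-each-group>
  </xsl:template>

  <!-- A page break becomes a sibling block between paragraph fragments. -->
  <xsl:template match="pgBreak">
    <div id="pg-{@pgId}">Page <xsl:value-of select="@pgId"/></div>
  </xsl:template>

</xsl:stylesheet>
```

The idea is that group-starting-with opens a new group at every <pgBreak>, so each group is "an optional page marker followed by the text up to the next page marker". Because the marker is emitted outside the <p>, no block-level element ends up nested in a paragraph. If this is combined with the templates from the question, the match patterns would need distinct priorities to avoid ambiguous matches on <p>.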
I believe the unclosed meta tag is a result of using method="html". You may need to use method="xml" to get closed meta tags. With method="html", the above transform produces the following output from your sample input:

By changing the method to "xml" and manually adding the meta element to the transform, you can obtain the same result, but with the following <head>
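
The answer's own stylesheet and output are not reproduced here, so the following is only a minimal sketch of what the method="xml" variant could look like. The <meta> and <title> values are copied from the expected HTML in the question; the omit-xml-declaration and indent settings are assumptions.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- method="xml" serializes empty elements as self-closed tags (<meta .../>).
       method="html" would write <meta ...> without the closing slash. -->
  <xsl:output method="xml" indent="yes" omit-xml-declaration="yes"/>

  <xsl:template match="/">
    <html>
      <head>
        <!-- The xml output method does not add a meta element on its own,
             so it is written by hand here. -->
        <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
        <title>Test</title>
      </head>
      <body>
        <xsl:apply-templates/>
      </body>
    </html>
  </xsl:template>

</xsl:stylesheet>
```

The trade-off is that the html output method inserts the Content-Type meta element automatically when a <head> is present but leaves it unclosed, while the xml output method closes every empty element but leaves the <head> content entirely up to the stylesheet.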