I have the five XML cases below, in which I want to turn the cross references into links. I have to use XSLT 1.0, not 2.0 with regular expressions. I'm trying to automate my script as much as I can, but I get stuck wherever one of the cases below occurs. Please let me know how to solve it.
Case1:
<para>See Chapter 9. </para>
Case2:
<para>Many parties often come from different localities
and therefore jurisdiction of arbitration can often be
an issue for the arbitration commission to determine in
Chapter 12. Enforcement of an award can also be another
issue of concern.</para>
Case3:
<para>Distribution rights for products or services
disputes are common in China. Many brand holders
authorised other legal entities or individuals to use
their brand names under certain conditions and
restrictions see paras.2.213 and 4.214. The other
legal entities/persons need to abide by the franchise
agreement. Many disputes occur due to the lapse of the
franchise agreement, misconduct of the franchisee in the
execution of the franchise agreement, non-payment of
franchise fee and infringement of the franchise rights.
Arbitration has been used to resolve some of these
disputes. </para>
Case4:
<para>The award must take into account the enforcement
issue of the award(refer to para.3.12). The protection
of properties may need to be addressed during arbitration
and parties may request protection of properties through
the court system.</para>
Case5:
<para>The claimant indicates that on 21 August 1998(as
mentioned in Chapter 16) the claimant signed a franchise
agreement with Beijing Y Company for Y Company to use their
brand name(refer to para 5.12) to open a restaurant in
Shanghai namely Shanghai X Famous Roast Duck Ltd</para>
The XSLT should fit into the template below. I'm showing the para template only to apply templates; my real code has some conditions in it, which I have left out to keep the example short.
<xsl:template match="para">
<div class="para">
<xsl:apply-templates/>
</div>
</xsl:template>
<xsl:template match="text()">
<!-- Match should go here-->
</xsl:template>
The expected outputs are as below.
Case1:
<div class="para">See <a href="CHA_CH_09">Chapter 9</a>. </div>
Case2:
<div class="para">Many parties often come from different localities
and therefore jurisdiction of arbitration can often be
an issue for the arbitration commission to determine in
<a href="CHA_CH_12">Chapter 12</a>. Enforcement of an award can also be another
issue of concern.</div>
Case3:
<div class="para">Distribution rights for products or services
disputes are common in China. Many brand holders
authorised other legal entities or individuals to use
their brand names under certain conditions and
restrictions see paras.<a href="CHA_CH_02/02-213">2.213</a> and<a href="CHA_CH_14/14-214">14.214</a>. The other legal
entities/persons need to abide by the franchise
agreement. Many disputes occur due to the lapse of the
franchise agreement, misconduct of the franchisee in the
execution of the franchise agreement, non-payment of
franchise fee and infringement of the franchise rights.
Arbitration has been used to resolve some of these
disputes.</div>
Case4:
<div class="para">The award must take into account the enforcement
issue of the award(refer to para.<a href="CHA_CH_03/03-012">3.012</a>). The protection
of properties may need to be addressed during arbitration
and parties may request protection of properties through
the court system.</div>
Case5:
<div class="para">The claimant indicates that on 21 August 1998(as
mentioned in <a href="CHA_CH_16">Chapter 16</a>) the claimant signed a franchise
agreement with Beijing Y Company for Y Company to use their
brand name(refer to para <a href="CHA_CH_05/05-112">5.112</a>) to open a restaurant in
Shanghai namely Shanghai X Famous Roast Duck Ltd</div>
Thanks.
There are three possible problems involved here; it's not clear which of them is causing you difficulty.
First, there is the task of recognizing cross references appearing in certain forms. In a language with regular expressions, I believe you would simply be looking for any of the following matches in your input:
Chapter \d+
para\.\d\.\d+
paras\.\d\.\d+ and \d\.\d+
The second one should possibly be para\.\d+\.\d+ (with a similar change to the third); only your data can say for sure. And if you're looking for patterns like the third one, you may also want patterns like paras\.\d\.\d+(, \d\.\d+)* and \d\.\d+ and para\.\d\.\d+ or \d\.\d+.
There are two things you can do to solve this problem. You can switch to XSLT 2.0. (Why are you stuck in 1.0? Given the difficulty 1.0 is going to cause you in solving this problem, the reason had better be a good one.) Or, for each of these patterns, you can hand-code a little recursive named template to recognize the pattern and return the string matched, or start and length indices into the string value of the text node, or something else that lets you process the link. Hand-coding such pattern recognizers in XSLT 1.0 can be kind of relaxing, and it's really not that hard if you are able and willing to think about it in terms of an underlying finite state machine. (It also helps if you're not in a hurry.)
To recognize the first pattern, for instance, you need a state machine with an initial state, a second state in which the string "Chapter " has been seen, and a third state in which the string "Chapter " and one or more decimal digits have been seen. In state 1, if we see "Chapter ", we record our current position (we have matched 8 characters) and move to state 2; if we see anything else we return 0, to indicate that the input didn't match the pattern. In state 2, if we see a decimal digit we move to state 3 and advance one position in the input string, and if we see anything else we return 0 to signal failure. In state 3, if we see a decimal digit we advance one position in the input and stay in state 3; if we see anything else, or reach the end of the input, we signal success by returning a number indicating the length of the match (we know where it started; it started at the beginning of the string initially submitted to the template). My first draft (not tested) would look something like this:
<xsl:template name="match-chapter-ref">
<!--* given a string s (in state 1), find out how long a string
* is matched by the pattern "Chapter \d+" at the beginning
* of $s. Return that number. Return 0 for no-match. *-->
<xsl:param name="s" select="''"/>
<xsl:param name="state" select="1"/>
<xsl:param name="length-so-far" select="0"/>
<xsl:choose>
<!--* State 1: expecting 'Chapter ' *-->
<xsl:when test="$state = 1 and starts-with($s,'Chapter ')">
<xsl:call-template name="match-chapter-ref">
<xsl:with-param name="s" select="substring($s,9)"/>
<xsl:with-param name="state" select="2"/>
<xsl:with-param name="length-so-far" select="8"/>
</xsl:call-template>
</xsl:when>
<xsl:when test="$state = 1 and not(starts-with($s,'Chapter '))">
<!--* no sale, return 0 as value *-->
<xsl:value-of select="0"/>
</xsl:when>
<!--* State 2: expecting a decimal digit *-->
<xsl:when test="$state = 2">
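<xsl:variable name="c" select="substring($s,1,1)"/>
<xsl:variable name="litmus" select="translate($c,'0123456789','')"/>
<xsl:choose>
<!--* $c is empty at end of input; translate('', ...) is also '', so test for that first *-->
<xsl:when test="$c != '' and $litmus = ''">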
<!--* $c is a decimal digit *-->
<xsl:call-template name="match-chapter-ref">
<xsl:with-param name="s" select="substring($s,2)"/>
<xsl:with-param name="state" select="3"/>
<xsl:with-param name="length-so-far" select="$length-so-far + 1"/>
</xsl:call-template>
</xsl:when>
<xsl:otherwise>
<!--* no match, return 0 as value *-->
<xsl:value-of select="0"/>
</xsl:otherwise>
</xsl:choose>
</xsl:when>
<!--* State 3: consuming further decimal digits *-->
<xsl:when test="$state = 3">
<xsl:variable name="c" select="substring($s,1,1)"/>
<xsl:variable name="litmus" select="translate($c,'0123456789','')"/>
<xsl:choose>
<!--* without the $c != '' guard, an empty $s would recur forever in state 3 *-->
<xsl:when test="$c != '' and $litmus = ''">
<!--* $c is a decimal digit *-->
<xsl:call-template name="match-chapter-ref">
<xsl:with-param name="s" select="substring($s,2)"/>
<xsl:with-param name="state" select="3"/>
<xsl:with-param name="length-so-far" select="$length-so-far + 1"/>
</xsl:call-template>
</xsl:when>
<xsl:otherwise>
<!--* we have a match, return its length *-->
<xsl:value-of select="$length-so-far"/>
</xsl:otherwise>
</xsl:choose>
</xsl:when>
<xsl:otherwise>
<xsl:message terminate="yes">Unexpected state in template
match-chapter-ref, bailing ...</xsl:message>
</xsl:otherwise>
</xsl:choose>
</xsl:template>
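Because a named template in XSLT 1.0 can only write to the result, the caller has to capture the returned length in a variable. A minimal sketch of such a call, to be placed inside any template; the variable name `len` and the sample string are just for illustration:

```xml
<!-- Capture the value written by match-chapter-ref in a variable.
     The variable holds a result tree fragment, which XPath converts
     to a string or number wherever one is needed. -->
<xsl:variable name="len">
  <xsl:call-template name="match-chapter-ref">
    <xsl:with-param name="s" select="'Chapter 12. Enforcement'"/>
  </xsl:call-template>
</xsl:variable>

<!-- For this input the match is 'Chapter 12', so $len is 10
     and the line below writes 'Chapter 12'. -->
<xsl:value-of select="substring('Chapter 12. Enforcement', 1, $len)"/>
```

The defaults for `state` and `length-so-far` are what make the one-parameter call work: the caller always starts the machine in state 1.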
You'll need similar templates to test for matches on the other patterns, and you'll need some rather fiddly code to handle text nodes by seeking the left-most possible match (that is, the earliest occurrence of "Chapter ", "paras" or "para"). In pseudo-code, the logic of that template will be:
- Find the first match on 'Chapter ', 'para.', and 'paras.'.
- Figure out which one is left-most.
- Strip off the non-matching text before the left-most match and write it out.
- Call the appropriate named template with the string beginning with the match.
- If the named template returns a match-length of zero, strip off the misleading matching string, write it out to the output, and call the current template recursively with the remainder of the string.
- If the named template returns a match-length greater than zero, then strip off the text of the link ("Chapter 12", "para.3.245", etc.) and pass it to a named template to generate the hyperlink. Then recur to the current template with the remainder of the input string.
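The steps above could be sketched roughly as follows. This is an untested simplification that looks only for 'Chapter ' (so there is no left-most comparison against 'para.' and 'paras.'), relies on the match-chapter-ref template shown earlier returning at the end of the input, and assumes the CHA_CH_nn href scheme seen in the expected output. The template name `link-chapters` is made up for the example.

```xml
<xsl:template match="text()" name="link-chapters">
  <!-- On the first call $s is the text node itself; on recursive
       calls it is the part of the string still to be processed. -->
  <xsl:param name="s" select="string(.)"/>
  <xsl:choose>
    <xsl:when test="contains($s, 'Chapter ')">
      <xsl:variable name="before" select="substring-before($s, 'Chapter ')"/>
      <xsl:variable name="rest" select="substring($s, string-length($before) + 1)"/>
      <!-- Write out the non-matching text before the candidate. -->
      <xsl:value-of select="$before"/>
      <!-- Ask the recognizer how long the match at the start of $rest is. -->
      <xsl:variable name="len">
        <xsl:call-template name="match-chapter-ref">
          <xsl:with-param name="s" select="$rest"/>
        </xsl:call-template>
      </xsl:variable>
      <xsl:choose>
        <xsl:when test="$len &gt; 0">
          <!-- Real match: characters 9..$len of $rest are the digits. -->
          <a href="CHA_CH_{format-number(substring($rest, 9, $len - 8), '00')}">
            <xsl:value-of select="substring($rest, 1, $len)"/>
          </a>
          <xsl:call-template name="link-chapters">
            <xsl:with-param name="s" select="substring($rest, $len + 1)"/>
          </xsl:call-template>
        </xsl:when>
        <xsl:otherwise>
          <!-- False alarm: copy the keyword and carry on after it. -->
          <xsl:text>Chapter </xsl:text>
          <xsl:call-template name="link-chapters">
            <xsl:with-param name="s" select="substring($rest, 9)"/>
          </xsl:call-template>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:when>
    <xsl:otherwise>
      <!-- Nothing left to link; copy the remaining text. -->
      <xsl:value-of select="$s"/>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>
```

Giving the template both a `match` and a `name` lets the same code serve as the entry point for text nodes and as the recursive worker. Extending it to the paragraph patterns means computing the position of each keyword and branching on the smallest one.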
Your second problem is generating the hyperlinks. From your examples it looks like this may be a mechanical operation on the numeric part of the prose link; if it's not, you'll need a lookup table.
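If the operation really is mechanical, zero-padding with format-number may be all that is needed. The sketch below assumes the scheme CHA_CH_cc/cc-ppp (two-digit chapter, three-digit paragraph) read off the expected output; the template name `make-para-link` and its `ref` parameter are invented for the example. It reproduces the expected hrefs for 2.213 and 3.12, but not the ones shown for 4.214 and 5.12, and it leaves the link text unchanged.

```xml
<xsl:template name="make-para-link">
  <!-- $ref is the numeric part of the reference, e.g. '2.213' -->
  <xsl:param name="ref"/>
  <!-- Pad the chapter to two digits and the paragraph to three. -->
  <xsl:variable name="chap"
      select="format-number(substring-before($ref, '.'), '00')"/>
  <xsl:variable name="para"
      select="format-number(substring-after($ref, '.'), '000')"/>
  <!-- '2.213' gives href="CHA_CH_02/02-213" -->
  <a href="CHA_CH_{$chap}/{$chap}-{$para}">
    <xsl:value-of select="$ref"/>
  </a>
</xsl:template>
```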
Your third problem is the normalization of link text. Your example shows that some links stay the same in your data, and some get changed. The link to Chapter 9, for example, remains a link to Chapter 9, and the link to paragraph 2.213 remains a link to paragraph 2.213. But the links to paragraphs 3.12 and 5.12 become links to paragraph 3.012 and 5.112, respectively. That doesn't look like an algorithmic process to me; if it's really part of your problem and not a typo in preparing your sample data, then good luck to you (and think about lookup tables).
And ... think again about why you want to do this in XSLT 1.0, where it requires a lot of tedious low-level string manipulation, instead of in 2.0, where it would be much more straightforward.
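For comparison, here is roughly what the chapter case looks like in XSLT 2.0 with xsl:analyze-string. It is an untested sketch that covers only the "Chapter \d+" pattern and again assumes the CHA_CH_nn href scheme from the expected output.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="para">
    <div class="para"><xsl:apply-templates/></div>
  </xsl:template>

  <xsl:template match="para/text()">
    <!-- Split the text node into matching and non-matching pieces. -->
    <xsl:analyze-string select="." regex="Chapter (\d+)">
      <xsl:matching-substring>
        <!-- regex-group(1) is the chapter number; pad it to two digits. -->
        <a href="CHA_CH_{format-number(number(regex-group(1)), '00')}">
          <xsl:value-of select="."/>
        </a>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <xsl:value-of select="."/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>

</xsl:stylesheet>
```

The whole state machine and the recursive driver collapse into one instruction, which is the main argument for moving to 2.0 if your environment allows it.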