Linking by differentiating the type of content

Posted 2019-08-24 02:27

Question:

I have the XML cases below and want to turn the cross-references in them into links. I need to use XSLT 1.0, not XSLT 2.0 with regular expressions. I'm trying to automate my script as much as I can, but I get stuck wherever one of the cases below occurs. Please let me know how to solve this.

Case1:

<para>See Chapter 9. </para>

Case2:

<para>Many parties often come from different localities 
and therefore jurisdiction of arbitration can often be 
an issue for the arbitration commission to determine in 
Chapter 12. Enforcement of an award can also be another 
issue of concern.</para>

Case3:

<para>Distribution rights for products or services 
disputes are common in China. Many brand holders 
authorised other legal entities or individuals to use 
their brand names under certain conditions and 
restrictions see paras.2.213 and 4.214. The other 
legal entities/persons need to abide by the franchise 
agreement. Many disputes occur due to the lapse of the 
franchise agreement, misconduct of the franchisee in the 
execution of the franchise agreement, non-payment of 
franchise fee and infringement of the franchise rights. 
Arbitration has been used to resolve some of these 
disputes. </para>

Case4:

<para>The award must take into account the enforcement 
issue of the award(refer to para.3.12). The protection 
of properties may need to be addressed during arbitration
and parties may request protection of properties through 
the court system.</para>

Case5:

<para>The claimant indicates that on 21 August 1998(as 
mentioned in Chapter 16) the claimant signed a franchise 
agreement with Beijing Y Company for Y Company to use their 
brand name(refer to para 5.12) to open a restaurant in 
Shanghai namely Shanghai X Famous Roast Duck Ltd</para>

The XSLT should follow the template below. I'm showing the para template only to apply templates; my real code has some conditions in it, which I've left out to keep the example short.

    <xsl:template match="para">
        <div class="para">
            <xsl:apply-templates/>
        </div>
    </xsl:template>

    <xsl:template match="text()">
        <!-- Match should go here -->
    </xsl:template>

The expected outputs are as below.

Case1:

<div class="para">See <a href="CHA_CH_09">Chapter 9</a>. </div>

Case2:

<div class="para">Many parties often come from different localities 
and therefore jurisdiction of arbitration can often be 
an issue for the arbitration commission to determine in 
<a href="CHA_CH_12">Chapter 12</a>. Enforcement of an award can also be another 
issue of concern.</div>

Case3:

<div class="para">Distribution rights for products or services 
disputes are common in China. Many brand holders 
authorised other legal entities or individuals to use 
their brand names under certain conditions and 
restrictions see paras.<a href="CHA_CH_02/02-213">2.213</a> and<a href="CHA_CH_14/14-214">14.214</a>. The other legal 
entities/persons need to abide by the franchise 
agreement. Many disputes occur due to the lapse of the 
franchise agreement, misconduct of the franchisee in the 
execution of the franchise agreement, non-payment of 
franchise fee  and infringement of the franchise rights. 
Arbitration has been used to resolve some of these 
disputes.</div>

Case4:

<div class="para">The award must take into account the enforcement 
issue of the award(refer to para.<a href="CHA_CH_03/03-012">3.012</a>). The protection 
of properties may need to be addressed during arbitration 
and parties may request protection of properties through 
the court system.</div>

Case5:

<div class="para">The claimant indicates that on 21 August 1998(as 
mentioned in <a href="CHA_CH_16">Chapter 16</a>) the claimant signed a franchise
agreement with Beijing Y Company for Y Company to use their 
brand name(refer to para <a href="CHA_CH_05/05-112">5.112</a>) to open a restaurant in 
Shanghai namely Shanghai X Famous Roast Duck Ltd</div>

Thanks.

Answer 1:

There are three possible problems involved here; it's not clear which of them is causing you difficulty.

First, there is the task of recognizing cross references appearing in certain forms. In a language with regular expressions, I believe you would simply be looking for any of the following matches in your input:

Chapter \d+
para\.\d\.\d+ 
paras\.\d\.\d+ and \d\.\d+

The second one should possibly be para\.\d+\.\d+ (with a similar change to the third) -- only your data can say for sure. And if you're looking for patterns like the third one, you may also want patterns like paras\.\d\.\d+(, \d\.\d+)* and \d\.\d+ and para\.\d\.\d+ or \d\.\d+.

There are two things you can do to solve this problem. You can switch to XSLT 2.0 (why are you stuck in 1.0? given the difficulty 1.0 is going to cause you in solving this problem, the reason better be a good one). Or for each of these patterns you can hand-code a little recursive named template to recognize the pattern and return the string matched, or start- and length indices into the string value of the text node, or something else that lets you process the link. Hand-coding such pattern recognizers in XSLT 1.0 can be kind of relaxing, and it's really not that hard if you are able and willing to think about it in terms of an underlying finite state machine. (It also helps if you're not in a hurry.)

To recognize the first pattern, for instance, you need a state machine with an initial state, a second state in which the string "Chapter " (including the trailing space) has been seen, and a third state in which "Chapter " and one or more decimal digits have been seen. In state 1, if we see "Chapter ", we record our current position (we have matched 8 characters) and move to state 2; if we see anything else we return 0, to indicate that the input didn't match the pattern. In state 2, if we see a decimal digit we move to state 3 and advance one position in the input string, and if we see anything else we return 0 to signal failure. In state 3, if we see a decimal digit we advance one position in the input and stay in state 3; if we see anything else we signal success by returning a number indicating the length of the match (we know where it started; it started at the beginning of the string initially submitted to the template). My first draft (not tested) would look something like this:

<xsl:template name="match-chapter-ref">
  <!--* given a string s (in state 1), find out how long a string
      * is matched by the pattern "Chapter \d+" at the beginning
      * of $s.  Return that number.  Return 0 for no-match. *-->
  <xsl:param name="s" select="''"/>
  <xsl:param name="state" select="1"/>
  <xsl:param name="length-so-far" select="0"/>

  <xsl:choose>
    <!--* State 1:  expecting 'Chapter ' *-->
    <xsl:when test="$state = 1 and starts-with($s,'Chapter ')">
      <xsl:call-template name="match-chapter-ref">
        <xsl:with-param name="s" select="substring($s,9)"/>
        <xsl:with-param name="state" select="2"/>
        <xsl:with-param name="length-so-far" select="8"/>
      </xsl:call-template>
    </xsl:when>
    <xsl:when test="$state = 1 and not(starts-with($s,'Chapter '))">
      <!--* no sale, return 0 as value *-->
      <xsl:value-of select="0"/>
    </xsl:when>

    <!--* State 2:  expecting a decimal digit *-->
    <xsl:when test="$state = 2">
      <xsl:variable name="c" select="substring($s,1,1)"/>
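      <!--* translate() removes digits, so $litmus is empty for a digit;
          * $c itself is empty at end of input, which must not count *-->
      <xsl:variable name="litmus" select="translate($c,'0123456789','')"/>
      <xsl:choose>
        <xsl:when test="$c != '' and $litmus = ''">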
          <!--* $c is a decimal digit *-->
          <xsl:call-template name="match-chapter-ref">
            <xsl:with-param name="s" select="substring($s,2)"/>
            <xsl:with-param name="state" select="3"/>
            <xsl:with-param name="length-so-far" select="$length-so-far + 1"/>
          </xsl:call-template>
        </xsl:when>
        <xsl:otherwise>
          <!--* no match, return 0 as value *-->
          <xsl:value-of select="0"/>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:when>

    <!--* State 3:  consuming further decimal digits *-->
    <xsl:when test="$state = 3">
      <xsl:variable name="c" select="substring($s,1,1)"/>
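      <!--* same digit test as in state 2; an empty $c (end of input)
          * falls through to the otherwise branch and ends the match *-->
      <xsl:variable name="litmus" select="translate($c,'0123456789','')"/>
      <xsl:choose>
        <xsl:when test="$c != '' and $litmus = ''">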
          <!--* $c is a decimal digit *-->
          <xsl:call-template name="match-chapter-ref">
            <xsl:with-param name="s" select="substring($s,2)"/>
            <xsl:with-param name="state" select="3"/>
            <xsl:with-param name="length-so-far" select="$length-so-far + 1"/>
          </xsl:call-template>
        </xsl:when>
        <xsl:otherwise>
          <!--* we have a match, return its length *-->
          <xsl:value-of select="$length-so-far"/>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:when>

    <xsl:otherwise>
      <xsl:message terminate="yes">Unexpected state in template 
        match-chapter-ref, bailing ...</xsl:message>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>

You'll need similar templates to test for matches on the other patterns, and you'll need some rather fiddly code to handle text nodes by seeking the left-most possible match (that is, the earliest occurrence of "Chapter" or "paras" or "para"). In pseudo-code, the logic of that template will be:

  • Find the first match on 'Chapter ', 'para.', and 'paras.'.
  • Figure out which one is left-most.
  • Strip off the non-matching text before the left-most match and write it out.
  • Call the appropriate named template with the string beginning with the match.
  • If the named template returns a match-length of zero, strip off the misleading matching string, write it out to the output, and call the current template recursively with the remainder of the string to its right.
  • If the named template returns a match-length greater than zero, then strip off the text of the link ("Chapter 12", "para.3.245", etc.) and pass it to a named template to generate the hyperlink. Then recur to the current template with the remainder of the input string.
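
As a rough, untested sketch of those steps, here is a driver that handles only the "Chapter" pattern; the para and paras cases would need the left-most-match comparison described above. The template name link-chapter-refs is hypothetical, and the CHA_CH_NN href scheme is inferred from the expected output in the question. It is meant to sit in the same stylesheet as match-chapter-ref. A trailing space is appended to the string handed to the matcher so that it always meets a non-digit character before the input runs out.

```xml
<!-- Hypothetical driver: links 'Chapter N' references in a string.
     Assumes the named template match-chapter-ref from above is present. -->
<xsl:template name="link-chapter-refs">
  <xsl:param name="s" select="''"/>
  <xsl:choose>
    <xsl:when test="contains($s, 'Chapter ')">
      <!-- copy the text before the candidate through unchanged -->
      <xsl:value-of select="substring-before($s, 'Chapter ')"/>
      <xsl:variable name="tail"
          select="concat('Chapter ', substring-after($s, 'Chapter '))"/>
      <!-- length of the match at the start of $tail, 0 for no match;
           the appended space is a sentinel and never part of the match -->
      <xsl:variable name="len">
        <xsl:call-template name="match-chapter-ref">
          <xsl:with-param name="s" select="concat($tail, ' ')"/>
        </xsl:call-template>
      </xsl:variable>
      <xsl:choose>
        <xsl:when test="$len &gt; 0">
          <!-- digits start at position 9, after the 8 chars of 'Chapter ' -->
          <xsl:variable name="num" select="substring($tail, 9, $len - 8)"/>
          <a href="CHA_CH_{format-number($num, '00')}">
            <xsl:value-of select="substring($tail, 1, $len)"/>
          </a>
          <xsl:call-template name="link-chapter-refs">
            <xsl:with-param name="s" select="substring($tail, $len + 1)"/>
          </xsl:call-template>
        </xsl:when>
        <xsl:otherwise>
          <!-- false alarm: emit 'Chapter ' as plain text and carry on -->
          <xsl:text>Chapter </xsl:text>
          <xsl:call-template name="link-chapter-refs">
            <xsl:with-param name="s" select="substring($tail, 9)"/>
          </xsl:call-template>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:when>
    <xsl:otherwise>
      <!-- no candidate left: write out the rest as-is -->
      <xsl:value-of select="$s"/>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>

<!-- Usage: route every text node through the driver -->
<xsl:template match="text()">
  <xsl:call-template name="link-chapter-refs">
    <xsl:with-param name="s" select="."/>
  </xsl:call-template>
</xsl:template>
```

Recursing on the remainder after each match (or false alarm) is what lets a single text node carry several references.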

Your second problem is generating the hyperlinks. From your examples it looks like this may be a mechanical operation on the numeric part of the prose link; if it's not, you'll need a lookup table.
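
If the operation is indeed mechanical, a minimal XSLT 1.0 sketch for paragraph references could look like this. The template name make-para-link and its ref parameter are hypothetical, and the zero-padding rule (two digits for the chapter, three for the paragraph) is only an assumption read off the hrefs in the question.

```xml
<!-- Hypothetical helper: builds a link from the numeric part of a
     paragraph reference such as '2.213'. The padding widths are an
     assumption inferred from the question's expected output. -->
<xsl:template name="make-para-link">
  <xsl:param name="ref"/>
  <!-- part before the dot, padded to two digits: '2' becomes '02' -->
  <xsl:variable name="chapter"
      select="format-number(substring-before($ref, '.'), '00')"/>
  <!-- part after the dot, padded to three digits: '12' becomes '012' -->
  <xsl:variable name="para"
      select="format-number(substring-after($ref, '.'), '000')"/>
  <a href="CHA_CH_{$chapter}/{$chapter}-{$para}">
    <!-- link text is the reference exactly as it appeared in the input -->
    <xsl:value-of select="$ref"/>
  </a>
</xsl:template>
```

Called with '2.213' this should give the href CHA_CH_02/02-213, and with '3.12' it should give CHA_CH_03/03-012, both as in the question. For '5.12' it gives CHA_CH_05/05-012 rather than the 05-112 shown in Case5, so that case would still need a lookup table or a correction to the sample data.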

Your third problem is the normalization of link text. Your example shows that some links stay the same in your data, and some get changed. The link to Chapter 9, for example, remains a link to Chapter 9, and the link to paragraph 2.213 remains a link to paragraph 2.213. But the links to paragraphs 3.12 and 5.12 become links to paragraph 3.012 and 5.112, respectively. That doesn't look like an algorithmic process to me; if it's really part of your problem and not a typo in preparing your sample data, then good luck to you (and think about lookup tables).

And ... think again about why you want to do this in XSLT 1.0, where it requires a lot of tedious low-level string manipulation, instead of in 2.0, where it would be much more straightforward.
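
For comparison, here is an untested sketch of the chapter-reference case in XSLT 2.0 using xsl:analyze-string. It assumes an XSLT 2.0 processor is available and reuses the CHA_CH_NN href scheme inferred from the question; the paragraph patterns would need their own regular expressions.

```xml
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="para">
    <div class="para">
      <xsl:apply-templates/>
    </div>
  </xsl:template>

  <!-- Split each text node into 'Chapter N' matches and everything else -->
  <xsl:template match="para/text()">
    <xsl:analyze-string select="." regex="Chapter (\d+)">
      <xsl:matching-substring>
        <!-- regex-group(1) holds the digits; pad them to two places -->
        <a href="CHA_CH_{format-number(number(regex-group(1)), '00')}">
          <xsl:value-of select="."/>
        </a>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <!-- text between references is copied through unchanged -->
        <xsl:value-of select="."/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>

</xsl:stylesheet>
```

The whole state machine and the left-most-match bookkeeping collapse into one regex attribute, which is the main argument for 2.0 here.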



Tags: xslt