Cocke–Younger–Kasami (CYK) algorithm in the XSLT a

2019-09-01 16:41发布

Any idea about CYK algorithm XSLT please have a look on the link below:

There are two input xml like below I have to pass sentance.xml in the xslt and then based on the words in each sentance I have to read the values at runtime from the Rule.xml file and then produce the new XML given below.

Only using XSLT, XPath and XML no any other language or keywords.

http://en.wikipedia.org/wiki/CYK_algorithm

1) sentance.xml

<?xml version="1.0" encoding="UTF-8"?>
<sentances>
  <s>dog bark</s>
  <s>cat drink milk</s>
</sentances>

1) sentance.xml

<?xml version="1.0" encoding="UTF-8"?>
<allrules>
<rules>
    <rule cat="s">
        <rulechild cat="np"/>
        <rulechild cat="vp"/>
    </rule>
    <rule cat="vp">
        <rulechild cat="vt"/>
        <rulechild cat="np"/>
    </rule>
    <rule cat="vp">
        <rulechild cat="vi"/>
    </rule> 
</rules>
<words>
    <word cat="vi">bark</word>
    <word cat="vt">drink</word>
    <word cat="pn">dog</word>
    <word cat="pn">cat</word>
    <word cat="pn">milk</word>
</words>
</allrules>

OutPut XML should be like below:

<trees>
<tree>
    <sentace>dog bark</sentace>
    <node cat="s">
        <node cat="np">
            <word cat="pn">dog</word>
        </node>
        <node cat="vp">
            <word cat="vi">bark</word>
        </node>
    </node>
</tree>
<tree>
    <sentace>cat drink milk</sentace>
    <node cat="s">
        <node cat="np">
            <word cat="pn">cat</word>
        </node>
        <node cat="vp">
            <word cat="vt">drink</word>
            <node cat="np">
                <word cat="pn">milk</word>
            </node>
        </node>
    </node>
</tree>

Could it be possible to implement the CYK algorithm and produce the above out using XSLT Someone please help on that...

标签: xml xslt xpath
1条回答
虎瘦雄心在
2楼-- · 2019-09-01 16:48

Here is a solution should be close to what you have asked for. You did not specify the XSLT version. I have embedded the rules and symbols in the stylesheet, but you could easily adjust to make them an external document.

If XSLT 3.0 is not available to you, you could replace the fold-left() with tail-end recursion.

This input document ...

<sentences>
  <sentence>dog bark</sentence>
  <sentence>cat drink milk</sentence>
</sentences>

... when fed to this XSLT 3.0 stylesheet ...

<xsl:stylesheet
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:fn="http://www.w3.org/2005/xpath-functions"
  xmlns:so="http://stackoverflow.com/questions/33340967"
  version="3.0"
  exclude-result-prefixes="xsl xs fn so">

<xsl:output encoding="utf-8" omit-xml-declaration="yes" indent="yes" />
<xsl:strip-space elements="*" />

<xsl:variable name="rules" as="element(rule)*">
    <rule cat="s">
        <!-- All rules have precisely 2 children. -->
        <rulechild cat="np"/>
        <rulechild cat="vp"/>
    </rule>
    <rule cat="vp">
        <rulechild cat="vt"/>
        <rulechild cat="np"/>
    </rule>
</xsl:variable>

<xsl:variable name="words" as="element(word)+">
    <word cat="vi">bark</word>
    <word cat="vp">bark</word>
    <word cat="vt">drink</word>
    <word cat="np">dog</word>
    <word cat="np">cat</word>
    <word cat="np">milk</word>
</xsl:variable>

<!--
  The n'th analysis contains the CYK analysis for symbol sequences of length n.
  Let their be s symbols in the sentence.
  analysis[1] has s children.
  analysis[s] has one child.
  analysis[n] has s - n + 1 children
  The children of analysis are node and only node.
  node element represents a node in CYK analysis. This can either be a word or a string of symbols.
  The index of the node within its parent analysis corresponds to the start symbol.
    This index is equal to the index of the word within $words, of the starting word.
  node has any number of children, but the only type this can be is permutation.
  permutation represents a possible value for the node content, the competing alternatives
   being all the sibling permutations. Thus if a node has no permutations, there is no
   possiblity of a sequence of the given length being a correct grammar at that position
   in the sentence.
  Each permuation either has as children: 1 word; or 2 nodes.
  The permuations in the first row (analysis[1] are all of the word type.
  Subsequent rows have permutations of any type.
  words and permutations all have an attribute cat, which is the symbol.  
-->

<xsl:function name="so:analysis-1" as="element(analysis)"> 
  <!-- Do the first row of CYK. -->
  <xsl:param name="sentence" as="xs:string" />
  <analysis>
    <xsl:analyze-string select="$sentence" regex="\w+">
      <xsl:matching-substring>
        <xsl:variable name="word" select="." />
        <node>
          <xsl:for-each select="$words[. eq $word]">
            <permutation cat="{@cat}"> 
              <word cat="{@cat}"><xsl:value-of select="$word" /></word>
            </permutation>
          </xsl:for-each>
        </node>
      </xsl:matching-substring>
    </xsl:analyze-string>
  </analysis>
</xsl:function>

<xsl:function name="so:next-analysis" as="element(analysis)"> 
  <!-- Given the first n rows of CYK, compute the n+1'th row. -->
  <xsl:param name="rows" as="element(analysis)+" />
  <xsl:variable name="word-count" select="count( $rows[1]/node)" as="xs:integer" />
  <xsl:variable name="node-count" select="count( $rows[last()]/node) - 1" as="xs:integer" />
  <xsl:variable name="seq-len"    select="$word-count - $node-count + 1" as="xs:integer" />
  <analysis>
    <xsl:for-each select="1 to $node-count">
      <xsl:variable name="index" select="." as="xs:integer" />
      <node>
        <xsl:for-each select="$rules">
          <xsl:variable name="rule" as="element(rule)" select="." />
          <xsl:for-each select="
            for $sub-a in 1 to $seq-len - 1 return $sub-a
                [$rows[$sub-a           ]/node[$index         ][permutation/@cat = $rule/rulechild[1]/@cat]]
                [$rows[$seq-len - $sub-a]/node[$index + $sub-a][permutation/@cat = $rule/rulechild[2]/@cat]]">            
            <xsl:variable name="sub-a"    select="." as="xs:integer" />
            <permutation cat="{$rule/@cat}">
              <node>
                <xsl:copy-of select="$rows[$sub-a]/node[$index]/permutation[@cat eq $rule/rulechild[1]/@cat]" />
              </node>
              <node>
                <xsl:copy-of select="$rows[$seq-len - $sub-a]/node[$index + $sub-a]/permutation[@cat eq $rule/rulechild[2]/@cat]" />
              </node>
            </permutation>
          </xsl:for-each>     
        </xsl:for-each>   
      </node>
    </xsl:for-each>
  </analysis>
</xsl:function>

<xsl:template match="@*|node()">
  <xsl:copy>
    <xsl:apply-templates />
  </xsl:copy>
</xsl:template>

<xsl:template match="sentences">
  <trees>
    <xsl:apply-templates />
  </trees>
</xsl:template>

<xsl:template match="sentence">
  <tree>
    <xsl:variable name="first-row"  select="so:analysis-1(.)" />
    <xsl:variable name="word-count" select="count( $first-row/node)" as="xs:integer" />
    <xsl:sequence select="fold-left( 2 to $word-count, $first-row, function($a, $b) { $a, so:next-analysis(a) })
      [last()]" />
  </tree>
</xsl:template>

</xsl:stylesheet>

... will yield this output ...

<trees>
   <tree>
      <analysis>
         <node>
            <permutation cat="s">
               <node>
                  <permutation cat="np">
                     <word cat="np">dog</word>
                  </permutation>
               </node>
               <node>
                  <permutation cat="vp">
                     <word cat="vp">bark</word>
                  </permutation>
               </node>
            </permutation>
         </node>
      </analysis>
   </tree>
   <tree>
      <analysis>
         <node>
            <permutation cat="s">
               <node>
                  <permutation cat="np">
                     <word cat="np">cat</word>
                  </permutation>
               </node>
               <node>
                  <permutation cat="vp">
                     <node>
                        <permutation cat="vt">
                           <word cat="vt">drink</word>
                        </permutation>
                     </node>
                     <node>
                        <permutation cat="np">
                           <word cat="np">milk</word>
                        </permutation>
                     </node>
                  </permutation>
               </node>
            </permutation>
         </node>
      </analysis>
   </tree>   
</trees>

Caveat Emptor

I have not tested this.

Alternative

If you are not interested in all permutations, and just want any (the first) permutation, then we can add a few more templates and strip out all the permutations but one.

<xsl:stylesheet
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:fn="http://www.w3.org/2005/xpath-functions"
  xmlns:so="http://stackoverflow.com/questions/33340967"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  version="3.0"
  exclude-result-prefixes="xsl xs fn so">

<xsl:output encoding="utf-8" omit-xml-declaration="yes" indent="yes" />
<xsl:strip-space elements="*" />

<xsl:variable name="rules" as="element(rule)*">
    <rule cat="s">
        <!-- All rules have precisely 2 children. -->
        <rulechild cat="np"/>
        <rulechild cat="vp"/>
    </rule>
    <rule cat="vp">
        <rulechild cat="vt"/>
        <rulechild cat="np"/>
    </rule>
</xsl:variable>

<xsl:variable name="words" as="element(word)+">
    <word cat="vi">bark</word>
    <word cat="vp">bark</word>
    <word cat="vt">drink</word>
    <word cat="np">dog</word>
    <word cat="np">cat</word>
    <word cat="np">milk</word>
</xsl:variable>

<xsl:function name="so:analysis-1" as="element(analysis)"> 
  <!-- Do the first row of CYK. -->
  <xsl:param name="sentence" as="xs:string" />
  <analysis>
    <xsl:analyze-string select="$sentence" regex="\w+">
      <xsl:matching-substring>
        <xsl:variable name="word" select="." />
        <node>
          <xsl:for-each select="$words[. eq $word]">
            <permutation cat="{@cat}"> 
              <word cat="{@cat}"><xsl:value-of select="$word" /></word>
            </permutation>
          </xsl:for-each>
        </node>
      </xsl:matching-substring>
    </xsl:analyze-string>
  </analysis>
</xsl:function>

<xsl:function name="so:next-analysis" as="element(analysis)"> 
  <!-- Given the first n rows of CYK, compute the n+1'th row. -->
  <xsl:param name="rows" as="element(analysis)+" />
  <xsl:variable name="word-count" select="count( $rows[1]/node)" as="xs:integer" />
  <xsl:variable name="node-count" select="count( $rows[last()]/node) - 1" as="xs:integer" />
  <xsl:variable name="seq-len"    select="$word-count - $node-count + 1" as="xs:integer" />
  <analysis>
    <xsl:for-each select="1 to $node-count">
      <xsl:variable name="index" select="." as="xs:integer" />
      <node>
        <xsl:for-each select="$rules">
          <xsl:variable name="rule" as="element(rule)" select="." />
          <xsl:for-each select="
            for $sub-a in 1 to $seq-len - 1 return $sub-a
                [$rows[$sub-a           ]/node[$index         ][permutation/@cat = $rule/rulechild[1]/@cat]]
                [$rows[$seq-len - $sub-a]/node[$index + $sub-a][permutation/@cat = $rule/rulechild[2]/@cat]]">            
            <xsl:variable name="sub-a"    select="." as="xs:integer" />
            <permutation cat="{$rule/@cat}">
              <node>
                <xsl:copy-of select="$rows[$sub-a]/node[$index]/permutation[@cat eq $rule/rulechild[1]/@cat]" />
              </node>
              <node>
                <xsl:copy-of select="$rows[$seq-len - $sub-a]/node[$index + $sub-a]/permutation[@cat eq $rule/rulechild[2]/@cat]" />
              </node>
            </permutation>
          </xsl:for-each>     
        </xsl:for-each>   
      </node>
    </xsl:for-each>
  </analysis>
</xsl:function>

<xsl:template match="@*|node()">
  <xsl:copy>
    <xsl:apply-templates />
  </xsl:copy>
</xsl:template>

<xsl:template match="sentences">
  <trees>
    <xsl:apply-templates />
  </trees>
</xsl:template>

<xsl:template match="sentence">
  <tree>
    <xsl:variable name="first-row"  select="so:analysis-1(.)" />
    <xsl:apply-templates select="
       fold-left(
          2 to count( $first-row/node),
          $first-row,
          function($a, $b) { $a, so:next-analysis(a) })
       [last()]" />
  </tree>
</xsl:template>

<xsl:template match="analysis">
  <xsl:apply-templates />
</xsl:template>

<xsl:template match="node[not( fn:empty(*))]">
    <node cat="{permutation[1]/@cat}">
        <xsl:apply-templates select="permutation[1]/*"/>
    </node>
</xsl:template>

<xsl:template match="node[fn:empty(*)]">
    <node xsi:nil="true" />
</xsl:template>

</xsl:stylesheet>

Alternative output

The output should then be tidier, and look something like this ...

<trees xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
   <tree>
      <node cat="s">
         <node cat="np">
            <word cat="np">dog</word>
         </node>
         <node cat="vp">
            <word cat="vp">bark</word>
         </node>
      </node>
   </tree>
   <tree>
      <node cat="s">
         <node cat="np">
            <word cat="np">cat</word>
         </node>
         <node cat="vp">
            <node cat="vt">
               <word cat="vt">drink</word>
            </node>
            <node cat="np">
               <word cat="np">milk</word>
            </node>
         </node>
      </node>
   </tree>   
</trees>
查看更多
登录 后发表回答