XSL - create well formed xml from text file

Posted 2019-07-30 03:29

I have a pipe-delimited text file, shown below, that I need to transform into a well-formed XML structure (desired output also shown below) using XSLT. The stylesheet below is my latest attempt. However, I cannot find a way to nest the level 002 elements inside their level 001 element, i.e. to maintain the parent-child relationship, when iterating through the file line by line. Could anyone help?

Pipe delimited file - input

001|XXX|YYY
002|AAA|BBB
002|CCC|DD
001|EEF|XXX
002|HHH|GGG

XML File - desired output

<root>
   <level001>
      <elem name="field1">001</elem>
      <elem name="field2">XXX</elem>
      <elem name="field3">YYY</elem>
      <level002>
         <elem name="field1">002</elem>
         <elem name="field2">AAA</elem>
         <elem name="field3">BBB</elem>
      </level002>
      <level002>
         <elem name="field1">002</elem>
         <elem name="field2">CCC</elem>
         <elem name="field3">DD</elem>
      </level002>
   </level001>
   <level001>
      <elem name="field1">001</elem>
      <elem name="field2">EEF</elem>
      <elem name="field3">XXX</elem>
      <level002>
         <elem name="field1">002</elem>
         <elem name="field2">HHH</elem>
         <elem name="field3">GGG</elem>
      </level002>
   </level001>
</root>

Current XSL

<xsl:variable name="Cols">
<col>field1,1</col>
<col>field2,2</col>
<col>field3,3</col> 
</xsl:variable>


 <xsl:template match="/" name="main">
<xsl:choose>
    <xsl:when test="unparsed-text-available($pathToCSV, $encoding)">
       <xsl:variable name="csv" select="unparsed-text($pathToCSV, $encoding)" />
       <xsl:variable name="lines" select="tokenize($csv, '\n')" as="xs:string+" />
       <root>
       <xsl:for-each select="$lines[position() &gt; 0]">
        <xsl:if test="translate(., '&#160; &#9;&#10;&#13;',  '') != ''">
            <level001>
            <xsl:variable name="line" select="." />
            <xsl:variable name="columns" select="tokenize(.,'\|')" as="xs:string+"/>    
            <xsl:choose>
                <xsl:when test="$columns[1]='001'">
                    <xsl:for-each select="$Cols/col">
                        <xsl:variable name="column" select="number(substring-after(.,','))"/>
                        <elem name="{substring-before(.,',')}">
                            <!-- trims the whitespace from the beginning and the ending of the value -->
                            <xsl:value-of select="replace(replace($columns[$column],'\s+$',''),'^\s+','')"/>
                        </elem>
                    </xsl:for-each>
                </xsl:when>
                <xsl:when test="$columns[1]='002'">
                    <level002>
                    <xsl:for-each select="$Cols/col">
                        <xsl:variable name="column" select="number(substring-after(.,','))"/>
                        <elem name="{substring-before(.,',')}">
                            <!-- trims the whitespace from the beginning and the ending of the value -->
                            <xsl:value-of select="replace(replace($columns[$column],'\s+$',''),'^\s+','')"/>
                        </elem>
                    </xsl:for-each>
                    </level002>
                </xsl:when>
            </xsl:choose>                               
            </level001>
        </xsl:if>
       </xsl:for-each>
       </root>
    </xsl:when>         
</xsl:choose>
</xsl:template>

3 answers
做自己的国王
Reply #2 · 2019-07-30 03:35

You can find a solution to essentially the same problem here:

http://www.saxonica.com/papers/ideadb-1.1/mhk-paper.xml

The core is a recursive grouping template:

<xsl:template name="process-level">
  <xsl:param name="population" required="yes" as="element()*"/>
  <xsl:param name="level" required="yes" as="xs:integer"/>
  <xsl:for-each-group select="$population" 
       group-starting-with="*[xs:integer(@level) eq $level]">
    <xsl:element name="{@tag}">
      <xsl:copy-of select="@ID[string(.)], @REF[string(.)]"/>
      <xsl:value-of select="normalize-space(@text)"/>
      <xsl:call-template name="process-level">
        <xsl:with-param name="population" 
                        select="current-group()[position() != 1]"/>
        <xsl:with-param name="level" 
                        select="$level + 1"/>
      </xsl:call-template>
    </xsl:element>
  </xsl:for-each-group>
</xsl:template>
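
To try the template on the data from the question, something like the following minimal sketch might work. It assumes an XSLT 2.0 processor and that the lines have already been turned into flat elements carrying the level, tag and text attributes that the template reads. The variable name flat, the element name line and the sample attribute values are made up for illustration and are not taken from the paper.

```xml
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  exclude-result-prefixes="xs">

  <xsl:output indent="yes"/>

  <!-- Hypothetical flat input: one element per record, in document order -->
  <xsl:variable name="flat" as="element()*">
    <line level="1" tag="level001" text="XXX YYY"/>
    <line level="2" tag="level002" text="AAA BBB"/>
    <line level="2" tag="level002" text="CCC DD"/>
    <line level="1" tag="level001" text="EEF XXX"/>
    <line level="2" tag="level002" text="HHH GGG"/>
  </xsl:variable>

  <!-- Entry point: start grouping at the outermost level (1 in this sample) -->
  <xsl:template name="main">
    <root>
      <xsl:call-template name="process-level">
        <xsl:with-param name="population" select="$flat"/>
        <xsl:with-param name="level" select="1"/>
      </xsl:call-template>
    </root>
  </xsl:template>

  <!-- Recursive grouping template, copied unchanged from the answer above -->
  <xsl:template name="process-level">
    <xsl:param name="population" required="yes" as="element()*"/>
    <xsl:param name="level" required="yes" as="xs:integer"/>
    <xsl:for-each-group select="$population"
         group-starting-with="*[xs:integer(@level) eq $level]">
      <xsl:element name="{@tag}">
        <xsl:copy-of select="@ID[string(.)], @REF[string(.)]"/>
        <xsl:value-of select="normalize-space(@text)"/>
        <!-- Everything after the group's first item belongs to deeper levels -->
        <xsl:call-template name="process-level">
          <xsl:with-param name="population"
                          select="current-group()[position() != 1]"/>
          <xsl:with-param name="level"
                          select="$level + 1"/>
        </xsl:call-template>
      </xsl:element>
    </xsl:for-each-group>
  </xsl:template>

</xsl:stylesheet>
```

Each group starts at an element of the current level, and the remaining members of the group are handed to the next level, so every level002 record should end up inside the level001 record that precedes it. The recursion stops on its own once a group has no further members.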
一夜七次
Reply #3 · 2019-07-30 03:53

Well, you're iterating over every line and already closing the level001 tag when finished with the line. Why not try something like (pseudo-code):

  • for each line
    • if line is level001
      • print <level001>
      • print body of level001
      • get index of next level001
      • for each level002 between this line and the next level001 line
        • print <level002>
        • print body of level002
        • print </level002>
      • print </level001>
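
The pseudo-code above could be sketched in XSLT 2.0 roughly as follows. This is only one possible reading of it, under these assumptions: the file contains only 001 and 002 records, the parameter pathToCSV (borrowed from the question, with a made-up default of input.txt) points at the text file, and the helper template fields is a hypothetical name introduced here.

```xml
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  exclude-result-prefixes="xs">

  <!-- Assumed location of the pipe-delimited file; the default is a placeholder -->
  <xsl:param name="pathToCSV" as="xs:string" select="'input.txt'"/>

  <xsl:output indent="yes"/>

  <xsl:template name="main">
    <!-- All non-blank lines of the input -->
    <xsl:variable name="lines" as="xs:string*"
        select="tokenize(unparsed-text($pathToCSV), '\r?\n')[normalize-space()]"/>
    <!-- Positions (within $lines) of every level 001 line -->
    <xsl:variable name="starts" as="xs:integer*"
        select="index-of((for $l in $lines return starts-with($l, '001|')), true())"/>
    <root>
      <xsl:for-each select="$starts">
        <xsl:variable name="start" as="xs:integer" select="."/>
        <!-- Index of the next 001 line, or one past the end for the last group -->
        <xsl:variable name="next" as="xs:integer"
            select="($starts[. gt $start], count($lines) + 1)[1]"/>
        <level001>
          <xsl:call-template name="fields">
            <xsl:with-param name="line" select="$lines[$start]"/>
          </xsl:call-template>
          <!-- Every 002 line between this 001 line and the next one -->
          <xsl:for-each select="$lines[position() gt $start and position() lt $next]
                                      [starts-with(., '002|')]">
            <level002>
              <xsl:call-template name="fields">
                <xsl:with-param name="line" select="."/>
              </xsl:call-template>
            </level002>
          </xsl:for-each>
        </level001>
      </xsl:for-each>
    </root>
  </xsl:template>

  <!-- Hypothetical helper: one elem per pipe-separated field, trimmed -->
  <xsl:template name="fields">
    <xsl:param name="line" as="xs:string"/>
    <xsl:for-each select="tokenize($line, '\|')">
      <elem name="field{position()}">
        <xsl:value-of select="normalize-space(.)"/>
      </elem>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

The point of the design is that level001 is opened once per 001 line and closed only after all of its 002 lines have been written, rather than once per input line as in the question's stylesheet. It does not generalise to deeper nesting without further work.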
劫难
Reply #4 · 2019-07-30 03:55

I would first transform the flat text into a flat XML structure and then group that with for-each-group group-starting-with, as in the following code sample:

<xsl:stylesheet
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:mf="http://example.com/mf"
  exclude-result-prefixes="mf xs"
  version="2.0">

<xsl:param name="text-url" as="xs:string" select="'test2012090401.txt'"/>
<xsl:param name="sep" as="xs:string" select="'\|'"/>
<xsl:param name="field" as="xs:string" select="'field'"/>

<xsl:output indent="yes"/>

<xsl:function name="mf:group" as="node()*">
  <xsl:param name="nodes" as="node()*"/>
  <xsl:param name="level" as="xs:integer"/>
  <xsl:for-each-group select="$nodes" group-starting-with="line[xs:integer(elem[1]) eq $level]">
    <xsl:element name="level{*[1]}">
      <xsl:copy-of select="*"/>
      <xsl:sequence select="mf:group(current-group() except ., $level + 1)"/>
    </xsl:element>
  </xsl:for-each-group>
</xsl:function>

<xsl:template name="main">
  <xsl:variable name="flat">
    <xsl:for-each select="tokenize(unparsed-text($text-url), '\r?\n')">
      <line>
        <xsl:for-each select="tokenize(., $sep)">
          <elem name="{$field}{position()}">
            <xsl:value-of select="."/>
          </elem>
        </xsl:for-each>
      </line>
    </xsl:for-each>
  </xsl:variable>
  <root>
    <xsl:sequence select="mf:group($flat/line, 1)"/>
  </root>
</xsl:template>

</xsl:stylesheet>

When I apply that stylesheet with Saxon 9 using java -jar saxon9he.jar -it:main -xsl:sheet.xsl, the result I get is:

<?xml version="1.0" encoding="UTF-8"?>
<root>
   <level001>
      <elem name="field1">001</elem>
      <elem name="field2">XXX</elem>
      <elem name="field3">YYY</elem>
      <level002>
         <elem name="field1">002</elem>
         <elem name="field2">AAA</elem>
         <elem name="field3">BBB</elem>
      </level002>
      <level002>
         <elem name="field1">002</elem>
         <elem name="field2">CCC</elem>
         <elem name="field3">DD</elem>
      </level002>
   </level001>
   <level001>
      <elem name="field1">001</elem>
      <elem name="field2">EEF</elem>
      <elem name="field3">XXX</elem>
      <level002>
         <elem name="field1">002</elem>
         <elem name="field2">HHH</elem>
         <elem name="field3">GGG</elem>
         <level/>
      </level002>
   </level001>
</root>

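The stylesheet has a parameter named text-url that holds the location of the plain text file. You can set it when running the stylesheet, for example by appending text-url=yourfile.txt to the Saxon command line. The empty <level/> element at the end of the result most likely comes from a trailing line break in the input file, which produces an empty last line. Filtering out blank lines, for instance by appending the predicate [normalize-space()] to the outer tokenize(...) call, should avoid it.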
