Large XML file - wrap elements inside tags using X

Posted 2019-06-12 20:16

I have a fairly large XML file, around 3-4 MB, and I need to wrap certain elements inside tags. My XML has the following structure:

<body>
    <p></p>
    <p>
        <sectPr></sectPr>
    </p>
    <p></p>
    <p></p>
    <tbl></tbl>
    <p>
        <sectPr></sectPr>
    </p>
</body>

Of course, the p and tbl elements repeat inside body until the end of the file (each of the elements shown above also has children; I left them out for the sake of simplicity). As an estimate, there will be around 70 elements containing sectPr inside body, not necessarily in the order shown above.

What I would like to do is wrap each run of elements, starting at an element containing sectPr and ending just before the next element containing sectPr, in another tag. As a result, my XML should look like this:

<body>
    <p></p>
    <myTag>
        <p>
            <sectPr></sectPr>
        </p>
        <p></p>
        <p></p>
        <tbl></tbl>
    </myTag>
    <myTag>
        <p>
            <sectPr></sectPr>
        </p>
    </myTag>
</body>

Another requirement is that the operation must complete in under 40 seconds.

My question is: do you think it is possible to achieve this result using XSLT? If so, could you please give a short description of how to do it? Or do you think it is better to read the XML file as a string and add the tags by manipulating the string?

As for the programming language, I am using Visual Basic.

Thank you in advance.

Tags: xml vb.net xslt
2 answers
爷、活的狠高调
#2 · 2019-06-12 20:36

This stylesheet should do it. Keys generally make this kind of grouping more efficient, but I'm not sure how long it will take on your file.

<xsl:stylesheet  xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    <xsl:output method="xml" indent="yes"/>
    <xsl:strip-space elements="*"/>

    <!-- key: index every element that does not itself contain sectPr by the id of its nearest preceding sibling that does contain sectPr -->
    <xsl:key name="following-sectPr" match="*[not(self::*[sectPr])]" use="generate-id(preceding-sibling::*[sectPr][1])"/>

    <!-- Identity transform template to copy nodes and attributes -->
    <xsl:template match="node()|@*">
        <xsl:copy>
            <xsl:apply-templates select="node()|@*"/>
        </xsl:copy>
    </xsl:template>

    <!-- template to match the elements containing sectPr and add myTag to them and the elements matching the above declared key -->
    <xsl:template match="*[sectPr]">
        <myTag>
            <xsl:apply-templates select="current() | key('following-sectPr', generate-id())" mode="copy"/>
        </myTag>
    </xsl:template>

    <!-- template to do nothing for elements with no sectPr that have a preceding-sibling element containing sectPr -->
    <xsl:template match="*[not(sectPr) and preceding-sibling::*[sectPr]]"/>

    <!-- template to copy elements pushed by the template matching *[sectPr] -->
    <xsl:template match="*" mode="copy">
        <xsl:copy>
            <xsl:apply-templates select="node()|@*"/>
        </xsl:copy>
    </xsl:template>
</xsl:stylesheet>
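For anyone who wants to sanity-check the grouping outside XSLT, the key-based idea can be sketched in Python with the standard library's xml.etree.ElementTree. This is only an illustration under assumptions, not part of the stylesheet: the names wrap_with_key and is_starter are made up here, and the sample input is the simplified structure from the question.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def is_starter(elem):
    # Hypothetical helper: true when elem has a direct sectPr child,
    # like the XSLT predicate *[sectPr].
    return elem.find("sectPr") is not None


def wrap_with_key(body, tag="myTag"):
    children = list(body)

    # Build the "key": id of a starter -> the non-starter siblings whose
    # nearest preceding starter it is (id() stands in for generate-id()).
    followers = defaultdict(list)
    owner = None
    for child in children:
        if is_starter(child):
            owner = child
        elif owner is not None:
            followers[id(owner)].append(child)

    # Rebuild body: each starter opens a wrapper and pulls in its followers.
    for child in children:
        body.remove(child)
    seen_starter = False
    for child in children:
        if is_starter(child):
            seen_starter = True
            wrapper = ET.SubElement(body, tag)
            wrapper.append(child)
            wrapper.extend(followers[id(child)])
        elif not seen_starter:
            body.append(child)  # before the first starter: left unwrapped
        # other non-starters were already placed inside a wrapper
    return body


sample = "<body><p/><p><sectPr/></p><p/><p/><tbl/><p><sectPr/></p></body>"
body = wrap_with_key(ET.fromstring(sample))
print([child.tag for child in body])     # ['p', 'myTag', 'myTag']
print([child.tag for child in body[1]])  # ['p', 'p', 'p', 'tbl']
```

The dictionary lookup plays the same role as the key() call in the stylesheet: each starter finds its followers directly instead of rescanning its siblings.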
[This account has been banned]
#3 · 2019-06-12 20:39

"another requirement is that the operation must be performed under 40 seconds."

Performance depends to a very large extent on the specific processor in use. If you are using MSXML, you may benefit significantly by using a so-called "sibling recursion" in this scenario - as shown recently by Dimitre Novatchev.

Try:

XSLT 1.0

<xsl:stylesheet version="1.0" 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform" >
<xsl:output method="xml" version="1.0" encoding="utf-8" indent="yes"/>
<xsl:strip-space elements="*"/>

<!-- identity transform -->
<xsl:template match="@*|node()">
    <xsl:copy>
        <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
</xsl:template>

<xsl:template match="/body">
    <xsl:copy>
        <!-- start a "chain" for each leading node  -->
        <xsl:apply-templates select="*[1] | *[sectPr]"/>
    </xsl:copy>
</xsl:template>

<xsl:template match="body/*[sectPr]" priority="1">
    <myTag>
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
        <!-- call the next sibling in chain  -->
        <xsl:apply-templates select="following-sibling::*[1][not(sectPr)]"/>
    </myTag>
</xsl:template>

<xsl:template match="body/*">
    <xsl:copy>
        <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
    <!-- call the next sibling in chain  -->
    <xsl:apply-templates select="following-sibling::*[1][not(sectPr)]"/>
</xsl:template>

</xsl:stylesheet>
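The chain in the stylesheet above amounts to a single left-to-right walk over the children of body, where every element containing sectPr opens a new wrapper and each following sibling lands in whichever wrapper is currently open. A minimal sketch of that walk in Python (standard library only), just to illustrate the idea; the function name wrap_in_one_pass is made up for this example and the input is the simplified structure from the question:

```python
import xml.etree.ElementTree as ET


def wrap_in_one_pass(body, tag="myTag"):
    children = list(body)
    for child in children:
        body.remove(child)

    # target is where the next sibling in the chain gets appended:
    # body itself until the first starter, then the latest wrapper.
    target = body
    for child in children:
        if child.find("sectPr") is not None:
            target = ET.SubElement(body, tag)  # a starter opens a new wrapper
        target.append(child)
    return body


sample = "<body><p/><p><sectPr/></p><p/><p/><tbl/><p><sectPr/></p></body>"
body = wrap_in_one_pass(ET.fromstring(sample))
print([child.tag for child in body])     # ['p', 'myTag', 'myTag']
print([child.tag for child in body[1]])  # ['p', 'p', 'p', 'tbl']
```

Each child is visited exactly once, which is why this shape of solution tends to scale linearly with the number of siblings.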