Remove root node containing X

Posted 2019-08-22 16:07

Question:

<manga>
    <manga_mangadb_id>36037</manga_mangadb_id>
    <manga_title><![CDATA["Bungaku Shoujo" to Ue Kawaku Ghost]]></manga_title>
    <manga_volumes>4</manga_volumes>
    <manga_chapters>30</manga_chapters>
    <my_status>Dropped</my_status>
    <my_comments><![CDATA[]]></my_comments>
    <my_tags><![CDATA[Drama, Romance, Shounen, Psychological]]></my_tags>   
</manga>

My XML file contains 14,000 lines, and the value <my_status>Dropped</my_status> appears 125 times. I want to delete the enclosing <manga> node and everything in it whenever it contains <my_status>Dropped</my_status>. Is there a way to remove these in a batch, or is doing it by hand the only way?

Answer 1:

Consider running XSLT, the special-purpose language designed to transform XML files, which includes removing nodes based on a condition. Specifically, use the identity transform to copy the document as is, plus an empty template that matches the elements you want removed. Because that template produces no output, the matched elements are dropped.

You can run XSLT 1.0 scripts from almost any general-purpose language, such as C#, Java, Python, PHP, or even VBA, much as you would run SQL, another special-purpose language. Dedicated standalone tools are also available, and some of them run XSLT 2.0 and 3.0. See the XSLT tag page for details.

XSLT (save as .xsl file, a special .xml file)

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:strip-space elements="*"/>

  <!-- Identity Transform -->    
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Empty Template to Remove Elements -->   
  <xsl:template match="manga[my_status='Dropped']"/>

</xsl:stylesheet>
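For comparison, here is a minimal sketch of the same filter in plain Python, using the standard library's xml.etree.ElementTree instead of XSLT. It is not the XSLT approach itself, just the equivalent logic: walk the tree and remove every manga element that has a my_status child equal to Dropped. The wrapper element name mangalist, the function name drop_manga, and the second sample entry are assumptions made for illustration. Note that ElementTree does not preserve CDATA sections; their content is written back as ordinary escaped text.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the <mangalist> wrapper and the second entry are
# assumptions for illustration; the real file's root element may differ.
SAMPLE = """<mangalist>
    <manga>
        <manga_title><![CDATA["Bungaku Shoujo" to Ue Kawaku Ghost]]></manga_title>
        <my_status>Dropped</my_status>
    </manga>
    <manga>
        <manga_title><![CDATA[Some Other Title]]></manga_title>
        <my_status>Completed</my_status>
    </manga>
</mangalist>"""


def drop_manga(xml_text: str, status: str = "Dropped") -> str:
    """Return the XML with every <manga> whose <my_status> equals `status` removed."""
    root = ET.fromstring(xml_text)
    # Snapshot the elements first so removals do not disturb the traversal.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag == "manga" and any(
                s.text == status for s in child.findall("my_status")
            ):
                parent.remove(child)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(drop_manga(SAMPLE))
```

For a file on disk, the same loop could be applied to the tree returned by ET.parse, followed by a call to its write method. The XSLT route remains the better fit if CDATA sections or exact formatting need to survive.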

Below are command line tools available to run XSLT depending on OS.

Unix (macOS/Linux) using xsltproc, which writes a new, transformed XML file

xsltproc -o /path/to/output.xml /path/to/XSLTScript.xsl /path/to/input.xml

Windows using a PowerShell script that calls .NET's System.Xml.Xsl.XslCompiledTransform class

Save below as a .ps1 script

param ($xml, $xsl, $output)

if (-not $xml -or -not $xsl -or -not $output) {
    Write-Host "& .\xslt.ps1 [-xml] xml-input [-xsl] xsl-input [-output] transform-output"
    exit;
}

trap [Exception]{
    Write-Host $_.Exception;
}

$xslt = New-Object System.Xml.Xsl.XslCompiledTransform;

$xslt.Load($xsl);
$xslt.Transform($xml, $output);

Write-Host "generated" $output;

Read-Host -Prompt "Press Enter to exit";

Command line call (will output new, transformed XML file)

Powershell.exe -File "C:\Path\To\PowerShell\Script.ps1"^
 "C:\Path\To\Input.xml" "C:\Path\To\XSLTScript.xsl" "C:\Path\To\Output.xml"


Answer 2:

You can achieve that with an XSLT processor (version 1.0 and up) by combining an empty template with an identity template. Use the following XSLT stylesheet to remove all <manga> elements that have a <my_status> element with the value Dropped:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" omit-xml-declaration="yes" indent="yes"/>

  <!-- identity template -->
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*" />
    </xsl:copy>
   </xsl:template>  

  <!-- empty template for all 'my_status = Dropped' manga elements -->
  <xsl:template match="manga[my_status = 'Dropped']" />

</xsl:stylesheet>

You can apply that with, e.g., Saxon on Windows and Linux, or with any other XSLT processor you have available.



Tags: xml notepad++