<manga>
<manga_mangadb_id>36037</manga_mangadb_id>
<manga_title><![CDATA["Bungaku Shoujo" to Ue Kawaku Ghost]]></manga_title>
<manga_volumes>4</manga_volumes>
<manga_chapters>30</manga_chapters>
<my_status>Dropped</my_status>
<my_comments><![CDATA[]]></my_comments>
<my_tags><![CDATA[Drama, Romance, Shounen, Psychological]]></my_tags>
</manga>
My XML file contains 14,000 lines, and the value <my_status>Dropped</my_status> appears 125 times. I want to delete the enclosing <manga> node and everything in it if it contains <my_status>Dropped</my_status>. Is there a way to batch remove these, or is doing it by hand the only way?
Consider running XSLT, the special-purpose language designed to transform XML files, for example by removing nodes based on certain conditions. Specifically, use the identity transform to copy the document as is, plus an empty template that matches the elements to be removed.
You can run XSLT 1.0 scripts in almost any general-purpose language such as C#, Java, Python, PHP, even VBA, much like the other special-purpose language, SQL. Dedicated standalone tools are also available, some of which run XSLT 2.0 and 3.0.
XSLT (save as .xsl file, a special .xml file)
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:strip-space elements="*"/>

  <!-- Identity Transform -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Empty Template to Remove Elements -->
  <xsl:template match="manga[my_status='Dropped']"/>
</xsl:stylesheet>
Below are command line tools available to run XSLT depending on OS.
Unix (Mac/Linux) using xsltproc (writes a new, transformed XML file)
xsltproc -o /path/to/output.xml /path/to/XSLTScript.xsl /path/to/input.xml
Windows using a PowerShell script that calls .NET's System.Xml.Xsl.XslCompiledTransform class
Save the following as a .ps1 script
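param ($xml, $xsl, $output)

# Print usage and stop if any of the three arguments is missing
if (-not $xml -or -not $xsl -or -not $output) {
    Write-Host "& .\xslt.ps1 [-xml] xml-input [-xsl] xsl-input [-output] transform-output"
    exit;
}

# Report any error and stop, so "generated" is not printed after a failure
trap [Exception]{
    Write-Host $_.Exception;
    exit 1;
}

# Load the stylesheet and write the transformed document to $output
$xslt = New-Object System.Xml.Xsl.XslCompiledTransform;
$xslt.Load($xsl);
$xslt.Transform($xml, $output);

Write-Host "generated" $output;
Read-Host -Prompt "Press Enter to exit";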
Command line call (will output new, transformed XML file)
Powershell.exe -File "C:\Path\To\PowerShell\Script.ps1"^
"C:\Path\To\Input.xml" "C:\Path\To\XSLTScript.xsl" "C:\Path\To\Output.xml"
You can achieve that with an XSLT processor (version 1.0 and up) using an empty template in combination with an identity template. Use the following XSLT template to remove all <manga> elements which have a <my_status> element with the value Dropped:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" omit-xml-declaration="yes" indent="yes"/>

  <!-- identity template -->
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*" />
    </xsl:copy>
  </xsl:template>

  <!-- empty template for all manga elements with my_status = 'Dropped' -->
  <xsl:template match="manga[my_status = 'Dropped']" />
</xsl:stylesheet>
You can apply that with, e.g., Saxon on Windows and Linux, or with any other XSLT processor you have available.
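If no XSLT processor is at hand, the same condition can also be applied from a general-purpose language. The following is only a minimal sketch in Python that uses the standard library's xml.etree.ElementTree instead of XSLT. The function name remove_dropped, the root element name mymangalist, and the sample titles are assumptions made for illustration; they do not come from the original file.

```python
import xml.etree.ElementTree as ET


def remove_dropped(root, status="Dropped"):
    """Remove every <manga> element whose <my_status> child equals `status`.

    Returns the number of removed elements. The root element itself is
    never removed, only <manga> elements found below it.
    """
    removed = 0
    # Snapshot the elements first so the tree can be modified safely.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag == "manga" and child.findtext("my_status") == status:
                parent.remove(child)
                removed += 1
    return removed


# Hypothetical miniature of the export file (root name is an assumption).
SAMPLE = """<mymangalist>
  <manga><manga_title>A</manga_title><my_status>Dropped</my_status></manga>
  <manga><manga_title>B</manga_title><my_status>Completed</my_status></manga>
  <manga><manga_title>C</manga_title><my_status>Dropped</my_status></manga>
</mymangalist>"""

root = ET.fromstring(SAMPLE)
removed = remove_dropped(root)
titles = [m.findtext("manga_title") for m in root.iter("manga")]
print(removed, titles)  # prints: 2 ['B']
```

For a real file, one would load it with ET.parse(path), call remove_dropped(tree.getroot()), and save the result with tree.write(out_path, encoding="utf-8", xml_declaration=True), where path and out_path are placeholders. Note that ElementTree writes CDATA sections back as ordinary escaped text, which is equivalent XML but not byte-identical to the input.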