How to filter out nodes from an XML using PERL scr

2019-08-06 12:41发布

This question is all over the internet but all the examples I see do not take into account my apparently unique circumstances. Here is an excerpt from my XML:

<message type="error" from="Realtime" timestamp="Mon Nov 24 19:28:55 2014"> Could not receive from Loader </message>
<message type="warning" from="Dcd_Mux" timestamp="Mon Dec  1 02:31:18 2014"> Could not connect to Dcd </message>

Instead of having several levels of nodes, I just have several attributes on a message node. I want to be able to filter out nodes based on an argument to my Perl script. For example: If I wanted to filter out all messages with type="error", and I was using an XML that only had the 2 lines from above, my output would only be the warning message from above. Output shown here:

<message type="warning" from="Dcd_Mux" timestamp="Mon Dec  1 02:31:18 2014"> Could not connect to Dcd </message>

I need some direction on how to begin opening the XML, looping through the entire thing, and removing any nodes that have attributes that match my filter. I'm interested in using LibXML to get this done.

4条回答
Luminary・发光体
2楼-- · 2019-08-06 13:01

This solution is a variation on the one from Hunter McMillen, and is here largely to illustrate what I meant by "looks like a Java program written in Perl".

The parameter validation is part of it and, while I have reduced it to a simple count check, I would not normally write anything at all. It is of doubtful worth as the question is about how to process the data, and any such trimmings depend on who will be using the program and how often.

I have chosen to serialize the output and print it to STDOUT, as it is often more useful to be able to redirect output as required on the command line.

I recognized what I thought was a Java-style approach by the attention to verification and general "protecting me from myself". I don't believe that adding a label and using it in next is at all helpful, especially with such a short loop.

use strict;
use warnings; 

use XML::LibXML::PrettyPrint;

@ARGV == 2 or die <<END_USAGE;
Usage:
  $0 <XML file> <node type>
END_USAGE

my ($xml_file, $exclude_type) = @ARGV;

my $dom = XML::LibXML->load_xml(location => $xml_file);

for my $node ( $dom->findnodes('/root/message[@type]') ) {
  my $type = $node->getAttribute('type');
  $node->unbindNode if $type eq $exclude_type;
}

local $XML::LibXML::skipXMLDeclaration = 1;
my $pp = XML::LibXML::PrettyPrint->new;
print $pp->pretty_print($dom)->toString;

output

<root>
  <message type="warning" from="Dcd_Mux" timestamp="Mon Dec  1 02:31:18 2014">
    Could not connect to Dcd
  </message>
</root>
查看更多
Rolldiameter
3楼-- · 2019-08-06 13:03

It could look something like this using XML::LibXML:

use strict;
use warnings; 

use XML::LibXML;

my $filename = $ARGV[0] 
   or die "Missing XML filename to parse";
my $type = $ARGV[1] 
   or die "Missing type of node to exclude";

open(my $xml_file, '<', $filename) 
   or die "Cannot open XML file '$filename' for reading: $!";

my $dom = XML::LibXML->load_xml(IO => $xml_file);
NODE:
foreach my $message_node ( $dom->findnodes('/root/message') ) {
   next NODE 
      unless $message_node->hasAttribute('type');

   $message_node->unbindNode() 
      if $message_node->getAttribute('type') eq $type;
}
$dom->toFile($filename);
查看更多
我命由我不由天
4楼-- · 2019-08-06 13:05

I use XML::LibXML as my XML parser.

use XML::LibXML qw( );

die "usage\n" if @ARGV != 2;

my ($type, $qfn) = @ARGV;
my $doc = XML::LibXML->new->parse_file($qfn);
for my $node ($doc->findnodes('//message') {
   my $type_addr = $node->getAttribute('type');
   next if !$type_addr || $type_addr ne $type;

   $node->parentNode->removeChild($node);
}

$doc->toFile($qfn);
查看更多
We Are One
5楼-- · 2019-08-06 13:11

There's two elements to your problem - first building a filter criteria, and the selecting or deleting elements based on it.

In particular - mixing 'add' and 'remove' can be quite difficult, because deciding what to do if they don't apply or contradict can be rather annoying.

Anyway, I'm offering XML::Twig despite that not being precisely what you've asked for - because I've used it a fair bit, and haven't really touched LibXML.

#!/usr/bin/perl
use strict;
use warnings;

use XML::Twig;

#read these from ARGV, just here as example.
my @sample_filters = qw ( -type=error
                          -from=Not_Dcd_Mux );

my %exclude;
for (@sample_filters) {
    if (m/^-/) {
        my ( $att, $criteria ) = (
            m/^-     #starts with -
              (\w+)  #word
              =     
              (\w+)
              $      #end of string
              /x
        );
        next unless $att;
        $exclude{$att} = $criteria;
    }
}

#process_message is called for each 'message' element, and tests filters for exclusion.
sub process_message {
    my ( $twig, $message ) = @_;
    foreach my $att ( keys %exclude ) {
        if ( $message->att($att) eq $exclude{$att} ) {
            $message->delete();
            last;
        }
    }
}

my $twig = XML::Twig->new(
    pretty_print  => 'indented',
    twig_handlers => { 'message' => \&process_message }
);
$twig->parse( \*DATA ); #might use 'parsefile ( $filename )' or 'STDIN' instead
$twig->print;


__DATA__
<XML>
<message type="error" from="Realtime" timestamp="Mon Nov 24 19:28:55 2014"> Could not receive from Loader </message>
<message type="warning" from="Not_Dcd_Mux" timestamp="Mon Dec  1 02:31:18 2014"> Could not connect to Dcd </message>
<message type="warning" from="Dcd_Mux" timestamp="Mon Dec  1 02:31:18 2014"> Could not connect to Dcd </message>
</XML>
查看更多
登录 后发表回答