How do I compare certain values from different XML

Posted 2019-06-10 01:54

老娘就宠你

Question:

I want to write Perl code that compares two XML files.

A little background: using GET requests, as described in the API documentation, I get data1 from WebService1 and data2 from WebService2. Both are returned as XML, but the two formats are not the same.

I only need to compare two values in these files (deviceName and ipAddress). If they are the same in both files, there should be a message like: WebService1 already contains DeviceName "Switch1". If not, I would make a POST request and add the device to WebService1/WebService2.

Can you advise me which modules I should use and how I should get started with this comparison?

For example, file1:

   <?xml version="1.0" ?>
   <queryResponse last="34" first="0" count="35" type="Devices" responseType="listEntityInstances" requestUrl="https://hostname/webacs/api/v1/data/Devices?.full=true" rootUrl="https://hostname/webacs/api/v1/data">
      <entity dtoType="devicesDTO" type="Devices" url="https://hostname/webacs/api/v1/data/Devices/201">
         <devicesDTO displayName="201201" id="201">
           <clearedAlarms>0</clearedAlarms>
           <collectionDetail></collectionDetail>
           <collectionTime></collectionTime>
           <creationTime></creationTime>
           <criticalAlarms>0</criticalAlarms>
           <deviceId>205571</deviceId>
           <deviceName>NEW-SW5</deviceName>
           <deviceType>Cisco Switch</deviceType>
           <informationAlarms>0</informationAlarms>
           <ipAddress>10.66.12.128</ipAddress>
         <location></location>
           <majorAlarms>0</majorAlarms>
           <managementStatus></managementStatus>
              <manufacturerPartNrs>
                  <manufacturerPartNr></manufacturerPartNr>
              </manufacturerPartNrs>
              <minorAlarms>0</minorAlarms>
              <productFamily></productFamily>
              <reachability>Reachable</reachability>
              <softwareType>IOS</softwareType>
              <softwareVersion>12.1(22)</softwareVersion>
              <warningAlarms>0</warningAlarms>
         </devicesDTO>
      </entity>
   </queryResponse>

file2:

  <?xml version="1.0" encoding="utf-8" standalone="yes"?>
  <ns3:networkdevice name="NEW-SW5" id="9a6ef750-2620-11e4-81be-b83861d71f95" xmlns:ns2="ers.ise.cisco.com" xmlns:ns3="network.ers.ise.cisco.com">
  <link type="application/xml" href="https://hostname:9060/ers/config/networkdevice/123456" rel="self"/>
       <authenticationSettings>
          <enableKeyWrap>false</enableKeyWrap>
          <keyInputFormat>ASCII</keyInputFormat>
          <networkProtocol>RADIUS</networkProtocol>
          <radiusSharedSecret>******</radiusSharedSecret>
       </authenticationSettings>
       <NetworkDeviceIPList>
         <NetworkDeviceIP>
            <ipaddress>10.66.12.128</ipaddress>
            <mask>21</mask>
         </NetworkDeviceIP>
       </NetworkDeviceIPList>
       <NetworkDeviceGroupList>
         <NetworkDeviceGroup>Location#All Locations</NetworkDeviceGroup>
         <NetworkDeviceGroup>Device Type#All Device Types</NetworkDeviceGroup>
   </NetworkDeviceGroupList>
  </ns3:networkdevice>

There is one complication: in file1 the values I need are called deviceName and ipAddress, and both are elements.
In file2 the device name is an attribute called name on the root element ns3:networkdevice (it corresponds to deviceName in file1), and the other value is an element called ipaddress (ipAddress in file1).

Answer 1:

You can use XML::Twig to parse both responses. Each of them needs an individual parser.

For the first one, you need to go for the two tags <deviceName> and <ipAddress>. A simple twig_handler for each of them that accesses the text of the matched element is sufficient.

Those handlers can be complex, but in our case a code reference that deals with a single value is enough. We know that each value occurs only once, so we can assign both of them directly to their respective lexical variables.

use strict;
use warnings;
use feature 'say';    # say is not available without this
use XML::Twig;

my ($device_name, $ip_address);
XML::Twig->new(
    twig_handlers => {
        deviceName => sub { $device_name = $_->text },
        ipAddress => sub { $ip_address = $_->text },
    }
)->parse(\*DATA);

say $device_name;
say $ip_address;

__DATA__
<?xml version="1.0" ?>
<queryResponse last="34" first="0" count="35" type="Devices" responseType="listEntityInstances" requestUrl="https://hostname/webacs/api/v1/data/Devices?.full=true" rootUrl="https://hostname/webacs/api/v1/data">
   <entity dtoType="devicesDTO" type="Devices" url="https://hostname/webacs/api/v1/data/Devices/201">
      <devicesDTO displayName="201201" id="201">
        <clearedAlarms>0</clearedAlarms>
        <collectionDetail></collectionDetail>
        <collectionTime></collectionTime>
        <creationTime></creationTime>
        <criticalAlarms>0</criticalAlarms>
        <deviceId>205571</deviceId>
        <deviceName>NEW-SW5</deviceName>
        <deviceType>Cisco Switch</deviceType>
        <informationAlarms>0</informationAlarms>
        <ipAddress>10.66.12.128</ipAddress>
      <location></location>
        <majorAlarms>0</majorAlarms>
        <managementStatus></managementStatus>
           <manufacturerPartNrs>
               <manufacturerPartNr></manufacturerPartNr>
           </manufacturerPartNrs>
           <minorAlarms>0</minorAlarms>
           <productFamily></productFamily>
           <reachability>Reachable</reachability>
           <softwareType>IOS</softwareType>
           <softwareVersion>12.1(22)</softwareVersion>
           <warningAlarms>0</warningAlarms>
      </devicesDTO>
   </entity>
</queryResponse>

For the second one, you need to use att() to get the name attribute of the root element, but that's also straightforward.

use strict;
use warnings;
use feature 'say';    # say is not available without this
use XML::Twig;

my ($device_name, $ip_address);
XML::Twig->new(
    twig_handlers => {
        'ns3:networkdevice' => sub { $device_name = $_->att('name') },
        ipaddress => sub { $ip_address = $_->text },
    }
)->parse(\*DATA);

say $device_name;
say $ip_address;
__DATA__
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<ns3:networkdevice name="NEW-SW5" id="9a6ef750-2620-11e4-81be-b83861d71f95" xmlns:ns2="ers.ise.cisco.com" xmlns:ns3="network.ers.ise.cisco.com">
<link type="application/xml" href="https://hostname:9060/ers/config/networkdevice/123456" rel="self"/>
     <authenticationSettings>
        <enableKeyWrap>false</enableKeyWrap>
        <keyInputFormat>ASCII</keyInputFormat>
        <networkProtocol>RADIUS</networkProtocol>
        <radiusSharedSecret>******</radiusSharedSecret>
     </authenticationSettings>
     <NetworkDeviceIPList>
       <NetworkDeviceIP>
          <ipaddress>10.66.12.128</ipaddress>
          <mask>21</mask>
       </NetworkDeviceIP>
     </NetworkDeviceIPList>
     <NetworkDeviceGroupList>
       <NetworkDeviceGroup>Location#All Locations</NetworkDeviceGroup>
       <NetworkDeviceGroup>Device Type#All Device Types</NetworkDeviceGroup>
 </NetworkDeviceGroupList>
</ns3:networkdevice>

Now that you have both of these, you can combine them. I suggest creating a function for each of them that takes the response XML and returns the $device_name and $ip_address.

use strict;
use warnings;
use XML::Twig;

sub parse_response_1 {
    my $xml = shift;

    my ( $device_name, $ip_address );
    XML::Twig->new(
        twig_handlers => {
            deviceName => sub { $device_name = $_->text },
            ipAddress  => sub { $ip_address  = $_->text },
        }
    )->parse($xml);

    return $device_name, $ip_address;
}

sub parse_response_2 {
    my $xml = shift;

    my ( $device_name, $ip_address );
    XML::Twig->new(
        twig_handlers => {
            'ns3:networkdevice' => sub { $device_name = $_->att('name') },
            ipaddress           => sub { $ip_address  = $_->text },
        }
    )->parse($xml);

    return $device_name, $ip_address;
}

Of course my names parse_response_1 and parse_response_2 are not the best choice. Don't use the numbers, use the names of the services that returned the responses instead.

With those two functions we now have the means to retrieve exactly the information that we want. All that's left is to check them.

sub check {
    my ( $response_1, $response_2 ) = @_;

    my ( $device_name_1, $ip_address_1 ) = parse_response_1($response_1);
    my ( $device_name_2, $ip_address_2 ) = parse_response_2($response_2);

    return $device_name_1 eq $device_name_2 && $ip_address_1 eq $ip_address_2;
}

Again, the names of the variables could be better. Now you just need to call that with your two response XMLs and it will return a truthy value, or not.
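
As a rough sketch of how these pieces might be wired together, the following assumes both responses are already available as strings. The helper name status_message, the shortened sample XML, and the placeholder text for the POST branch are illustrative assumptions and not part of either web service's API.

```perl
use strict;
use warnings;
use XML::Twig;

# Pull deviceName and ipAddress (both elements) out of the first response.
sub parse_response_1 {
    my $xml = shift;
    my ( $device_name, $ip_address );
    XML::Twig->new(
        twig_handlers => {
            deviceName => sub { $device_name = $_->text },
            ipAddress  => sub { $ip_address  = $_->text },
        }
    )->parse($xml);
    return $device_name, $ip_address;
}

# Pull the name attribute and the ipaddress element out of the second response.
sub parse_response_2 {
    my $xml = shift;
    my ( $device_name, $ip_address );
    XML::Twig->new(
        twig_handlers => {
            'ns3:networkdevice' => sub { $device_name = $_->att('name') },
            ipaddress           => sub { $ip_address  = $_->text },
        }
    )->parse($xml);
    return $device_name, $ip_address;
}

# Hypothetical helper: build the message the question asks for.
sub status_message {
    my ( $response_1, $response_2 ) = @_;
    my ( $name_1, $ip_1 ) = parse_response_1($response_1);
    my ( $name_2, $ip_2 ) = parse_response_2($response_2);

    # Treat a missing value as "no match" instead of comparing undef.
    my $missing = grep { !defined $_ } $name_1, $ip_1, $name_2, $ip_2;
    if ( !$missing && $name_1 eq $name_2 && $ip_1 eq $ip_2 ) {
        return qq{WebService1 already contains DeviceName "$name_1"};
    }
    return 'Device not found; a POST request would go here';
}

# Shortened stand-ins for the two real responses (assumed sample data).
my $response_1 = <<'XML';
<queryResponse>
  <entity>
    <devicesDTO id="201">
      <deviceName>NEW-SW5</deviceName>
      <ipAddress>10.66.12.128</ipAddress>
    </devicesDTO>
  </entity>
</queryResponse>
XML

my $response_2 = <<'XML';
<ns3:networkdevice name="NEW-SW5" xmlns:ns3="network.ers.ise.cisco.com">
  <NetworkDeviceIPList>
    <NetworkDeviceIP>
      <ipaddress>10.66.12.128</ipaddress>
    </NetworkDeviceIP>
  </NetworkDeviceIPList>
</ns3:networkdevice>
XML

print status_message( $response_1, $response_2 ), "\n";
```

Checking for undefined values first avoids "uninitialized value" warnings when one of the services returns a response without the expected element or attribute.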

Answer 2:

Much like simbaque, I'd use XML::Twig, although I'd tackle it slightly differently; I'm offering this up for the sake of comparison. twig_handlers is a powerful and useful technique, but it is particularly suited to incrementally parsing larger XML. As an alternative, you can use get_xpath to look up XPath-based references within the XML.

#!/usr/bin/env perl
use strict;
use warnings;

use XML::Twig;

my $xml1 = XML::Twig->new->parsefile('test1a.xml');
my $xml2 = XML::Twig->new->parsefile('test1b.xml');

if ( $xml1->get_xpath( '//deviceName', 0 )->text 
  eq $xml2->root->att('name') )
{
   print "Name matches\n";
}

if ( $xml1->get_xpath( '//ipAddress', 0 )->text 
  eq $xml2->get_xpath( '//ipaddress', 0 )->text )
{
   print "IP matches\n";
}

We parse both files into XML::Twig objects, and then use get_xpath to look up the node location. // means anywhere in the tree, and the 0 selects which match to return (here the first, and only, one).

Ideally we would compare XPath strings directly. We can't here, because 'name' is an attribute of the root node, and one of the limitations of the XML::Twig XPath engine is that you can't directly select attribute content.

XML::LibXML, however, is more fully featured, at the cost of a somewhat steeper learning curve. I wouldn't use it generally, but in this specific case it can handle the XPath expression that selects an attribute of the root node.

So that would be something like:

#!/usr/bin/env perl
use strict;
use warnings;

use XML::LibXML;

my %compare = (
   '//deviceName' => '//@name',
   '//ipAddress'  => '//ipaddress'
);

my $search1 = XML::LibXML::XPathContext->new(
                 XML::LibXML->load_xml( location => 'test1a.xml' ) );
my $search2 = XML::LibXML::XPathContext->new(
                 XML::LibXML->load_xml( location => 'test1b.xml' ) );

foreach my $key ( keys %compare ) {
   my $first  = $search1->find($key);
   my $second = $search2->find( $compare{$key} );

   print "$key = $first\n";
   print "$compare{$key} = $second\n";
   print "Matches found\n" if $first eq $second;
}

Answer 3:

This isn't a simple task to write from scratch. You should make use of XML::Compare.

Answer 4:

use XML::Simple;
use Data::Dumper;

my $file1_ref = XMLin("./file1");
my $file2_ref = XMLin("./file2");

if (   $file2_ref->{NetworkDeviceIPList}->{NetworkDeviceIP}->{ipaddress} eq $file1_ref->{entity}->{devicesDTO}->{ipAddress}
    && $file2_ref->{name} eq $file1_ref->{entity}->{devicesDTO}->{deviceName} ) {
  print "WebService1 already contains DeviceName \"".$file2_ref->{name}."\"\n";
} else {
  # POST request and add this device in WebService1/WebService2
  # Code here ....
}

You can turn the calls into methods, and I would strongly suggest that you add an eval around the conversion and check for errors, just in case the returned XML is malformed.
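
A minimal sketch of what such an eval wrapper could look like is shown here. The helper name safe_xmlin and the inline sample documents are assumptions for illustration; XMLin accepts either a file name or a string of XML, so the same wrapper should work for both.

```perl
use strict;
use warnings;
use XML::Simple qw(XMLin);

# Hypothetical helper: parse XML (a file name or an XML string) and
# return the resulting hash reference, or undef if parsing failed.
sub safe_xmlin {
    my ($source) = @_;

    # XMLin dies on malformed input; eval traps that and leaves the error in $@.
    my $data = eval { XMLin($source) };
    if ( !defined $data ) {
        warn "Could not parse XML: $@";
        return;
    }
    return $data;
}

# Well-formed sample document (assumed data).
my $good = safe_xmlin('<device><ipAddress>10.66.12.128</ipAddress></device>');
print "Parsed IP: $good->{ipAddress}\n" if $good;

# Truncated document that is not well-formed.
my $bad = safe_xmlin('<device><ipAddress>');
print "Second document was rejected\n" if !$bad;
```

Returning undef on failure lets the caller decide whether to skip the comparison, retry the request, or abort.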

Answer 5:

First note that there is no universal agreement on what it means for two XML files to be "the same". For example, everyone agrees that whitespace within start and end tags should be ignored, and that the distinction between single and double quotes around attributes is irrelevant, and that attributes can be in any order; but requirements vary on how to handle comments, whitespace between element tags, namespace prefixes, and numerous other details.

Another area where requirements vary is what information you want when documents are deemed different. Some mechanisms will only give you a yes-or-no answer, and won't help you find the differences.

This has the consequence that there may be general-purpose solutions out there, but they might not always meet your specific requirements.

So writing your own comparator isn't a ridiculous idea if you're prepared to write a few hundred lines of code.

But two off-the-shelf solutions you could consider, if you can find examples that run in the Perl environment, are:

XML canonicalizers: canonicalize both documents and then compare the results at the binary level.
XPath 2.0: offers the function deep-equal() to compare two nodes (including document nodes)
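
As one possible way to try the canonicalization approach in Perl, XML::LibXML provides toStringC14N on parsed documents. The sketch below is only an illustration: the helper name canonical_form and the sample documents are assumptions, and it compares whole documents, so it suits "are these two documents equivalent" checks rather than the two-field comparison in the question.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical helper: parse an XML string and return its canonical (C14N) form.
sub canonical_form {
    my ($xml) = @_;
    my $doc = XML::LibXML->load_xml( string => $xml );
    return $doc->toStringC14N();
}

# Same content, but different quoting, attribute order and empty-element syntax.
my $doc_a = q{<device name='NEW-SW5' id="1"/>};
my $doc_b = q{<device id="1" name="NEW-SW5"></device>};

# Differs only by a whitespace text node inside the element.
my $doc_c = q{<device id="1" name="NEW-SW5"> </device>};

print "a and b are equivalent\n"
    if canonical_form($doc_a) eq canonical_form($doc_b);
print "a and c differ\n"
    if canonical_form($doc_a) ne canonical_form($doc_c);
```

The third document shows the caveat raised above: canonicalization keeps whitespace between tags, so documents that differ only in indentation still compare as different unless that whitespace is stripped first.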