Updating xml attribute value based on other with P

2019-01-20 04:33发布

问题:

This is my sample xml file

<manifest>
 <default>
    <remote>remote1</remote>
    <revision>rev1</revision>
 </default>
 <project>
    <name>common</name>
    <path>opensource/device</path>
    <revision>sa</revision>
    <x-ship>oss</x-ship>
 </project>
 <project>
   <name>external</name>
   <path>source/tp</path>
   <x-ship>none</x-ship>
 </project>
 <project>
   <name>ws</name>
   <path>opensource/ws</path>
   <remote>nj</remote>
   <revision>myno</revision>
   <x-ship>none</x-ship>
 </project>
</manifest>

In this I need to update the value of revision only when <path> has "opensource" string in it.

I searched a lot but couldn't find anything useful to achieve this, I could modify the value based on the position as below, can anyone help me updating this? Or let me know if there is a better Perl library to do this.

#!/usr/bin/perl

use strict;
use warnings;

use XML::Simple;

my $xml_file = 'dev.xml';

my $xml = XMLin(
    $xml_file,
    KeepRoot => 1,
    ForceArray => 1,
);

$xml->{manifest}->[0]->{project}->[2]->{revision} = 'kyo';

XMLout(
    $xml,
    KeepRoot => 1,
    NoAttr => 1,
    OutputFile => $xml_file,
);

回答1:

There is definitely a learning curve, but XML::Twig and XPath syntax can handle this quite well. The following demonstrates that for a variation of the fake data that you provided.

Please note that one of the big features to twig is the ability to parse data as you go instead of having to load an extremely large XML file entirely into memory. That may not be a limitation in your case, but is an important feature for some.

use strict;
use warnings;

use XML::Twig;

my $data = do { local $/; <DATA> };

my $t= XML::Twig->new( 
    twig_handlers => {
        q{project[string(path) =~ /\bopensource\b/]/revision} => \&revision,
    },
    pretty_print => 'indented',
);
$t->parse( $data );
$t->print;

sub revision {
    my ($twig, $rev) = @_;
    $rev->set_text("open source - " . $rev->text());
}

__DATA__
<manifest>
 <default>
    <remote>remote1</remote>
    <revision>rev1</revision>
 </default>
 <project>
    <name>common</name>
    <path>NOTopensource/device</path>
    <revision>sa</revision>
    <x-ship>oss</x-ship>
 </project>
 <project>
   <name>external</name>
   <path>source/tp</path>
   <x-ship>none</x-ship>
 </project>
 <project>
   <name>ws</name>
   <path>opensource/ws</path>
   <remote>nj</remote>
   <revision>myno</revision>
   <x-ship>none</x-ship>
 </project>
</manifest>

Output:

You'll notice that the last revision has open source - prefixed to it.

<manifest>
  <default>
    <remote>remote1</remote>
    <revision>rev1</revision>
  </default>
  <project>
    <name>common</name>
    <path>NOTopensource/device</path>
    <revision>sa</revision>
    <x-ship>oss</x-ship>
  </project>
  <project>
    <name>external</name>
    <path>source/tp</path>
    <x-ship>none</x-ship>
  </project>
  <project>
    <name>ws</name>
    <path>opensource/ws</path>
    <remote>nj</remote>
    <revision>open source - myno</revision>
    <x-ship>none</x-ship>
  </project>
</manifest>

Addendum about sibling elements:

Yes, there are methods to traverse within a twig to nearby xml elements. For example, if I wanted to pull the name of the revision I was editting, and place that in the new text, I could do the following:

sub revision {
    my ($twig, $rev) = @_;
    my $name = $rev->parent()->first_child("name");
    $rev->set_text("open source - " $name->text() . ' - '. $rev->text());
}

Note the ws now added to the edited revision tag:

  <project>
    <name>ws</name>
    <path>opensource/ws</path>
    <remote>nj</remote>
    <revision>open source - ws - myno</revision>
    <x-ship>none</x-ship>
  </project>

This method of traversing the twig to nearby elements can often be a useful way of filtering. I easily could've done the same to enforce this branch be one with a path that contained opensource, but setting that requirement in the xpath for the handler is convenient if one is familiar with the xpath syntax.

Also note, in my above example, I assume there is a sibling of type name. Normally I would check to make sure before calling ->text() or one might get an error.

Addendum about Attributes:

Concerning your edge case with an alternative format:

<project path="opensource" revision="apple" name="platform" x-ship="none"/>

The above contains the same data as other projects, but instead of the values being child elements, they are attributes. This is also a feature of XML, but it is different and so must be handled in a different way.

The following is an edit of the originally suggested script that adds a new handler for projects that contain a path attribute versus a child:

use strict;
use warnings;

use XML::Twig;

my $data = do { local $/; <DATA> };

my $t= XML::Twig->new( 
    twig_handlers => {
        q{project[string(path) =~ /\bopensource\b/]/revision} => \&revision,
        q{project[@path =~ /\bopensource\b/]} => \&project,
    },
    pretty_print => 'indented',
);
$t->parse( $data );
$t->print;

sub revision {
    my ($twig, $rev) = @_;
    $rev->set_text("open source - " . $rev->text());
}

sub project {
    my ($twig, $project) = @_;

    $project->set_att(
        revision => 'open source - ' . $project->{att}{revision},
    );
}

__DATA__
<manifest>
 <default>
    <remote>remote1</remote>
    <revision>rev1</revision>
 </default>
 <project>
    <name>common</name>
    <path>NOTopensource/device</path>
    <revision>sa</revision>
    <x-ship>oss</x-ship>
 </project>
 <project path="opensource" revision="apple" name="platform" x-ship="none"/>
 <project>
   <name>external</name>
   <path>source/tp</path>
   <x-ship>none</x-ship>
 </project>
 <project>
   <name>ws</name>
   <path>opensource/ws</path>
   <remote>nj</remote>
   <revision>myno</revision>
   <x-ship>none</x-ship>
 </project>
</manifest>

And just to give you something to compare and learn from, here is the same code but with the filtering done in the handler versus using xpath:

    twig_handlers => {
        q{project[string(path) =~ /\bopensource\b/]/revision} => \&revision,
        q{project} => \&project,
    },

...

sub project {
    my ($twig, $project) = @_;

    if ($project->{att}{path} && $project->{att}{path} =~ /\bopensource\b/) {
        $project->set_att(
            revision => 'open source - ' . $project->{att}{revision},
        );
    }
}


回答2:

As a learning exercise, I decided to duplicate the above solution using XML::LibXML. I also used this perlmonks post as a resource to get started as the modules docs are difficult to grok: Stepping up from XML::Simple to XML::LibXML.

use strict;
use warnings;

use XML::LibXML;

my $data = do { local $/; <DATA> };

my $dom = XML::LibXML->load_xml(string => $data);

for my $project ($dom->findnodes('//project')) {
    if (my ($path) = $project->findnodes("./path")) {
        next if $path->textContent() !~ /\bopensource\b/;
        my ($revision) = $project->findnodes("./revision")
            or next;

        my $oldval = $revision->textContent();
        $revision->removeChildNodes();
        $revision->appendText('open source - ' . $oldval);

    } elsif ( my $pathatt = $project->getAttribute('path') ) {
        next if $pathatt !~ /\bopensource\b/;
        $project->setAttribute('revision', 'open source - ' . $project->getAttribute('revision'))
    }
}

print $dom->documentElement()->toString();

__DATA__
<manifest>
 <default>
    <remote>remote1</remote>
    <revision>rev1</revision>
 </default>
 <project>
    <name>common</name>
    <path>NOTopensource/device</path>
    <revision>sa</revision>
    <x-ship>oss</x-ship>
 </project>
 <project path="opensource" revision="apple" name="platform" x-ship="none"/>
 <project>
   <name>external</name>
   <path>source/tp</path>
   <x-ship>none</x-ship>
 </project>
 <project>
   <name>ws</name>
   <path>opensource/ws</path>
   <remote>nj</remote>
   <revision>myno</revision>
   <x-ship>none</x-ship>
 </project>
</manifest>

Result:

<manifest>
 <default>
    <remote>remote1</remote>
    <revision>rev1</revision>
 </default>
 <project>
    <name>common</name>
    <path>NOTopensource/device</path>
    <revision>sa</revision>
    <x-ship>oss</x-ship>
 </project>
 <project path="opensource" revision="open source - apple" name="platform" x-ship="none"/>
 <project>
   <name>external</name>
   <path>source/tp</path>
   <x-ship>none</x-ship>
 </project>
 <project>
   <name>ws</name>
   <path>opensource/ws</path>
   <remote>nj</remote>
   <revision>open source - myno</revision>
   <x-ship>none</x-ship>
 </project>
</manifest>