XSLT to augment Hadoop config

Posted 2020-05-06 10:45

Question:

What is an XSLT (version 1.0) transform that can add or replace property values based on name?

For example, given the following input XML:

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>

    <property>
        <name>dfs.namenode.name.dir</name>
        <value>/hadoop/dfs/name</value>
    </property>
</configuration>

How would I apply a second document that specifies two properties by name and value, for example:

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>

    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/hadoop/dfs/data</value>
    </property>
</configuration>

The resulting XML should contain all original children of the root configuration element, plus the added properties, with only one property for any given name. For example:

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>

    <property>
        <name>dfs.namenode.name.dir</name>
        <value>/hadoop/dfs/name</value>
    </property>

    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/hadoop/dfs/data</value>
    </property>
</configuration>

I've tried examples from several other questions, but they don't use the same schema, and I don't know enough XSLT to adapt them to my use case.

Answer 1:

Given:

Input XML

<configuration>
    <property>
        <name>A</name>
        <value>old A</value>
    </property>
    <property>
        <name>B</name>
        <value>old B</value>
    </property>
</configuration>

override.xml

<configuration>
    <property>
        <name>B</name>
        <value>new B</value>
    </property>
    <property>
        <name>C</name>
        <value>new C</value>
    </property>
</configuration>

the following stylesheet:

XSLT 1.0

<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:strip-space elements="*"/>

<xsl:param name="override-path" select="'override.xml'" />
<xsl:variable name="override-properties" select="document($override-path)/configuration/property" />

<xsl:template match="/configuration">
    <xsl:copy>
        <!-- copy local properties not overridden by external properties -->
        <xsl:copy-of select="property[not(name=$override-properties/name)]"/>
        <!-- add all overriding properties -->
        <xsl:copy-of select="$override-properties"/>
    </xsl:copy>
</xsl:template>

</xsl:stylesheet>

will return:

<?xml version="1.0" encoding="UTF-8"?>
<configuration>
   <property>
      <name>A</name>
      <value>old A</value>
   </property>
   <property>
      <name>B</name>
      <value>new B</value>
   </property>
   <property>
      <name>C</name>
      <value>new C</value>
   </property>
</configuration>
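
One thing to note about the stylesheet above: it copies the non-overridden local properties first and then appends every overriding property, so an overridden property moves to the end of the output, and any non-property children of configuration (comments, for instance) are dropped. With the input from the question, dfs.replication would come out after dfs.namenode.name.dir. If position matters, a variant built on the identity transform could replace matching properties in place and append only the new ones. The following is a sketch along those lines, not part of the original answer; the variable names local-names and replacement are introduced here for illustration, and it assumes property names are unique within each file.

```xml
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:strip-space elements="*"/>

<xsl:param name="override-path" select="'override.xml'" />
<xsl:variable name="override-properties" select="document($override-path)/configuration/property" />
<!-- names already present in the input document (illustrative helper variable) -->
<xsl:variable name="local-names" select="/configuration/property/name" />

<!-- identity template: copy anything not handled by a more specific template -->
<xsl:template match="@*|node()">
    <xsl:copy>
        <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
</xsl:template>

<xsl:template match="/configuration">
    <xsl:copy>
        <!-- process existing children in document order -->
        <xsl:apply-templates select="@*|node()"/>
        <!-- append only those external properties whose name is not in the input -->
        <xsl:copy-of select="$override-properties[not(name=$local-names)]"/>
    </xsl:copy>
</xsl:template>

<xsl:template match="/configuration/property">
    <!-- external property with the same name as the current one, if any -->
    <xsl:variable name="replacement" select="$override-properties[name=current()/name]"/>
    <xsl:choose>
        <xsl:when test="$replacement">
            <xsl:copy-of select="$replacement[1]"/>
        </xsl:when>
        <xsl:otherwise>
            <xsl:copy-of select="."/>
        </xsl:otherwise>
    </xsl:choose>
</xsl:template>

</xsl:stylesheet>
```

For the A/B/C input shown above this should produce the same three properties in the same order, since B is already the last local property. For the input in the question, the overridden dfs.replication stays in first position and dfs.datanode.data.dir is appended at the end.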


Answer 2:

This is a non-answer for the question, but I am posting here as a resolution for the community to the problem I was trying to solve.

I was attempting to create shell scripts with minimal dependencies that could update a config from a Vagrantfile.

This answer does not address the original question, so I won't accept it, but it is what I wound up going with in the meantime.

Under a provision directory, I created the following structure:

provision/
 |- hadoop/
 |   |- etc/
 |       |- hadoop/
 |           |- core-site.xml
 |           |- hdfs-site.xml
 |- lib/
 |   |- Provision/
 |       |- Hadoop/
 |           |- Override.pm
 |- hadoop-config.pl

Where core-site.xml looks like:

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>

hdfs-site.xml looks similar (bare formatting), and Override.pm is the following:

package Provision::Hadoop::Override;
use strict;
use XML::LibXML;

sub override_config
{
    my ($xml, $override) = @_;

    foreach my $property ($override->findnodes("/configuration/property"))
    {
        my $name = $property->find("name")->shift()->textContent;
        my $value = $property->find("value")->shift()->textContent;

        # property already exists: replace the text of its <value> child
        if ( my($node) = $xml->findnodes("/configuration/property[name='$name']") )
        {
            if ( my($vnode) = $node->findnodes("value") )
            {
                $vnode->removeChildNodes();
                $vnode->appendText($value);
            }
        }
        else
        {
            # no property with this name yet: build one and append it
            my $config = $xml->find("/configuration")->shift();
            # addNewChild already attaches the new element to $config
            my $prop = $config->addNewChild(undef, "property");
            $prop->appendText("\n\t");
            $prop->addNewChild(undef, "name")->appendText($name);
            $prop->appendText("\n\t");
            $prop->addNewChild(undef, "value")->appendText($value);
            $prop->appendText("\n");
            $config->appendText("\n");
        }
    }
    $xml;
}

1;

And hadoop-config.pl is as follows:

#!/usr/bin/perl --
use lib "/vagrant/provision/lib";
use Provision::Hadoop::Override;
use File::Find;
use XML::LibXML;

sub process_file {
    if (-f $_)
    {
        my $dirname = "/vagrant/provision/hadoop";
        my $hadoop_prefix = $ENV{HADOOP_PREFIX};
        my $config = $File::Find::name;
        my $override = XML::LibXML->load_xml(location => $config);
        print "Loading values from $config";
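        # map the override path onto the matching file under $HADOOP_PREFIX;
        # \Q...\E keeps the directory name from being read as a regex
        $config =~ s/^\Q$dirname\E/$hadoop_prefix/;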
        print " into $config...";
        my $xml = XML::LibXML->load_xml(location => $config);

        Provision::Hadoop::Override::override_config($xml, $override);

        $xml->toFile($config);
        print " OK.\n";
    }
}
find(\&process_file, ("/vagrant/provision/hadoop"));


Tags: xslt