Trying to import XML children from one file to another

Posted 2019-05-21 06:54

Question:

I have looked into this post and found that it is almost exactly what I need to do. However, I am not able to produce the expected output with the suggestion in that post. Basically, I am trying to import <parameter> elements from an XML file ($ManifestFile) that contains something like:

<?xml version="1.0" encoding="utf-8"?>
<plasterManifest
  schemaVersion="1.1"
  templateType="Project" xmlns="http://www.microsoft.com/schemas/PowerShell/Plaster/v1">
  <metadata>
    <name>PlasterTestProject</name>
    <id>4c08dedb-7da7-4193-a2c0-eb665fe2b5e1</id>
    <version>0.0.1</version>
    <title>Testing creating custom Plaster Template for CI/CD</title>
    <description>Testing out creating a module project with Plaster for complete CI/CD files.</description>
    <author>Catherine Meyer</author>
    <tags></tags>
  </metadata>
  <parameters>
        <parameter name='AuthorName' type="user-fullname" prompt="Module author's name" />
        <parameter name='ModuleName' type="text" prompt="Name of your module" />
        <parameter name='ModuleDescription' type="text" prompt="Brief description on this module" />
        <parameter name='ModuleVersion' type="text" prompt="Initial module version" default='0.0.1' />
        <parameter name='GitLabUserName' type="text" prompt="Enter the GitLab Username to be used" default="${PLASTER_PARAM_FullName}"/>
        <parameter name="GitLubRepo" type="text" prompt="GitiLab repo name for this module" default="${PLASTER_PARAM_ModuleName}"/>
        <parameter name='ModuleFolders' type = 'multichoice' prompt='Please select folders to include' default='0,1'>
            <choice label='&amp;Public' value='Public' help='Folder containing public functions that can be used by the user.'/>
            <choice label='&amp;Private' value='Private' help='Folder containing internal functions that are not exposed to users'/>
        </parameter>
    </parameters>
</plasterManifest>

The document ($NewManifestFile) I'm trying to import into looks like:

<?xml version="1.0" encoding="utf-8"?>
<plasterManifest schemaVersion="1.1" templateType="Project" xmlns="http://www.microsoft.com/schemas/PowerShell/Plaster/v1">
  <metadata>
     <name>test3</name>
     <id>8c028f40-cdc6-40dc-8442-f5256a8c0ed9</id>
     <version>0.0.1</version>
     <title>test3</title>
     <description>SDSKL</description>
     <author>NAME</author>
    <tags> </tags>
  </metadata>
  <parameters>
  </parameters>
  <content>
  </content>
</plasterManifest>

The code I have written looks something like:

$ManifestFile = [xml](Get-Content ".\PlasterManifest.xml")
$NewManifestFile = [xml](Get-Content $PlasterMetadata.Path)
$NewManifestFile.plasterManifest.metadata.name

$Parameters = $ManifestFile.SelectSingleNode("//plasterManifest/parameters/parameter")
$Parameters
$NewParameters = $NewManifestFile.SelectSingleNode("//plasterManifest/parameters")
#Importing the parameters and content
foreach ($parameter in $Parameters) {
   $NewParamElem = $ManifestFile.ImportNode($parameter, $true)
   $NewParameters.AppendChild($NewParamElem)
}
[void]$NewManifestFile.save($PlasterMetadata.Path)

Now, it doesn't error out, but it also doesn't import at all. It seems as though some element is not being assigned properly somewhere. I have tried so many alternatives, and this seems to be the only one that is close to what I want. Any suggestions?

Answer 1:

As mklement0 pointed out, your XML documents have namespaces, so you need a namespace manager when selecting nodes with XPath expressions. Using dot-access for selecting the nodes gets you around namespace management, but since dot-access doesn't always work the way one might expect, I'd still recommend sticking with SelectNodes() and using proper namespace managers.

$uri = 'http://www.microsoft.com/schemas/PowerShell/Plaster/v1'

[xml]$ManifestFile = Get-Content 'C:\path\to\old.xml'
$nm1 = New-Object Xml.XmlNamespaceManager $ManifestFile.NameTable
$nm1.AddNamespace('ns1', $uri)

[xml]$NewManifestFile = Get-Content 'C:\path\to\new.xml'
$nm2 = New-Object Xml.XmlNamespaceManager $NewManifestFile.NameTable
$nm2.AddNamespace('ns2', $uri)

$ManifestFile.SelectNodes('//ns1:parameter', $nm1) | ForEach-Object {
    $newnode = $NewManifestFile.ImportNode($_, $true)
    $parent  = $NewManifestFile.SelectSingleNode('//ns2:parameters', $nm2)
    $parent.AppendChild($newnode) | Out-Null
}

$NewManifestFile.Save('C:\path\to\new.xml')
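
As a possible alternative, the Select-Xml cmdlet accepts a hashtable that maps prefixes to namespace URIs, which avoids creating an XmlNamespaceManager by hand. The following is only a sketch: the two small inline documents, the prefix p, and the variable names are assumptions made for illustration and stand in for the real manifest files.

```powershell
# Assumption: small inline documents stand in for the two manifest files.
$uri = 'http://www.microsoft.com/schemas/PowerShell/Plaster/v1'

[xml]$source = @"
<plasterManifest xmlns="$uri">
  <parameters>
    <parameter name="AuthorName" type="text" />
    <parameter name="ModuleName" type="text" />
  </parameters>
</plasterManifest>
"@

[xml]$target = @"
<plasterManifest xmlns="$uri">
  <parameters>
  </parameters>
</plasterManifest>
"@

# The hashtable maps a self-chosen prefix ('p') to the namespace URI.
$ns = @{ p = $uri }

# Locate the destination <parameters> element once, outside the loop.
$parent = (Select-Xml -Xml $target -XPath '//p:parameters' -Namespace $ns).Node

# Select every <parameter> in the source and import it into the target.
Select-Xml -Xml $source -XPath '//p:parameter' -Namespace $ns | ForEach-Object {
    $imported = $target.ImportNode($_.Node, $true)
    $null = $parent.AppendChild($imported)
}

# Count the <parameter> elements that now exist in the target document.
$importedCount = (Select-Xml -Xml $target -XPath '//p:parameter' -Namespace $ns |
    Measure-Object).Count
$importedCount
```

With real files, the [xml] casts would wrap Get-Content calls as in the code above, followed by a final .Save() on the target document.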


Answer 2:

There are several problems with your current approach:

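  • You call .ImportNode() on the source document ($ManifestFile) instead of the destination document ($NewManifestFile), so the elements are never actually imported into the destination document, even though that is a prerequisite for inserting them into the destination document's DOM.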

  • You're using .SelectSingleNode() to select the source-document nodes, even though - I presume - you meant to use .SelectNodes() to select all <parameter> elements.

  • You're missing namespace management for the documents, which is a prerequisite for successful XPath queries via .SelectSingleNode() / .SelectNodes().

    • However, given that namespace management is complex, the solution below employs workarounds. If you do want to deal with namespaces - which is the proper way to do it - see Ansgar Wiechers' helpful answer.

Here's a fixed, annotated solution:

$ManifestFile = [xml](Get-Content -Raw ./PlasterManifest.xml)
$NewManifestFile = [xml](Get-Content -Raw $PlasterMetadata.Path)

# Get the <parameters> element in the *source* doc.
# Note that PowerShell's dot notation-based access to the DOM does
# NOT require namespace management.
$ParametersRoot = $ManifestFile.plasterManifest.parameters

# Get the parent of the <parameter> elements in the *destination* doc.
# Note: Ideally we'd also use dot notation in order to avoid the need for namespace
#       management, but since the target <parameters> element is *empty*, 
#       PowerShell represents it as a *string* rather than as an XML element.
#       Instead, we use .GetElementsByTagName() to locate the element and rely
#       on the knowledge that there is only *one* in the whole document.
$NewParametersRoot = $NewManifestFile.GetElementsByTagName('parameters')[0]

# Import the source element's subtree into the destination document, so it can
# be inserted into the DOM later.
$ImportedParametersRoot = $NewManifestFile.ImportNode($ParametersRoot, $True)

# For simplicity, replace the entire <parameters> element, which
# obviates the need for a loop.
# Note the need to call .ReplaceChild() on the .documentElement property,
# not on the document object itself.
$null = $NewManifestFile.documentelement.ReplaceChild($ImportedParametersRoot, $NewParametersRoot)

# Save the modified destination document.
$NewManifestFile.Save($PlasterMetadata.Path)
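
If the destination <parameters> element already contains children that must be kept, replacing the whole element is not an option. A loop-based variant of the same workaround might look like the sketch below. The inline documents, the "Existing" parameter, and the variable names are hypothetical and only serve to make the example self-contained.

```powershell
# Assumption: inline documents stand in for the source and destination manifests.
[xml]$src = @'
<plasterManifest xmlns="http://www.microsoft.com/schemas/PowerShell/Plaster/v1">
  <parameters>
    <parameter name="AuthorName" type="text" />
    <parameter name="ModuleName" type="text" />
  </parameters>
</plasterManifest>
'@

[xml]$dst = @'
<plasterManifest xmlns="http://www.microsoft.com/schemas/PowerShell/Plaster/v1">
  <parameters>
    <parameter name="Existing" type="text" />
  </parameters>
</plasterManifest>
'@

# Locate the destination parent without namespace management.
$dstParameters = $dst.GetElementsByTagName('parameters')[0]

# Dot notation returns the source <parameter> elements; each one is
# imported by the *destination* document before it is appended.
foreach ($parameter in $src.plasterManifest.parameters.parameter) {
    $imported = $dst.ImportNode($parameter, $true)
    $null = $dstParameters.AppendChild($imported)
}

# Collect the name attributes to confirm what the destination now holds.
$names = @($dstParameters.GetElementsByTagName('parameter') |
    ForEach-Object { $_.GetAttribute('name') })
$names -join ', '
```

The existing child stays in place and the imported elements are appended after it.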

Optional background information:

  • The .SelectSingleNode() / .SelectNodes() methods, because they accept XPath queries, are the most flexible and powerful methods for locating elements (nodes) of interest in an XML document, but they do require explicit namespace handling if the input document declares namespaces (such as xmlns="http://www.microsoft.com/schemas/PowerShell/Plaster/v1" in your case):

    • Note: If a given input document declares namespaces and you neglect to handle them as described below, queries with unqualified element names (e.g., parameters) find nothing (.SelectSingleNode() returns $null and .SelectNodes() returns an empty list), while queries with namespace-qualified (namespace-prefixed) names (e.g., plaster:parameters) fail with an error.

    • Namespace handling involves these steps (note that a given document may have multiple namespace declarations, but for simplicity the instructions assume only one):

      • Instantiate a namespace manager and associate it with the input document['s name table].

      • Associate the namespace's URI with a symbolic identifier. If the namespace declaration in the input document is for the default namespace - xmlns - you cannot use that as your symbolic identifier (the name xmlns is reserved) and must simply choose one.

      • Then, when you call .SelectSingleNode() / .SelectNodes(), you must use this symbolic identifier as an element-name prefix in your query strings; e.g., if your (self-chosen) symbolic identifier is plaster and you're looking for element parameters anywhere in the document, you'd use query string '//plaster:parameters'

      • Ansgar Wiechers' helpful answer demonstrates all that.

  • By contrast, PowerShell's dot notation is always namespace-agnostic and the .GetElementsByTagName() method can be, so they require no explicit namespace handling.

    • Caveat: While this reduces complexity, you should only use it if you know that proper namespace handling is not a necessity for processing the input document correctly.

    • PowerShell's dot notation:

      • PowerShell conveniently maps the XML document's DOM - the hierarchy of nodes in the input document - onto a nested object with properties, allowing you to drill down into the document with regular dot notation; e.g., the equivalent of XPath query '/root/elem' would be $xmlDoc.root.elem
        However, this implies that you can only use this notation to access elements whose path in the hierarchy you already know - queries are not supported (though an XPath-enabled Select-Xml cmdlet exists).

      • This mapping ignores namespace qualifiers (prefixes), so you must use the mere element name, without any namespace prefix; e.g., if the input document has a plaster:parameters element, you must refer to it as just parameters.

      • As convenient as dot notation is, it comes with pitfalls, the most notable of which is that quasi-leaf elements - those that either have no child nodes at all or only non-element child nodes such as a text node - are returned as strings, not elements, which makes it difficult to modify them.
        In short: the mapping between the XML DOM and PowerShell's object model isn't - and cannot be - exact.

    • .GetElementsByTagName() method:

      • Returns a collection of all elements with the specified tag name across all levels of the hierarchy: in the entire document when called on the document object, or among the descendants of the element it is called on.
        As such, it doesn't allow for sophisticated selection of target elements, and the documentation recommends using .SelectSingleNode() / .SelectNodes() instead.

      • While you can pass a namespace URI as the second argument, it isn't required; if you don't, you must specify the element (tag) name literally, exactly as it occurs in the document, including its namespace qualifier, if present.
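
The lookup approaches described above can be compared side by side on a single document. This is a minimal sketch, not a definitive reference: the inline document, the plaster prefix, and all variable names are assumptions chosen for illustration.

```powershell
# Assumption: a small inline document with an empty <parameters> element.
$uri = 'http://www.microsoft.com/schemas/PowerShell/Plaster/v1'

[xml]$doc = @"
<plasterManifest xmlns="$uri">
  <parameters>
  </parameters>
  <content>
    <file source="a.txt" destination="a.txt" />
  </content>
</plasterManifest>
"@

# 1. XPath with an unqualified name: the element lives in the Plaster
#    namespace, so this query does not match it.
$unqualified = $doc.SelectSingleNode('//parameters')

# 2. XPath with a namespace manager and a self-chosen prefix.
$nm = New-Object System.Xml.XmlNamespaceManager $doc.NameTable
$nm.AddNamespace('plaster', $uri)
$qualified = $doc.SelectSingleNode('//plaster:parameters', $nm)

# 3. Dot notation: no namespace handling needed, but per the pitfall
#    described above an empty element comes back as a [string].
$viaDot = $doc.plasterManifest.parameters

# 4. .GetElementsByTagName(): by literal tag name, or by local name plus
#    namespace URI.
$viaTag   = $doc.GetElementsByTagName('parameters')[0]
$viaTagNs = $doc.GetElementsByTagName('parameters', $uri)[0]

# Summarize what each approach returned.
[pscustomobject]@{
    UnqualifiedIsNull = ($null -eq $unqualified)
    QualifiedType     = $qualified.GetType().Name
    DotNotationType   = $viaDot.GetType().Name
    TagNameType       = $viaTag.GetType().Name
    TagNameNsType     = $viaTagNs.GetType().Name
}
```

Only the approaches that return an XmlElement can be used directly as the target of .AppendChild() or .ReplaceChild().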