Extract data from an XML file using shell commands

Posted 2019-08-18 03:18

Question:

I have an XML file with the content below. How can I extract the username and password values from the Resource tag using a shell script? The commented-out Resource tag must be excluded, and the values must come from the uncommented Resource tag. I tried, but my script fetched the values from the latest tag. Can someone help me ignore the commented tags and fetch the values from the XML?

<?xml version='1.0' encoding='utf-8'?>
<!-- The contents of this file will be loaded for each web application -->
<!--
 <Resource name="jdbcSource" auth="Container"
type="javax.sql.DataSource"
 username="demo"
    password="test"
        driverClassName="driverclassname"
        url="driver@host"
    maxActive="20"
    maxIdle="10"
     />

-->

<Resource auth="Container"
driverClassName="driverclassname" maxActive="100" maxIdle="30" maxWait="10000"
name="jdbcSource" password="test" type="javax.sql.DataSource"
url="driver@host"
username="demo"/>

</Context>

Answer 1:

First, my answer assumes that your actual source XML is well formed. The example you've provided isn't well-formed XML because it lacks the opening root element, <Context>, that the closing </Context> tag requires; I'll assume your real file has one.


Bash's built-in features are not well suited to parsing XML.

This Bash FAQ states the following:

Do not attempt [to extract data from an XML file] with sed, awk, grep, and so on (it leads to undesired results)

If you must use a shell script, use an XML-specific command-line tool such as XMLStarlet (other similar tools are available). See the download information here if you don't already have XMLStarlet installed.

Solution:

Using XMLStarlet, you can run the following commands:

uname=$(xml sel -t -v "/Context/Resource/@username" path/to/file.xml)
pword=$(xml sel -t -v "/Context/Resource/@password" path/to/file.xml)

echo "$uname $pword" # --> demo test

Explanation

  • uname=$(...)

    Here we use command substitution to assign the output of the XMLStarlet command to a variable named uname (i.e. the username).

  • xml sel -t -v "/Context/Resource/@username"

    This command breaks down as follows:

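    • xml - invoke XMLStarlet (on Debian and Ubuntu the executable is named xmlstarlet instead).
    • sel - select data or query XML document(s).
    • -t - start a template; the options that follow it define what is output.
    • -v - print the value of the XPath expression.
    • "/Context/Resource/@username" - the XPath expression that selects the value of the username attribute of the Resource element. Because XMLStarlet actually parses the XML, the commented-out <Resource> is only text inside a comment node, not an element, so this expression never matches it.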
  • path/to/file.xml

    This part should be replaced with the real path to your .xml file.

Likewise, a similar command obtains the value of the password attribute: only the XPath expression changes, and the output is assigned to a variable named pword.
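To try the two commands end to end, here is a minimal sketch. It writes a trimmed-down, well-formed copy of the sample to a temporary file, adding the <Context> opening tag that the question omits (an assumption about the real file) and giving the commented-out Resource made-up values ("commented") so you can see that the comment is skipped. It also assumes the executable may be installed as either xmlstarlet or xml.

```shell
#!/usr/bin/env bash
# Sketch: run the answer's XMLStarlet queries on a well-formed copy of the sample.
# Assumption: the real file has a <Context> root element (missing from the question).

# Use whichever name XMLStarlet was installed under.
xmlstar=$(command -v xmlstarlet || command -v xml)

# Sample with the <Context> opening tag added; the commented-out Resource
# uses made-up values so it is obvious if it were (wrongly) matched.
file=$(mktemp)
cat > "$file" <<'EOF'
<?xml version='1.0' encoding='utf-8'?>
<Context>
<!--
 <Resource name="jdbcSource" username="commented" password="commented"/>
-->
<Resource auth="Container" name="jdbcSource" password="test"
          type="javax.sql.DataSource" username="demo"/>
</Context>
EOF

if [ -n "$xmlstar" ]; then
  uname=$("$xmlstar" sel -t -v "/Context/Resource/@username" "$file")
  pword=$("$xmlstar" sel -t -v "/Context/Resource/@password" "$file")
  echo "$uname $pword"   # expected: demo test
else
  echo "XMLStarlet not found (tried xmlstarlet and xml)" >&2
fi

rm -f "$file"
```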


Edit 1: A more efficient command

As per Charles Duffy's first comment below, you can extract both attribute values more efficiently with a single command:

{ IFS= read -r uname && IFS= read -r pword; } < <(xml sel -t -v "/Context/Resource/@username" -n -v "/Context/Resource/@password" path/to/file.xml)

echo "$uname $pword" # --> demo test

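The main benefit is that the source XML file is read and parsed only once. The -n option emits a newline between the two values, and each IFS= read -r consumes one line: IFS= preserves leading and trailing whitespace, and -r stops backslashes from being interpreted.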
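The two-variable read pattern from Edit 1 can be tried without XMLStarlet. In this sketch, fake_query is a hypothetical stand-in for the xml sel ... -n ... command: it prints two lines, the second without a trailing newline, and the made-up password contains a backslash and a trailing space to show what IFS= and -r preserve.

```shell
#!/usr/bin/env bash
# Sketch of the "read two lines into two variables" pattern.
# fake_query is hypothetical; it stands in for:
#   xml sel -t -v "/Context/Resource/@username" -n -v "/Context/Resource/@password" file.xml
fake_query() {
  # %s prints its argument literally, so the backslash stays a backslash.
  printf '%s\n%s' 'demo' 'p@ss w\rd '   # no newline after the second value
}

# IFS= keeps leading/trailing whitespace; -r keeps backslashes literal.
# The second read reaches end of input before a newline, so it returns a
# non-zero status, but it still assigns the text it read.
{ IFS= read -r uname && IFS= read -r pword; } < <(fake_query)

printf '[%s] [%s]\n' "$uname" "$pword"   # prints: [demo] [p@ss w\rd ]
```

With real input, replace fake_query with the xml sel command shown in Edit 1.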


Edit 2: Using XMLStarlet to generate an XSLT stylesheet that can then be run on any system with xsltproc, including hosts that don't have XMLStarlet installed:

As per Charles Duffy's second comment below:

XMLStarlet can also generate an XSLT stylesheet from the query shown previously. The generated .xsl file can then be run on any system that has xsltproc, including hosts that don't have XMLStarlet installed.

The following steps demonstrate how to achieve this:

  1. First, run the following XMLStarlet command to generate the .xsl file:

    xml sel -C -t -v "/Context/Resource/@username" -n -v "/Context/Resource/@password" path/to/file.xml > path/to/resultant/my-template.xsl
    

    This command is nearly identical to the earlier XMLStarlet command. The notable differences are:

    • The additional -C option between sel and -t, which makes XMLStarlet print the generated XSLT stylesheet instead of running the query.
    • The redirection operator > followed by a file path, which specifies where to save the output (i.e. the generated XSLT stylesheet).

      Change the path/to/resultant/my-template.xsl part as necessary.

    The contents of the generated XSLT stylesheet will be something like the following:

    my-template.xsl

    <?xml version="1.0"?>
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:exslt="http://exslt.org/common" version="1.0" extension-element-prefixes="exslt">
      <xsl:output omit-xml-declaration="yes" indent="no"/>
      <xsl:template match="/">
        <xsl:call-template name="value-of-template">
          <xsl:with-param name="select" select="/Context/Resource/@username"/>
        </xsl:call-template>
        <xsl:value-of select="'&#10;'"/>
        <xsl:call-template name="value-of-template">
          <xsl:with-param name="select" select="/Context/Resource/@password"/>
        </xsl:call-template>
      </xsl:template>
      <xsl:template name="value-of-template">
        <xsl:param name="select"/>
        <xsl:value-of select="$select"/>
        <xsl:for-each select="exslt:node-set($select)[position()&gt;1]">
          <xsl:value-of select="'&#10;'"/>
          <xsl:value-of select="."/>
        </xsl:for-each>
      </xsl:template>
    </xsl:stylesheet>
    
  2. Next, run the following command, which uses xsltproc to transform the source .xml file and assigns the result of the transformation to the two variables uname and pword:

    { IFS= read -r uname && IFS= read -r pword; } < <(xsltproc path/to/resultant/my-template.xsl path/to/file.xml)
    
    echo "$uname $pword" # --> demo test
    

    Change the path/to/resultant/my-template.xsl and path/to/file.xml parts as necessary.
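The generated stylesheet is verbose because it is written to handle XPath expressions that match several nodes. For this single-Resource case, a short hand-written XSLT 1.0 stylesheet produces the same two lines with xsltproc. This is a sketch, not XMLStarlet output: the file names creds.xsl and file.xml in a temporary directory are hypothetical, and because xsl:value-of returns only the first matching node in XSLT 1.0, it assumes a single uncommented Resource element. The text output method also writes characters such as & in a password literally instead of as entities.

```shell
#!/usr/bin/env bash
# Sketch: a minimal hand-written XSLT 1.0 stylesheet run with xsltproc.
# Assumptions: one uncommented <Resource> under a <Context> root;
# creds.xsl and file.xml are hypothetical files in a temporary directory.
dir=$(mktemp -d)

cat > "$dir/creds.xsl" <<'EOF'
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- text output: no XML declaration, no entity escaping -->
  <xsl:output method="text"/>
  <xsl:template match="/">
    <xsl:value-of select="/Context/Resource/@username"/>
    <xsl:text>&#10;</xsl:text>
    <xsl:value-of select="/Context/Resource/@password"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
EOF

cat > "$dir/file.xml" <<'EOF'
<?xml version='1.0' encoding='utf-8'?>
<Context>
<!-- <Resource username="commented" password="commented"/> -->
<Resource name="jdbcSource" password="test" username="demo"/>
</Context>
EOF

if command -v xsltproc >/dev/null; then
  { IFS= read -r uname && IFS= read -r pword; } < <(xsltproc "$dir/creds.xsl" "$dir/file.xml")
  echo "$uname $pword"   # expected: demo test
else
  echo "xsltproc not found" >&2
fi

rm -r "$dir"
```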




Answer 2:

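With a Perl one-liner. The -0777 switch makes Perl read the whole file as a single string, so the non-greedy s/<!--.*?-->//gs (the s modifier lets . match newlines) can strip multi-line comments before the Resource tag is matched. The two lookaheads capture username and password regardless of the order in which the attributes appear, and -E enables say: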

perl -n0777E '
    # remove comments
    s/<!--.*?-->//gs;

    # match username and password with lookaheads and display in custom way
    say "user:$1\tpass:$2" while /<Resource(?=[^>]*\susername="([^"]*)")(?=[^>]*\spassword="([^"]*)")[^>]*>/g
' < file.xml


Tags: xml bash shell