How to parse XML using shellscript? [duplicate]

2019-01-03 13:30发布

This question already has an answer here:

I would like to know what would be the best way to parse an XML file using shellscript ?

  • Should one do it by hand ?
  • Does third tiers library exist ?

If you already made it if you could let me know how did you manage to do it

标签: linux bash shell
11条回答
女痞
2楼-- · 2019-01-03 14:22

I am surprised no one has mentioned xmlsh. The mission statement :

A command line shell for XML Based on the philosophy and design of the Unix Shells

xmlsh provides a familiar scripting environment, but specifically tailored for scripting xml processes.

A list of shell like commands are provided here.

I use the xed command a lot which is equivalent to sed for XML, and allows XPath based search and replaces.

查看更多
爷的心禁止访问
3楼-- · 2019-01-03 14:23

Try sgrep. It's not clear exactly what you are trying to do, but I surely would not attempt writing an XML parser in bash.

查看更多
做自己的国王
4楼-- · 2019-01-03 14:23

Here's a solution using xml_grep (because xpath wasn't part of our distributable and I didn't want to add it to all production machines)...

If you are looking for a specific setting in an XML file, and if all elements at a given tree level are unique, and there are no attributes, then you can use this handy function:

# File to be parsed
xmlFile="xxxxxxx"

# use xml_grep to find settings in an XML file
# Input ($1): path to setting
function getXmlSetting() {

    # Filter out the element name for parsing
    local element=`echo $1 | sed 's/^.*\///'`

    # Verify the element is not empty
    local check=${element:?getXmlSetting invalid input: $1}

    # Parse out the CDATA from the XML element
    # 1) Find the element (xml_grep)
    # 2) Remove newlines (tr -d \n)
    # 3) Extract CDATA by looking for *element> CDATA <element*
    # 4) Remove leading and trailing spaces
    local getXmlSettingResult=`xml_grep --cond $1 $xmlFile 2>/dev/null | tr -d '\n' | sed -n -e "s/.*$element>[[:space:]]*\([^[:space:]].*[^[:space:]]\)[[:space:]]*<\/$element.*/\1/p"`

    # Return the result
    echo $getXmlSettingResult
}

#EXAMPLE
logPath=`getXmlSetting //config/logs/path`
check=${logPath:?"XML file missing //config/logs/path"}

This will work with this structure:

<config>
  <logs>
     <path>/path/to/logs</path>
  <logs>
</config>

It will also work with this (but it won't keep the newlines):

<config>
  <logs>
     <path>
          /path/to/logs
     </path>
  <logs>
</config>

If you have duplicate <config> or <logs> or <path>, then it will only return the last one. You can probably modify the function to return an array if it finds multiple matches.

FYI: This code works on RedHat 6.3 with GNU BASH 4.1.2, but I don't think I'm doing anything particular to that, so should work everywhere.

NOTE: For anybody new to scripting, make sure you use the right types of quotes, all three are used in this code (normal single quote '=literal, backward single quote `=execute, and double quote "=group).

查看更多
Bombasti
5楼-- · 2019-01-03 14:27

This really is beyond the capabilities of shell script. Shell script and the standard Unix tools are okay at parsing line oriented files, but things change when you talk about XML. Even simple tags can present a problem:

<MYTAG>Data</MYTAG>

<MYTAG>
     Data
</MYTAG>

<MYTAG param="value">Data</MYTAG>

<MYTAG><ANOTHER_TAG>Data
</ANOTHER_TAG><MYTAG>

Imagine trying to write a shell script that can read the data enclosed in . The three very, very simply XML examples all show different ways this can be an issue. The first two examples are the exact same syntax in XML. The third simply has an attribute attached to it. The fourth contains the data in another tag. Simple sed, awk, and grep commands cannot catch all possibilities.

You need to use a full blown scripting language like Perl, Python, or Ruby. Each of these have modules that can parse XML data and make the underlying structure easier to access. I've use XML::Simple in Perl. It took me a few tries to understand it, but it did what I needed, and made my programming much easier.

查看更多
甜甜的少女心
6楼-- · 2019-01-03 14:32

Do you have xml_grep installed? It's a perl based utility standard on some distributions (it came pre-installed on my CentOS system). Rather than giving it a regular expression, you give it an xpath expression.

查看更多
登录 后发表回答