Question:
This question already has an answer here: How to parse XML in Bash? (15 answers)
I would like to know the best way to parse an XML file from a shell script.
- Should one do it by hand?
- Do third-party libraries exist?
If you have already done this, please let me know how you managed it.
Answer 1:
You could try xmllint. From its man page:
The xmllint program parses one or more XML files, specified on the command line as xmlfile. It prints various types of output, depending upon the options selected. It is useful for detecting errors both in XML code and in the XML parser itself.
It lets you select elements in the XML document by XPath, using the --pattern option.
On Mac OS X (Yosemite), it is installed by default.
On Ubuntu, if it is not already installed, you can run apt-get install libxml2-utils
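As a rough sketch of the XPath approach: many xmllint builds also accept an --xpath option, which evaluates an expression and prints the result. The snippet below assumes such a build is on the PATH. The helper name first_email and the sample document are made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper: print the text of the first <email> element in a file.
# Assumes an xmllint build that supports --xpath.
first_email() {
  local file=$1
  if ! command -v xmllint >/dev/null 2>&1; then
    echo "xmllint not found" >&2
    return 127
  fi
  # string(...) converts the first matching node to plain text
  xmllint --xpath 'string((//email)[1])' "$file"
}

# Illustrative sample document (not from the question)
sample=$(mktemp)
cat > "$sample" <<'EOF'
<spam>
  <victim><email>pope@vatican.gob.va</email></victim>
  <victim><email>father@nwo.com</email></victim>
</spam>
EOF

first_email "$sample" || echo "lookup failed" >&2
rm -f "$sample"
```

Wrapping the expression in string() keeps the output to a single plain value, which is usually easier to capture in a shell variable than a serialized node.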
Answer 2:
Here's a full working example.
If it's only extracting email addresses you could just do something like:
1) Suppose XML file spam.xml is like
<spam>
<victims>
<victim>
<name>The Pope</name>
<email>pope@vatican.gob.va</email>
<is_satan>0</is_satan>
</victim>
<victim>
<name>George Bush</name>
<email>father@nwo.com</email>
<is_satan>1</is_satan>
</victim>
<victim>
<name>George Bush Jr</name>
<email>son@nwo.com</email>
<is_satan>0</is_satan>
</victim>
</victims>
</spam>
2) You can get the emails and process them with this short bash code:
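#!/bin/bash
# Grab whatever follows "email>" up to the next "<" (needs GNU grep for -P)
emails=($(grep -oP '(?<=email>)[^<]+' "/my_path/spam.xml"))
# Loop over the array indices so both position and value are available
for i in "${!emails[@]}"
do
  echo "$i" "${emails[$i]}"
  # instead of echo use the values to send emails, etc
done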
Result of this example is:
0 pope@vatican.gob.va
1 father@nwo.com
2 son@nwo.com
Important note:
Don't use this for serious matters. This is OK for playing around, getting quick results, learning grep, etc. but you should definitely look for, learn and use an XML parser for production (see Micha's comment below).
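Note that grep -P is a GNU extension and is missing from some systems. A sed-based variant of the same quick-and-dirty idea might look like the sketch below. It carries the same caveat: it only handles an <email> element that opens and closes on one line, and it reports at most one address per line. The function name extract_emails and the sample file are illustrative.

```shell
#!/bin/bash
# Hypothetical helper: print the text of each <email>...</email> found on a single line.
extract_emails() {
  sed -n 's/.*<email>\([^<]*\)<\/email>.*/\1/p' "$1"
}

# Illustrative sample in the same shape as spam.xml
sample=$(mktemp)
cat > "$sample" <<'EOF'
<victims>
<victim>
<email>pope@vatican.gob.va</email>
</victim>
<victim>
<email>father@nwo.com</email>
</victim>
</victims>
EOF

# Read one address per line instead of relying on word splitting
extract_emails "$sample" | while IFS= read -r address; do
  echo "would mail: $address"
done
rm -f "$sample"
```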
Answer 3:
There's also xmlstarlet (which is available for Windows as well).
http://xmlstar.sourceforge.net/doc/xmlstarlet.txt
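For a feel of what this looks like in practice, here is a minimal sketch using the sel (select) subcommand. It assumes the binary is installed under the name xmlstarlet. The wrapper name xml_values and the sample file are made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper: print the value of every node matching an XPath expression,
# one per line. Assumes the tool is installed as "xmlstarlet".
xml_values() {
  local xpath=$1 file=$2
  if ! command -v xmlstarlet >/dev/null 2>&1; then
    echo "xmlstarlet not found" >&2
    return 127
  fi
  # -t starts a template, -m matches the XPath, -v . prints each match, -n adds a newline
  xmlstarlet sel -t -m "$xpath" -v . -n "$file"
}

# Illustrative sample document
sample=$(mktemp)
cat > "$sample" <<'EOF'
<spam>
  <victim><email>pope@vatican.gob.va</email></victim>
  <victim><email>father@nwo.com</email></victim>
</spam>
EOF

xml_values '//email' "$sample" || echo "lookup failed" >&2
rm -f "$sample"
```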
Answer 4:
I am surprised no one has mentioned xmlsh. The mission statement:
A command line shell for XML. Based on the philosophy and design of the Unix Shells.
xmlsh provides a familiar scripting environment, but specifically tailored for scripting xml processes.
A list of shell-like commands is provided here.
I use the xed command a lot, which is the equivalent of sed for XML and allows XPath-based search and replace.
Answer 5:
Try sgrep. It's not clear exactly what you are trying to do, but I surely would not attempt writing an XML parser in bash.
Answer 6:
Do you have xml_grep installed? It's a Perl-based utility that is standard on some distributions (it came pre-installed on my CentOS system). Rather than giving it a regular expression, you give it an XPath expression.
Answer 7:
A rather new project is the xml-coreutils package featuring xml-cat, xml-cp, xml-cut, xml-grep, ...
http://xml-coreutils.sourceforge.net/contents.html
Answer 8:
Try using xpath. You can use it to parse elements out of an xml tree.
http://www.ibm.com/developerworks/xml/library/x-tipclp/index.html
Answer 9:
This really is beyond the capabilities of shell script. Shell script and the standard Unix tools are okay at parsing line oriented files, but things change when you talk about XML. Even simple tags can present a problem:
<MYTAG>Data</MYTAG>
<MYTAG>
Data
</MYTAG>
<MYTAG param="value">Data</MYTAG>
<MYTAG><ANOTHER_TAG>Data
</ANOTHER_TAG></MYTAG>
Imagine trying to write a shell script that can read the data enclosed in <MYTAG>. These four very simple XML examples all show different ways this can be an issue. The first two examples are equivalent in XML apart from whitespace. The third simply has an attribute attached to it. The fourth contains the data in another tag. Simple sed, awk, and grep commands cannot catch all possibilities.
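To make the problem concrete, the sketch below runs a typical one-line sed extraction over the four variants (with the fourth written as a properly closed element). The function name count_naive_matches is made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper: count how many elements a naive single-line pattern finds.
count_naive_matches() {
  sed -n 's/.*<MYTAG>\([^<]*\)<\/MYTAG>.*/\1/p' "$1" | awk 'END { print NR }'
}

# The four variants, one element each
sample=$(mktemp)
cat > "$sample" <<'EOF'
<MYTAG>Data</MYTAG>
<MYTAG>
Data
</MYTAG>
<MYTAG param="value">Data</MYTAG>
<MYTAG><ANOTHER_TAG>Data
</ANOTHER_TAG></MYTAG>
EOF

found=$(count_naive_matches "$sample")
echo "elements in file: 4, found by naive sed: $found"
rm -f "$sample"
```

Only the first variant matches. The multi-line, attribute, and nested forms all slip past the pattern, and each fix to the regex tends to break on yet another legal way of writing the same XML.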
You need to use a full-blown scripting language like Perl, Python, or Ruby. Each of these has modules that can parse XML data and make the underlying structure easier to access. I've used XML::Simple in Perl. It took me a few tries to understand it, but it did what I needed, and made my programming much easier.
Answer 10:
Here's a function which will convert XML name-value pairs and attributes into bash variables.
http://www.humbug.in/2010/parse-simple-xml-files-using-bash-extract-name-value-pairs-and-attributes/
Answer 11:
Here's a solution using xml_grep (because xpath wasn't part of our distributable and I didn't want to add it to all production machines)...
If you are looking for a specific setting in an XML file, and if all elements at a given tree level are unique, and there are no attributes, then you can use this handy function:
# File to be parsed
xmlFile="xxxxxxx"

# use xml_grep to find settings in an XML file
# Input ($1): path to setting
function getXmlSetting() {
    # Filter out the element name for parsing
    local element=`echo "$1" | sed 's/^.*\///'`
    # Verify the element is not empty
    local check=${element:?getXmlSetting invalid input: $1}
    # Parse out the CDATA from the XML element
    # 1) Find the element (xml_grep)
    # 2) Remove newlines (tr -d \n)
    # 3) Extract CDATA by looking for *element> CDATA <element*
    # 4) Remove leading and trailing spaces
    local getXmlSettingResult=`xml_grep --cond "$1" "$xmlFile" 2>/dev/null | tr -d '\n' | sed -n -e "s/.*$element>[[:space:]]*\([^[:space:]].*[^[:space:]]\)[[:space:]]*<\/$element.*/\1/p"`
    # Return the result
    echo $getXmlSettingResult
}

#EXAMPLE
logPath=`getXmlSetting //config/logs/path`
check=${logPath:?"XML file missing //config/logs/path"}
This will work with this structure:
<config>
<logs>
<path>/path/to/logs</path>
</logs>
</config>
It will also work with this (but it won't keep the newlines):
<config>
<logs>
<path>
/path/to/logs
</path>
</logs>
</config>
If you have duplicate <config> or <logs> or <path>, then it will only return the last one. You can probably modify the function to return an array if it finds multiple matches.
FYI: This code works on RedHat 6.3 with GNU Bash 4.1.2, but I don't think I'm doing anything specific to that setup, so it should work everywhere.
NOTE: For anybody new to scripting, make sure you use the right types of quotes, all three are used in this code (normal single quote '=literal, backward single quote `=execute, and double quote "=group).
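A small illustration of the three quote types, using a made-up variable called name:

```shell
#!/bin/bash
name="world"

literal='hello $name'        # single quotes: nothing is expanded
grouped="hello $name"        # double quotes: $name is expanded, result stays one word
executed=`echo hello $name`  # backticks: the command runs and its output is captured

echo "$literal"   # prints: hello $name
echo "$grouped"   # prints: hello world
echo "$executed"  # prints: hello world
```

In newer scripts, $(command) is commonly used in place of backticks because it nests more cleanly, but both forms capture a command's output.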