This question already has an answer here: How to parse XML in Bash?
I would like to know the best way to parse an XML file in a shell script.
- Should one do it by hand?
- Do third-party libraries exist?
If you have already done this, could you let me know how you managed it?
I am surprised no one has mentioned xmlsh. The mission statement:
A list of shell-like commands is provided here.
I use the xed command a lot, which is the equivalent of sed for XML and allows XPath-based search and replace.
Try sgrep. It's not clear exactly what you are trying to do, but I certainly would not attempt to write an XML parser in bash.
Here's a solution using xml_grep (because xpath wasn't part of our distributable and I didn't want to add it to all production machines)...
If you are looking for a specific setting in an XML file, and if all elements at a given tree level are unique, and there are no attributes, then you can use this handy function:
This will work with this structure:
It will also work with this (but it won't keep the newlines):
If you have duplicate <config> or <logs> or <path>, then it will only return the last one. You can probably modify the function to return an array if it finds multiple matches.
FYI: This code works on Red Hat 6.3 with GNU Bash 4.1.2, but I don't think I'm doing anything specific to that setup, so it should work elsewhere too.
NOTE: If you are new to scripting, make sure you use the right type of quote; all three are used in this code: single quotes (') for literal strings, backquotes (`) to execute a command and substitute its output, and double quotes (") for grouping.
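A minimal pure-Bash sketch of this kind of lookup, under the same assumptions (unique element names at each level, no attributes), plus a few of my own: no CDATA sections, no '<' or '>' inside comments, and no entity decoding. The names xml_get and read_dom are hypothetical; this is not the function from the answer, though like it, it returns the last match when a path repeats.

```bash
#!/usr/bin/env bash
# Minimal sketch; xml_get and read_dom are hypothetical names.
# Assumes: no attributes, no CDATA, element names unique at each level.
# Entities such as &amp; are NOT decoded. If a path repeats, the last match wins.

# Read everything up to the next '<' and split it at the first '>':
# TAG gets the tag body (e.g. "path" or "/path"), TEXT the text after it.
read_dom() {
  local IFS='>'
  read -r -d '<' TAG TEXT
}

# Print the text of the element at a slash-separated path such as
# "config/logs/path".
xml_get() {
  local file=$1 want=$2
  local path="" result="" TAG TEXT
  while read_dom; do
    case $TAG in
      "" | "?"* | "!"*) ;;       # leading text, <?xml ...?>, simple comments
      */) ;;                     # self-closing <tag/> has no text
      /*) path=${path%/*} ;;     # closing tag: go up one level
      *)  path="$path/$TAG"      # opening tag: go down one level
          if [[ ${path#/} == "$want" ]]; then
            result=$TEXT
          fi
          ;;
    esac
  done < "$file"
  printf '%s\n' "$result"
}
```

For example, xml_get settings.xml config/logs/path would print the text of the path element (settings.xml is a hypothetical file). Because read_dom runs inside xml_get, Bash's dynamic scoping makes read assign to xml_get's local TAG and TEXT rather than to globals.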
This really is beyond the capabilities of shell script. Shell script and the standard Unix tools are okay at parsing line-oriented files, but things change when you talk about XML. Even simple tags can present a problem:
Imagine trying to write a shell script that can read the data enclosed in a particular tag. These very simple XML examples each show a different way this can be an issue. The first two examples are equivalent in XML. The third simply has an attribute attached to it. The fourth contains the data inside another tag. Simple sed, awk, and grep commands cannot catch all of these possibilities.
You need to use a full-blown scripting language like Perl, Python, or Ruby. Each of these has modules that can parse XML data and make the underlying structure easier to access. I've used XML::Simple in Perl. It took me a few tries to understand it, but it did what I needed and made my programming much easier.
Do you have xml_grep installed? It's a Perl-based utility that is standard on some distributions (it came pre-installed on my CentOS system). Rather than giving it a regular expression, you give it an XPath expression.
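A minimal sketch of calling xml_grep from a script. xml_grep is distributed with Perl's XML::Twig module, and its --text_only option prints just the text of each match, one per line. The helper name xml_value and the example path are hypothetical, and xml_grep accepts XML::Twig's XPath-like subset rather than full XPath, so check xml_grep --help on your system.

```bash
#!/usr/bin/env bash
# Sketch only: xml_value is a hypothetical helper name.
# Prints the text of every element matching an XPath-like condition.
xml_value() {
  local file=$1 cond=$2
  if ! command -v xml_grep >/dev/null 2>&1; then
    echo "xml_grep not found (it ships with Perl's XML::Twig)" >&2
    return 127
  fi
  # --text_only: print only the text content of each match.
  xml_grep --text_only "$cond" "$file"
}
```

For example, xml_value settings.xml 'config/logs/path' would print the path element's text (settings.xml is a hypothetical file). The command -v guard gives a clear error on machines where, as in the xml_grep answer above, the tool has not been installed.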