The first question I asked on this topic was closed for lack of information, so I am asking again with more details.
I have to extract a value from one tag in an XML file, and I have to do it in ksh. (I could solve this in perl, but I have to do it in ksh and cannot use third-party tools like xmlsh.)
sample.xml
<?xml version="1.0" standalone="yes" ?>
<parent_one>
<parent_two>
<Pool>
<pool_name>ABC</pool_name>
<percent_full>79</percent_full>
<pool_state>Enabled</pool_state>
</Pool>
<Pool>
<pool_name>DEF</pool_name>
<percent_full>40</percent_full>
<pool_state>Enabled</pool_state>
</Pool>
<Pool>
<pool_name>XYZ</pool_name>
<percent_full>40</percent_full>
<pool_state>Disabled</pool_state>
</Pool>
<Totals>
<total_tracks>4546456</total_tracks>
<percent_full>48</percent_full>
</Totals>
</parent_two>
</parent_one>
The ksh script should read sample.xml and print ABC and DEF from the pool_name tags, because the corresponding pool_state is Enabled. It should not print XYZ, because its pool_state is Disabled.
That is, the script should read sample.xml and output the following:
ABC
DEF
Is this feasible in ksh or do I have to use perl for this?
The sane solution to this problem is to call out to xmllint --xpath, xqilla -p, or your favorite Python/Ruby/Perl etc. XML library. Otherwise, you can have a look at Roland Mainz's XML examples and extend them for your purposes.
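For the xmllint route, a minimal sketch might look like the one below. The function name enabled_pools_xpath is made up for illustration, and it assumes an xmllint build that supports --xpath. Depending on the libxml2 version, the matched nodes come back either one per line or run together on a single line, so the grep/tr step pulls out the text between the tags instead of relying on the line layout.

```shell
# Sketch only: print pool_name for every Pool whose pool_state is Enabled.
# enabled_pools_xpath is a hypothetical helper name; requires xmllint with --xpath.
enabled_pools_xpath() {
    # The XPath predicate selects only Pool elements whose pool_state child is "Enabled".
    xmllint --xpath '//Pool[pool_state="Enabled"]/pool_name' "$1" |
    # Keep just the non-empty text between '>' and '<', one match per line.
    grep -o '>[^<][^<]*<' |
    # Strip the surrounding angle brackets.
    tr -d '<>'
}

# usage: enabled_pools_xpath sample.xml
```

The filtering happens inside the XPath expression, so the shell side only has to strip the markup.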
If you were really serious about this you would probably want to look into writing bindings for libxml2 for ksh. I don't think anybody has done this yet.
I've done quite a lot of parsing of odd format files with (n)awk. Technically, this could be done with just ksh, but awk (and perl) are easier...
The following sample makes use of the start, end construct in awk, which will only process the lines between the start and end patterns (in this case <Pool> and </Pool>). Other than that it's straightforward, using variables mimicking the xml elements for clarity.
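A sketch of that approach, not necessarily the exact code originally posted: it assumes one element per line as in sample.xml, and the names enabled_pools and value are my own. It needs an awk with user-defined functions (nawk, gawk, mawk).

```shell
# Sketch only: print pool_name for each <Pool> block whose pool_state is Enabled.
# Assumes every element sits on its own line, as in sample.xml.
enabled_pools() {
    awk '
        # Strip the tags and surrounding whitespace, leaving the element text.
        function value(line) {
            gsub(/<[^>]*>/, "", line)
            gsub(/^[ \t]+|[ \t\r]+$/, "", line)
            return line
        }
        # Range pattern: only lines from <Pool> through </Pool> reach this block.
        /<Pool>/, /<\/Pool>/ {
            if ($0 ~ /<pool_name>/)  pool_name  = value($0)
            if ($0 ~ /<pool_state>/) pool_state = value($0)
            if ($0 ~ /<\/Pool>/) {
                # End of one Pool: report it if enabled, then reset for the next.
                if (pool_state == "Enabled") print pool_name
                pool_name = pool_state = ""
            }
        }
    ' "$1"
}

# usage: enabled_pools sample.xml
```

With the sample.xml from the question this should print ABC and DEF. The Totals block is skipped because it falls outside the <Pool> range.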
This code will fail horribly when the xml is malformed, when multiple Pool elements are listed on a single line, etc.
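Since the question asks for ksh specifically, the same logic can be sketched with shell built-ins only. This is an illustration under the same one-element-per-line assumption, the function name enabled_pools_sh is made up, and it sticks to POSIX constructs so ksh should run it unchanged. Note that name, state and line are plain global variables here.

```shell
# Sketch only: same filtering using nothing but shell built-ins.
# Assumes every element sits on its own line, as in sample.xml.
enabled_pools_sh() {
    name= state=
    # The "|| [ -n ... ]" part also handles a last line without a trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
        case $line in
            *'<pool_name>'*)
                name=${line#*"<pool_name>"}      # drop everything up to the opening tag
                name=${name%%"</pool_name>"*}    # drop the closing tag and what follows
                ;;
            *'<pool_state>'*)
                state=${line#*"<pool_state>"}
                state=${state%%"</pool_state>"*}
                ;;
            *'</Pool>'*)
                # End of one Pool: print it if enabled, then reset for the next block.
                [ "$state" = Enabled ] && printf '%s\n' "$name"
                name= state=
                ;;
        esac
    done < "$1"
}

# usage: enabled_pools_sh sample.xml
```

It has the same weaknesses as the awk version: malformed input or several elements on one line will break it.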
That being said (see my comment about trying to parse XML without a proper XML parser), let's give it a shot using sed/awk, not pure ksh. Take this answer as the foundation: remove all <Pool></Pool> blocks which have pool_state set to Disabled, then get the lines containing pool_name and capture the value between the tags. If your xml file looks like your sample this should work, but it will definitely break if it doesn't.
You could fit the whole thing into one awk script, but I figured this might be easier to follow (OK, I am being lazy).
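A sketch of the two-step pipeline described above, under the same assumption that the file is laid out like the sample. This is not necessarily the exact command originally posted, and the function name enabled_pools_sed is made up. The sed step collects each Pool block in the hold space and drops it when it contains a Disabled pool_state. The awk step then splits the surviving pool_name lines on the angle brackets.

```shell
# Sketch only: step 1 drops disabled Pool blocks, step 2 extracts the names.
# Assumes every element sits on its own line, as in sample.xml.
enabled_pools_sed() {
    sed -n '
        /<Pool>/,/<\/Pool>/ {
            # Start of a block: begin a fresh copy in the hold space.
            /<Pool>/ {
                h
                d
            }
            # Any other line in the block: append it to the hold space.
            H
            # End of the block: fetch it, drop it if disabled, otherwise print it.
            /<\/Pool>/ {
                x
                /<pool_state>Disabled<\/pool_state>/ d
                p
            }
        }
    ' "$1" |
    # With < and > as field separators, the text between the tags is field 3.
    awk -F'[<>]' '/<pool_name>/ { print $3 }'
}

# usage: enabled_pools_sed sample.xml
```

With the sample.xml from the question this should print ABC and DEF.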