Script to extract details from XML

Posted 2019-09-20 11:11

Question:

If I have an XML file like the one below:

<soap env="abc" id="xyz">
<emp>acdf</emp>
<Workinstance name="ab" id="ab1">
<x>1</x>
<y>2</y>
</Workinstance>
<projectinstance name="cd" id="cd1">
<u>1</u>
<v>2</v>
</projectinstance>
</soap>

I want to extract the id attribute of the Workinstance element using a Unix shell script.

I tried grep, but it returns the whole XML file. Can someone help me get just the id?

Answer 1:

You might want to consider something like XMLStarlet, a command-line toolkit that lets you query XML documents with XPath expressions.

Parsing XML with regular expressions is essentially impossible even under the best of conditions, so the sooner you give up on trying to do this with grep, the better off you're likely to be.



Answer 2:

XMLStarlet seems to be the tool I was looking for!

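To extract the attribute, try the following:

xmlstarlet sel -t -v 'soap/Workinstance/@id' -n your_file.xml

Here "soap/Workinstance/@id" is an XPath expression that selects the id attribute of the Workinstance element inside soap. The "-v" flag tells xmlstarlet to print the selected value to standard output, and "-n" adds a trailing newline. XPath names are case-sensitive, so the expression must say "Workinstance", exactly as the tag is spelled in the file. xmlstarlet reads the file directly, so piping it through cat is not needed.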
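
In a script you usually want the value in a variable rather than on the terminal. The following is a minimal sketch, assuming xmlstarlet is installed and on PATH; the function name workinstance_id is made up for illustration and is not part of xmlstarlet.

```shell
#!/bin/sh
# Sketch: print the id attribute of <Workinstance> from the file given as $1.
# Assumes xmlstarlet is on PATH; workinstance_id is a hypothetical helper name.

workinstance_id() {
    # -t starts an output template, -v prints the value of the XPath match.
    # Command substitution strips the trailing newline, so -n is not needed.
    xmlstarlet sel -t -v '/soap/Workinstance/@id' "$1"
}
```

It could then be used as: id=$(workinstance_id your_file.xml), followed by a check such as [ -n "$id" ] to detect the case where nothing matched.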



Answer 3:

If you have Ruby, a one-liner works for this file:

$ ruby -ne 'print $_.gsub(/.*id=\"|\".*$/,"" ) if /<Workinstance/' file
ab1
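
If Ruby is not available, the same line-oriented idea can be sketched with plain sed. This is not a real XML parser: it assumes the whole Workinstance start tag sits on one line, that the id value is double-quoted, and that a line contains at most one Workinstance element. The function name workinstance_id_sed is made up for illustration.

```shell
#!/bin/sh
# Line-oriented sketch, not an XML parser. Assumes the <Workinstance ...>
# start tag is on a single line and the id value is double-quoted.
# workinstance_id_sed is a hypothetical helper name.

workinstance_id_sed() {
    # -n suppresses normal output; the trailing p prints only lines where
    # the substitution matched. [^>]* keeps the match inside the start tag,
    # and [[:space:]] stops "id=" from matching inside a longer name.
    sed -n 's/.*<Workinstance[^>]*[[:space:]]id="\([^"]*\)".*/\1/p' "$1"
}
```

Because the pattern anchors on the tag name, the id of the enclosing soap element is not picked up. An XPath tool remains the safer choice once the input can vary in layout.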


Tags: unix shell