I have an XML file like the one below:
<soap env="abc" id="xyz">
<emp>acdf</emp>
<Workinstance name="ab" id="ab1">
<x>1</x>
<y>2</y>
</Workinstance>
<projectinstance name="cd" id="cd1">
<u>1</u>
<v>2</v>
</projectinstance>
</soap>
I want to extract the id attribute of the Workinstance element using a Unix shell script.
I tried grep, but it returns the whole XML file.
Can someone show me how to get just that value?
You might want to consider something like XMLStarlet, a command-line XML toolkit that lets you query documents with XPath expressions.
Parsing XML with regular expressions is essentially impossible even under the best of conditions, so the sooner you give up on trying to do this with grep, the better off you're likely to be.
XMLStarlet seems to be the tool I was looking for!
To extract the attribute, try the following:
xmlstarlet sel -t -v '/soap/Workinstance/@id' your_file.xml
Here '/soap/Workinstance/@id' is an XPath expression that selects the id attribute of the Workinstance element under the root soap element. The "-v" flag tells xmlstarlet to print the value of that expression to standard output. Passing the file name as an argument also avoids the extra cat.
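If you have Ruby, this one-liner works as long as the id attribute sits on the same line as the opening Workinstance tag: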
$ ruby -ne 'print $_.gsub(/.*id=\"|\".*$/,"" ) if /<Workinstance/' file
ab1
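If neither XMLStarlet nor Ruby is installed, the same line-oriented idea can be sketched with plain sed. This is only a rough fallback, not a real XML parser: it assumes the id attribute is double-quoted and appears on the same line as the opening Workinstance tag, as in the sample from the question. The temporary file and the variable names xml_file and id are made up for illustration.

```shell
#!/bin/sh
# Sample input copied from the question; xml_file is a temporary, illustrative path.
xml_file=$(mktemp)
cat > "$xml_file" <<'EOF'
<soap env="abc" id="xyz">
<emp>acdf</emp>
<Workinstance name="ab" id="ab1">
<x>1</x>
<y>2</y>
</Workinstance>
<projectinstance name="cd" id="cd1">
<u>1</u>
<v>2</v>
</projectinstance>
</soap>
EOF

# -n suppresses default output; the trailing p prints only lines where the
# substitution succeeded, i.e. lines with an opening <Workinstance ...> tag.
# The captured group \1 is the text between the quotes after id=.
# Assumption: the attribute is double-quoted and on the same line as the tag.
id=$(sed -n 's/.*<Workinstance[^>]*[[:space:]]id="\([^"]*\)".*/\1/p' "$xml_file")

echo "$id"
rm -f "$xml_file"
```

For the sample file this prints ab1. If the XML may be reformatted, with attributes split across lines or single-quoted, a real parser such as XMLStarlet is the safer choice.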