Ideally, what I would like to be able to do is:
cat xhtmlfile.xhtml |
getElementViaXPath --path='/html/head/title' |
sed -E -e 's%(^<title>|</title>$)%%g' > titleOfXHTMLPage.txt
You can do that very easily using only bash. You only have to add this function:
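A minimal sketch of such a function, assuming it reads from standard input and follows the E/C naming used in this answer (the `-r` flag and the quoting are additions for safety, not something the answer states):

```bash
# Sketch: read one "tag>content" chunk from stdin.
# '<' acts as the record delimiter and '>' as the field separator.
rdom () {
    local IFS='>'          # split the chunk at the first '>'
    read -r -d '<' E C     # E = tag text, C = text up to the next '<'
}
```

Because the split is purely textual, E also contains any attributes written inside the tag, and closing tags show up as names such as /title.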
Now you can use rdom like read, but for HTML documents. When called, rdom assigns the element to the variable E and its content to the variable C.
For example, to do what you wanted to do:
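One way the loop could look, as a sketch: rdom is redefined here so the snippet runs on its own, and get_title is a hypothetical helper name, not something from the question.

```bash
#!/usr/bin/env bash
# Sketch of the rdom helper: E gets the tag text, C the text that follows it.
rdom () {
    local IFS='>'
    read -r -d '<' E C
}

# get_title (hypothetical name): print the content of the first <title> on stdin.
get_title () {
    local E C                      # rdom assigns to these via dynamic scoping
    while rdom; do
        if [[ $E == title ]]; then
            printf '%s\n' "$C"
            break
        fi
    done
}

# Demo on an inline document; prints the text inside <title>.
get_title <<'EOF'
<html><head><title>Example page</title></head><body></body></html>
EOF
```

With the real file this becomes `get_title < xhtmlfile.xhtml > titleOfXHTMLPage.txt`. The exact comparison against title fails if the tag carries attributes, so this only suits simple, well-behaved documents.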
Well, you can use the xpath utility. I believe Perl's XML::XPath includes it.
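A hedged sketch of how that might look, assuming a reasonably recent xpath script that takes the query through -e (older releases expected `xpath FILE QUERY` instead). title_via_xpath is a hypothetical wrapper name used only for illustration.

```bash
# Assumes the xpath script from Perl's XML::XPath is on PATH.
# -q suppresses the informational output, -e gives the XPath expression.
title_via_xpath () {
    xpath -q -e '/html/head/title/text()' "$1"
}

# Usage with the file names from the question:
#   title_via_xpath xhtmlfile.xhtml > titleOfXHTMLPage.txt
```

Selecting text() returns only the title's content, so the sed step from the question is no longer needed.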
This works if you want XML attributes:
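As one possible bash-only approach (a sketch, not necessarily the method this answer had in mind), a regular-expression match can pull a single attribute value out of a line. get_attr and the sample element below are hypothetical.

```bash
# get_attr NAME (hypothetical helper): read XML on stdin and print the value
# of the first NAME="..." attribute found. Handles double-quoted values on a
# single line only and does not decode entities.
get_attr () {
    local name=$1 line
    local re="[[:space:]]${name}=\"([^\"]*)\""
    while IFS= read -r line || [[ -n $line ]]; do
        if [[ $line =~ $re ]]; then
            printf '%s\n' "${BASH_REMATCH[1]}"
            return 0
        fi
    done
    return 1                       # attribute not found
}

# Demo on a made-up element; prints the value of the stream attribute.
printf '%s\n' '<video server="example.com" stream="clip.mp4"/>' | get_attr stream
```

Printing the value instead of evaluating the attributes as shell assignments keeps untrusted XML from running code.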