How to parse XML in Bash?

Posted 2018-12-31 07:18

Ideally, what I would like to be able to do is:

cat xhtmlfile.xhtml |
getElementViaXPath --path='/html/head/title' |
sed -E -e 's%(^<title>|</title>$)%%g' > titleOfXHTMLPage.txt

15 answers
笑指拈花
Reply #2 · 2018-12-31 07:43

You can do that very easily using only bash. You only have to add this function:

rdom () { local IFS=\> ; read -d \< E C ;}

Now you can use rdom like read, but for HTML documents. Each call assigns the next tag to the variable E and the text that follows it to the variable C.

For example, to do what you wanted to do:

while rdom; do
    if [[ $E = title ]]; then
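        # Quote the expansion so whitespace in the title is preserved.
        echo "$C"
        # Stop at the first match; break leaves the loop without ending the shell.
        break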
    fi
done < xhtmlfile.xhtml > titleOfXHTMLPage.txt
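
To try this without a file on disk, here is a minimal sketch. The helper name first_title and the sample document are made up for illustration, and read is given -r here so backslashes in the input are kept literally; neither is part of the answer above.

```shell
#!/usr/bin/env bash
# rdom as in the answer: read up to the next '<', then split on '>' so that
# E receives the tag and C the text that follows it. (-r is an addition.)
rdom () { local IFS=\> ; read -r -d \< E C ;}

# Hypothetical helper: print the text of the first <title> element on stdin.
first_title () {
    local E C
    while rdom; do
        if [[ $E = title ]]; then
            printf '%s\n' "$C"
            return 0
        fi
    done
    return 1   # no <title> element found
}

# Made-up sample document, for illustration only.
sample='<html><head><title>My Page</title></head><body><p>Hello</p></body></html>'
first_title <<< "$sample"
```

Note that E holds the whole tag, attributes included, so a tag written as <title lang="en"> would not match the exact comparison used here.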
爱死公子算了
Reply #3 · 2018-12-31 07:43

You can use the xpath utility. I believe Perl's XML::XPath module ships with it.
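
A rough sketch of what that could look like, assuming a version of the xpath script that accepts -e for the query and -q for quiet output (some older packaged versions take the file name first and the query second instead). The wrapper name title_via_xpath and the sample file are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: print the text of /html/head/title from the given file.
# Assumes an xpath script that understands -q (quiet) and -e (query).
title_via_xpath () {
    if ! command -v xpath >/dev/null 2>&1; then
        echo "xpath not found; it usually comes with Perl's XML::XPath" >&2
        return 127
    fi
    xpath -q -e '/html/head/title/text()' "$1"
}

# Made-up sample file, for illustration only.
tmp=$(mktemp)
printf '%s\n' '<html><head><title>My Page</title></head><body/></html>' > "$tmp"
title_via_xpath "$tmp" || echo "title lookup failed" >&2
rm -f "$tmp"
```

Selecting text() returns the title text directly, so the sed step from the question is not needed.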

初与友歌
Reply #4 · 2018-12-31 07:45

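This works if you want XML attributes. The sed command strips the tag name and the closing />, leaving name="value" pairs that the shell can source as variable assignments: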

$ cat alfa.xml
<video server="asdf.com" stream="H264_400.mp4" cdn="limelight"/>

$ sed 's.[^ ]*..;s./>..' alfa.xml > alfa.sh

$ . ./alfa.sh

$ echo "$stream"
H264_400.mp4
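
Sourcing the generated file runs whatever the attribute values contain, so a value such as "$(some command)" would be executed. For input you do not control, one possible alternative is to pull out a single attribute with a bash regular expression. This is only a sketch: get_attr is a hypothetical helper, not part of the answer above, and it assumes double-quoted values with no embedded double quotes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the value of one attribute from a single tag.
# Usage: get_attr NAME 'TAG'
get_attr () {
    local name=$1 tag=$2
    # Match NAME="value" preceded by whitespace; value must not contain '"'.
    local re="[[:space:]]${name}=\"([^\"]*)\""
    if [[ $tag =~ $re ]]; then
        printf '%s\n' "${BASH_REMATCH[1]}"
    else
        return 1   # attribute not present
    fi
}

tag='<video server="asdf.com" stream="H264_400.mp4" cdn="limelight"/>'
get_attr stream "$tag"   # prints H264_400.mp4
```

Nothing is evaluated by the shell here; the value is only matched and printed.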