I am grepping an XML file, which gives me output like this:

    <tag>data</tag>
    <tag>more data</tag>
    ...

Note, this is a flat file, not an XML tree. I want to remove the XML tags and just display the data in between. I'm doing all this from the command line, and was wondering if there is a better way than piping it into awk twice:

    cat file.xml | awk -F'>' '{print $2}' | awk -F'<' '{print $1}'

Ideally, I would like to do this in one command.
Give this a try:

    grep -Po '<.*?>\K.*?(?=<.*?>)' file.xml

Explanation:

Using Perl Compatible Regular Expressions (-P) and outputting only the matched parts (-o):

    <.*?>      - non-greedy match of any characters within angle brackets
    \K         - don't include the preceding match in the output (resets the
                 match start; similar to a positive look-behind, but it works
                 with variable-length matches)
    .*?        - non-greedy match stopping at the next match (this part will
                 be output)
    (?=<.*?>)  - positive look-ahead: require another tag, but don't include
                 it in the output (works with variable-length matches)

If your file looks just like that, then
sed can help you.

Of course, you should not use regular expressions for parsing XML, because it's hard.
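For a flat one-tag-per-line file like this, a sed substitution along these lines does the job (a sketch: it deletes every <...> run, and assumes tag contents never include a literal > character):

```shell
# Sample input matching the question (using the question's file name)
printf '<tag>data</tag>\n<tag>more data</tag>\n' > file.xml

# Delete every "<...>" run on each line; what remains is the data
# between the tags.
sed -e 's/<[^>]*>//g' file.xml
# -> data
#    more data
```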
Use the html2text command-line tool, which converts HTML into plain text.

Alternatively, you may try the ex way.
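A sketch of that approach, assuming vim's ex: the editing commands are fed on standard input in script mode (-s), :%s strips the tags on every line, :%p prints the whole buffer, and q! quits without saving.

```shell
# Sample input matching the question
printf '<tag>data</tag>\n<tag>more data</tag>\n' > file.xml

# Strip the tags, print the buffer, quit without writing the file.
# (%% in the printf format produces a literal %.)
printf '%%s/<[^>]*>//g\n%%p\nq!\n' | ex -s file.xml
```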
I know this is not a "perlgolf contest", but I used to use this trick.

Set the record separator to < or >, then print only the odd records.

Using awk:
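That trick can be sketched with GNU awk, where RS may be a regular expression; NF filters out the empty and whitespace-only records that fall between adjacent tags:

```shell
# Sample input matching the question
printf '<tag>data</tag>\n<tag>more data</tag>\n' > file.xml

# RS='[<>]' splits records on either angle bracket (a GNU awk extension);
# the data sits in the odd-numbered records, and NF drops the empty or
# whitespace-only ones.
awk -v RS='[<>]' 'NR % 2 && NF' file.xml
# -> data
#    more data
```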