Grep for multiple patterns in a file - Answers, page 2

I'd like to count the number of XML nodes in my XML file (with grep or some other tool).

....
<countryCode>GBR</countryCode>
<countryCode>USA</countryCode>
<countryCode>CAN</countryCode>
...
<countryCode>CAN</countryCode>
<someNode>USA</someNode>
<countryCode>CAN</countryCode>
<someNode>Otherone</someNode>
<countryCode>GBR</countryCode>
...

How can I get a count for each individual country, like CAN = 3, USA = 1, GBR = 2, without passing in the country names? There might be more countries than these.

Update:

There are other nodes besides countryCode.

Tags: linux shell unix

8 answers

地球回转人心会变

#2 · 2019-09-12 00:20

If your file is laid out as you have shown, awk can do it like this (the NF > 1 test skips lines that have no countryCode element):

awk -F '</?countryCode>' 'NF > 1 { a[$2]++ } END { for (e in a) printf("%s\t%d\n", e, a[e]) }' INPUTFILE

If there is more than one <countryCode> tag on a line, you can first pipe the file through sed to put each tag on its own line, e.g. (with GNU sed, which accepts \n in the replacement):

sed 's/<countryCode>/\n<countryCode>/g' INPUTFILE | awk ...

Note that if a <countryCode> element spans multiple lines, this does not work as expected.
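
For the several-tags-per-line case, another option is to let grep print each match on its own line. This is only a minimal sketch: it assumes a grep that supports the non-POSIX -o option (GNU and BSD grep do), count_tags is a hypothetical helper name, and elements that span several lines are still not handled.

```shell
# count_tags FILE: count each distinct <countryCode> value in FILE.
# Hypothetical helper; relies on grep -o, which is not part of POSIX.
# grep -o prints every match on its own line, even several per input line;
# sed then strips the tags, and sort | uniq -c groups and counts the values.
count_tags() {
    grep -o '<countryCode>[^<]*</countryCode>' "$1" |
        sed 's/<[^>]*>//g' |
        sort | uniq -c
}
```

Called as count_tags INPUTFILE, it prints one line per country in the usual uniq -c layout (count first, then the value).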

Anyway, I'd recommend using XPath for this kind of task (Perl's XML::XPath module comes with a CLI utility for this).


聊天终结者

#3 · 2019-09-12 00:25

grep can give a total count, but it does not count per pattern; for that you should use uniq -c:

$ uniq -c <(sort file)
  1 
  1  
  3 <countryCode>CAN</countryCode>
  2 <countryCode>GBR</countryCode>
  1 <countryCode>USA</countryCode>

If you want to get rid of the empty lines and tags, add sed:

$ sed -e '/^[[:space:]]*$/d' -e 's/<.*>\([A-Z]*\)<.*>/\1/g' test | sort | uniq -c
  3 CAN
  2 GBR
  1 USA

To delete lines that don't have a country code, add another command to sed:

$ sed -e '/countryCode/!d' -e '/^[[:space:]]*$/d' -e 's/<.*>\([A-Z]*\)<.*>/\1/g' test | sort | uniq -c
  3 CAN
  2 GBR
  1 USA
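
The question asks for output in the form CAN = 3. One way to get that layout is to swap the two uniq -c columns with awk. This is a sketch rather than a definitive solution: it assumes at most one <countryCode> element per line, as in the sample, and country_counts is a hypothetical name.

```shell
# country_counts FILE: print "CODE = N" for every distinct countryCode value.
# Hypothetical helper; assumes at most one <countryCode> element per line.
# sed -n ... p keeps only lines with a countryCode element and reduces each
# to its value; uniq -c prints "count value", and awk swaps the two fields.
country_counts() {
    sed -n 's/.*<countryCode>\([^<]*\)<\/countryCode>.*/\1/p' "$1" |
        sort | uniq -c |
        awk '{ printf "%s = %d\n", $2, $1 }'
}
```

On the sample from the question, country_counts file prints CAN = 3, GBR = 2 and USA = 1, one per line, and ignores the someNode lines.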
