Grep for multiple patterns in a file

Posted 2019-09-11 23:22

I'd like to count the number of XML nodes in my XML file (with grep or some other way).

....
<countryCode>GBR</countryCode>
<countryCode>USA</countryCode>
<countryCode>CAN</countryCode>
...
<countryCode>CAN</countryCode>
<someNode>USA</someNode>
<countryCode>CAN</countryCode>
<someNode>Otherone</someNode>
<countryCode>GBR</countryCode>
...

How can I get the count of each individual country, e.g. CAN = 3, USA = 1, GBR = 2, without passing in the country names? There might be more countries.

Update:

There are other nodes besides <countryCode>.

Tags: linux shell unix
8 answers
地球回转人心会变
#2 · 2019-09-12 00:20

If your file is laid out as you showed us (at most one <countryCode> element per line), awk can do it like this:

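awk -F '</?countryCode>' 'NF > 1 { a[$2]++ } END { for (e in a) printf("%s\t%d\n", e, a[e]) }' INPUTFILE

(The NF > 1 condition skips lines without a <countryCode> tag, such as <someNode>; without it they would all be counted under an empty key.)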
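
A self-contained sketch of this approach, for trying it in one go. It wraps the same awk field-splitting idea in a function (count_country_codes is a hypothetical name, not part of any tool) and feeds it the sample from the question through a heredoc in place of INPUTFILE. Like the answer, it assumes at most one <countryCode> element per line, and it pipes through sort because awk's for (e in a) loop visits keys in an unspecified order.

```shell
#!/bin/sh
# count_country_codes is a hypothetical helper; it reads XML on stdin.
# -F splits each line on <countryCode> or </countryCode>, so the code is $2.
# Lines without the tag (e.g. <someNode>) have NF == 1 and are skipped.
# "for (code in count)" has no defined order, hence the final sort.
count_country_codes() {
    awk -F '</?countryCode>' '
        NF > 1 { count[$2]++ }
        END    { for (code in count) printf("%s = %d\n", code, count[code]) }
    ' | sort
}

# Demo on the sample from the question ("..." lines left out).
count_country_codes <<'EOF'
<countryCode>GBR</countryCode>
<countryCode>USA</countryCode>
<countryCode>CAN</countryCode>
<countryCode>CAN</countryCode>
<someNode>USA</someNode>
<countryCode>CAN</countryCode>
<someNode>Otherone</someNode>
<countryCode>GBR</countryCode>
EOF
```

On the sample this prints CAN = 3, GBR = 2 and USA = 1, one per line. Because the tags themselves are the field separators, any indentation before <countryCode> ends up in $1 and does not affect the counts.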

If there is more than one <countryCode> tag on a line, you can first split the input so that each tag starts a new line (this relies on GNU sed; BSD sed does not turn \n in the replacement into a newline), e.g.:

sed 's/<countryCode>/\n<countryCode>/g' INPUTFILE | awk ...
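
If GNU sed is not available, a portable alternative (a different technique from the sed pipeline above, using only POSIX awk) is to loop over every <countryCode>...</countryCode> match on each line with match(), which sets RSTART and RLENGTH. The function name count_country_codes_multi is hypothetical, and like the rest of this answer the sketch assumes an element never spans lines.

```shell
#!/bin/sh
# count_country_codes_multi is a hypothetical helper; it reads XML on stdin.
count_country_codes_multi() {
    awk '
        {
            line = $0
            # match() sets RSTART/RLENGTH for the leftmost complete element.
            while (match(line, /<countryCode>[^<]*<\/countryCode>/)) {
                elem = substr(line, RSTART, RLENGTH)
                gsub(/<\/?countryCode>/, "", elem)    # keep only the code
                count[elem]++
                line = substr(line, RSTART + RLENGTH) # continue after this match
            }
        }
        END { for (code in count) printf("%s = %d\n", code, count[code]) }
    ' | sort
}

# Demo: two elements on the first line, one mixed with another node.
printf '%s\n' \
    '<countryCode>GBR</countryCode><countryCode>CAN</countryCode>' \
    '<someNode>USA</someNode><countryCode>CAN</countryCode>' |
    count_country_codes_multi
```

On this input it prints CAN = 2 and GBR = 1. The gsub call strips both tags from the matched element, so only the code itself becomes the array key.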

Note that this does not work as expected if a <countryCode> element spans multiple lines.

Anyway, I'd recommend using XPath for this kind of task (Perl's XML::XPath module has a CLI utility for this).

聊天终结者
#3 · 2019-09-12 00:25

grep can give a total count, but not a count per distinct line; for that, sort the file and use uniq -c:

$ sort file | uniq -c
  1 
  1  
  3 <countryCode>CAN</countryCode>
  2 <countryCode>GBR</countryCode>
  1 <countryCode>USA</countryCode>

To get rid of the empty lines and the tags, filter the file through sed first:

$ sed -e '/^[[:space:]]*$/d' -e 's/<.*>\([A-Z]*\)<.*>/\1/g' test | sort | uniq -c
  3 CAN
  2 GBR
  1 USA

To also delete lines that are not country codes (such as <someNode>), add /countryCode/!d to sed. It removes the blank lines as well, so the blank-line expression is no longer needed:

$ sed -e '/countryCode/!d' -e 's/<.*>\([A-Z]*\)<.*>/\1/g' test | sort | uniq -c
  3 CAN
  2 GBR
  1 USA
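
Since the question mentions grep: grep cannot tally values by itself, but grep -o (supported by GNU and BSD grep, though not required by POSIX) prints every match on its own line. That skips other nodes such as <someNode> and copes with several elements per line. Below is a sketch combining it with the sort | uniq -c idea from this answer, assuming an element never spans lines; count_with_grep is a hypothetical helper name.

```shell
#!/bin/sh
# count_with_grep is a hypothetical helper; it reads XML on stdin.
count_with_grep() {
    grep -o '<countryCode>[^<]*</countryCode>' |  # one element per output line
        sed 's/<[^>]*>//g' |                      # strip both tags, keep the code
        sort | uniq -c |                          # tally identical codes
        sort -rn                                  # most frequent first
}

# Demo on the sample from the question ("..." lines left out).
count_with_grep <<'EOF'
<countryCode>GBR</countryCode>
<countryCode>USA</countryCode>
<countryCode>CAN</countryCode>
<countryCode>CAN</countryCode>
<someNode>USA</someNode>
<countryCode>CAN</countryCode>
<someNode>Otherone</someNode>
<countryCode>GBR</countryCode>
EOF
```

The final sort -rn lists the most frequent country first; drop it to keep alphabetical order. The width of the count column differs between GNU and BSD uniq.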