Question:
I'd like to count the number of XML nodes in my XML file (with grep or some other tool).
....
<countryCode>GBR</countryCode>
<countryCode>USA</countryCode>
<countryCode>CAN</countryCode>
...
<countryCode>CAN</countryCode>
<someNode>USA</someNode>
<countryCode>CAN</countryCode>
<someNode>Otherone</someNode>
<countryCode>GBR</countryCode>
...
How can I get a count for each individual country, like CAN = 3, USA = 1, GBR = 2, without passing in the names of the countries? There might be more countries than the ones shown.
Update:
There are other nodes besides countryCode.
Answer 1:
My simple suggestion would be to use sort and uniq -c:
$ echo '<countryCode>GBR</countryCode>
<countryCode>USA</countryCode>
<countryCode>CAN</countryCode>
<countryCode>CAN</countryCode>
<countryCode>CAN</countryCode>
<countryCode>GBR</countryCode>' | sort | uniq -c
3 <countryCode>CAN</countryCode>
2 <countryCode>GBR</countryCode>
1 <countryCode>USA</countryCode>
Here you'd pipe in the output of your grep instead of an echo. A more robust solution would be to use XPath. If your XML file looks like
<countries>
<countryCode>GBR</countryCode>
<countryCode>USA</countryCode>
<countryCode>CAN</countryCode>
<countryCode>CAN</countryCode>
<countryCode>CAN</countryCode>
<countryCode>GBR</countryCode>
</countries>
Then you could use:
$ xpath -q -e '/countries/countryCode/text()' countries.xml | sort | uniq -c
3 CAN
2 GBR
1 USA
I say it's more robust because tools designed for parsing flat text are inherently flaky when dealing with XML. Depending on the structure of the original XML file, a different XPath query might work better, such as one that matches countryCode elements anywhere in the document:
$ xpath -q -e '//countryCode/text()' countries.xml | sort | uniq -c
3 CAN
2 GBR
1 USA
Answer 2:
grep can give a total count, but it doesn't count per pattern; for that you should use uniq -c:
$ uniq -c <(sort file)
1
1
3 <countryCode>CAN</countryCode>
2 <countryCode>GBR</countryCode>
1 <countryCode>USA</countryCode>
If you want to get rid of the empty lines and tags, add sed:
$ sed -e '/^[[:space:]]*$/d' -e 's/<.*>\([A-Z]*\)<.*>/\1/g' test | sort | uniq -c
3 CAN
2 GBR
1 USA
To delete lines that don't have a country code, add another command to sed:
$ sed -e '/countryCode/!d' -e '/^[[:space:]]*$/d' -e 's/<.*>\([A-Z]*\)<.*>/\1/g' test | sort | uniq -c
3 CAN
2 GBR
1 USA
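To make the difference between a total count and a per-value count concrete, here is a minimal sketch. The sample file is hypothetical and simply mirrors the data in the question; it assumes one element per line with no indentation.

```shell
# Hypothetical sample file mirroring the question's data.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<countryCode>GBR</countryCode>
<countryCode>USA</countryCode>
<countryCode>CAN</countryCode>
<countryCode>CAN</countryCode>
<someNode>USA</someNode>
<countryCode>CAN</countryCode>
<someNode>Otherone</someNode>
<countryCode>GBR</countryCode>
EOF

# grep -c counts matching lines: a single total for the whole file.
total=$(grep -c '<countryCode>' "$sample")

# sort | uniq -c groups identical lines: one count per distinct value.
per_code=$(grep '<countryCode>' "$sample" | sort | uniq -c)

echo "total: $total"
echo "$per_code"
rm -f "$sample"
```

Note that grep -c counts lines rather than matches, so the total is only right when each element sits on its own line.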
Answer 3:
Quick and dirty (based only on your example text):
awk -F'>|<' '{a[$3]++;}END{for(x in a)print x,a[x]}' file
test:
kent$ cat t.txt
<countryCode>GBR</countryCode>
<countryCode>USA</countryCode>
<countryCode>CAN</countryCode>
<countryCode>CAN</countryCode>
<countryCode>CAN</countryCode>
<countryCode>GBR</countryCode>
kent$ awk -F'>|<' '{a[$3]++;}END{for(x in a)print x,a[x]}' t.txt
USA 1
GBR 2
CAN 3
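Since the question's update mentions other nodes besides countryCode, the one-liner above would count their values too. A possible variant that only counts lines whose first tag matches a given name is sketched below. The function name count_tag_values and the sample file are made up for illustration, and it still assumes one element per line.

```shell
# Hypothetical helper: count the values of one tag, ignoring other nodes.
# $1: tag name, $2: input file
count_tag_values() {
  awk -F'[<>]' -v tag="$1" '
    $2 == tag { count[$3]++ }                      # field 2 is the tag name, field 3 its text
    END { for (v in count) print v, count[v] }     # iteration order is unspecified in awk
  ' "$2" | sort                                    # so sort for a stable listing
}

sample=$(mktemp)
cat > "$sample" <<'EOF'
<countryCode>GBR</countryCode>
<countryCode>USA</countryCode>
<countryCode>CAN</countryCode>
<countryCode>CAN</countryCode>
<someNode>USA</someNode>
<countryCode>CAN</countryCode>
<someNode>Otherone</someNode>
<countryCode>GBR</countryCode>
EOF

result=$(count_tag_values countryCode "$sample")
echo "$result"
rm -f "$sample"
```

The someNode line containing USA is skipped, so USA is counted once.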
Answer 4:
sed -n "s/<countryCode>\(.*\)<\/countryCode>/\1/p"|sort|uniq -c
Answer 5:
cat dummy | sort |cut -c14-16 | sort |tail -6 |awk '{col[$1]++} END {for (i in col) print i, col[i]}'
Here dummy is your file name; replace the 6 in -6 with n-2, where n is the number of lines in your data file.
Answer 6:
Something like this maybe:
grep -e 'regex' file.xml | sort | uniq -c
Of course, you need to provide a regex that matches your needs.
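For the data in the question, one regex that might fit is '<countryCode>[^<]*</countryCode>'. The sketch below uses it with grep -o, which is available in GNU and BSD grep but is not required by POSIX, and then reshapes the output into the "CAN = 3" form the question asks for. The function name count_country_codes and the sample file are hypothetical.

```shell
# Hypothetical helper: print "CODE = N" for every distinct country code.
# $1: XML file
count_country_codes() {
  grep -o -e '<countryCode>[^<]*</countryCode>' "$1" |   # keep only the matching elements
    sed -e 's/<[^>]*>//g' |                              # strip the tags, leaving the code
    sort | uniq -c |                                     # count each distinct code
    awk '{ print $2 " = " $1 }'                          # "3 CAN" becomes "CAN = 3"
}

sample=$(mktemp)
cat > "$sample" <<'EOF'
<countries>
  <countryCode>GBR</countryCode>
  <countryCode>USA</countryCode>
  <countryCode>CAN</countryCode>
  <countryCode>CAN</countryCode>
  <someNode>USA</someNode>
  <countryCode>CAN</countryCode>
  <someNode>Otherone</someNode>
  <countryCode>GBR</countryCode>
</countries>
EOF

formatted=$(count_country_codes "$sample")
echo "$formatted"
rm -f "$sample"
```

Because grep -o prints only the matched text, indentation and other nodes on the same line do not affect the counts. The awk step assumes the codes contain no spaces.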
Answer 7:
If your file is set up as you have shown us, awk can do it like this:
awk -F '<\/?countryCode>' '{ a[$2]++ } END { for (e in a) { printf("%s\t%i\n", e, a[e]) } }' INPUTFILE
If there is more than one <countryCode> tag on a line, you can still set up a pipe that puts each one on its own line, e.g.:
sed 's/<countryCode>/\n<countryCode>/g' INPUTFILE | awk ...
Note that if a <countryCode> element spans multiple lines, this does not work as expected.
Anyway, I'd recommend using XPath for this kind of task (Perl's XML::XPath module has a CLI utility for this).
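As a rough illustration of the multiple-tags-per-line case, the sketch below splits the tags onto separate lines and then counts them. It writes the newline in the sed replacement as a backslash followed by a literal line break, since \n in the replacement is a GNU sed extension. The NF >= 3 guard, which skips lines that do not contain a complete element, is an addition that is not in the command above, and the sample file is hypothetical.

```shell
sample=$(mktemp)
cat > "$sample" <<'EOF'
<countries>
  <countryCode>CAN</countryCode><countryCode>USA</countryCode>
  <countryCode>CAN</countryCode>
  <someNode>USA</someNode>
</countries>
EOF

# Step 1: start a new line before every opening tag (portable sed newline).
# Step 2: split on the opening/closing tag; a complete element yields 3 fields.
split_counts=$(
  sed 's/<countryCode>/\
<countryCode>/g' "$sample" |
    awk -F'</?countryCode>' '
      NF >= 3 { a[$2]++ }
      END { for (e in a) printf("%s\t%d\n", e, a[e]) }
    ' |
    sort
)

echo "$split_counts"
rm -f "$sample"
```

This still does not handle an element that spans several lines.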
Answer 8:
Quick and simple:
grep countryCode ./file.xml | sort | uniq -c
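One caveat with this approach is that uniq -c compares whole lines, so the same code at different indentation levels ends up in separate groups. A small sketch of the effect, and of trimming whitespace with sed before sorting, follows. The sample file is made up for illustration.

```shell
sample=$(mktemp)
printf '%s\n' \
  '<countryCode>CAN</countryCode>' \
  '  <countryCode>CAN</countryCode>' \
  '    <countryCode>GBR</countryCode>' > "$sample"

# Without trimming, the two CAN lines differ by indentation and are counted apart.
raw_groups=$(grep countryCode "$sample" | sort | uniq -c | wc -l | tr -d ' ')

# Trimming leading and trailing whitespace first makes identical elements compare equal.
trimmed=$(grep countryCode "$sample" |
  sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' |
  sort | uniq -c)
trimmed_groups=$(printf '%s\n' "$trimmed" | wc -l | tr -d ' ')

echo "groups without trimming: $raw_groups"
echo "groups with trimming:    $trimmed_groups"
echo "$trimmed"
rm -f "$sample"
```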