I have the following script that parses some pipe-delimited field/value pairs. Sample data looks like this: |Apple=32.23|Banana =1232.12|Grape=12312|Pear=231|Grape=1231|
I am just looking to count how many times field names such as A, B, or C appear in the log file. The field list needs to be dynamic. The log files are 'big', about 500 MB each, so it takes a while to sort each one. Is there a faster way to do the count once I run the cut and get a file with one field name per line?
cat /bb/logs/$dir/$file.txt | tr -s "|" "\n" | cut -d "=" -f 1 | sort | uniq -c > /data/logs/$dir/$file.txt.count
I know for a fact that this part runs fast; it is the sort that bogs things down.
cat /bb/logs/$dir/$file.txt | tr -s "|" "\n" | cut -d "=" -f 1
After running the cut, a sample of the output is below; of course, the actual file is much longer:
Apple
Banana
Grape
Pear
Grape
After the sort and count I get:
1 Apple
1 Banana
2 Grape
1 Pear
The problem is that the sort on my actual data takes way too long. I think it would be faster to redirect (>) the output of the cut to a file, but I am not sure of the fastest way to count unique entries in a 'large' text file.
AWK can do it pretty well without sorting. Try something like this; it should perform better:
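A minimal sketch of that idea, keeping the same tr/cut front end from the question but replacing sort | uniq -c with a single awk pass that tallies field names in an associative array (paths are the same placeholders used above):

tr -s '|' '\n' < /bb/logs/$dir/$file.txt | cut -d '=' -f 1 | awk '{ count[$0]++ } END { for (name in count) print count[name], name }' > /data/logs/$dir/$file.txt.count

This streams through the file once and never sorts, so memory use grows with the number of distinct field names rather than the file size. The counts come out in arbitrary order; if you want them ordered, pipe the result through sort afterwards, which is cheap because that output is tiny.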