Command line to sum frequency in concatenated file

Posted 2019-08-19 02:24

I need to sum the frequencies in one column across several large tab-separated files. An example of the content of a file is:

Blue    table   3
Blue    chair   2
Big     cat     1
Small   cat     2

After concatenating the files, the trouble is the following:

Column 2 is essentially a count of how many times the combination of Column 0 and Column 1 was seen together.

I need to sum the Column 2 frequencies for every identical combination in the concatenated file.

For instance, if the contents of File A are as follows:

Blue    table   3
Blue    chair   2
Big     cat     1
Small   cat     2

and in File B the contents are as follows:

Blue    table   3
Blue    chair   2
Big     cat     1
Small   cat     2

then the contents of the concatenated File C are as follows:

Blue    table   3
Blue    chair   2
Big     cat     1
Small   cat     2
Blue    table   3
Blue    chair   2
Big     cat     1
Small   cat     2
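To try the commands discussed here on the example data, the two files and their concatenation can be recreated with a short script. This is a minimal sketch that assumes the columns are separated by single tab characters; the file names file_a.txt, file_b.txt and file_c.txt are hypothetical.

```shell
#!/bin/sh
# Recreate the example as tab-separated files.
# The names file_a.txt, file_b.txt and file_c.txt are hypothetical.
printf 'Blue\ttable\t3\nBlue\tchair\t2\nBig\tcat\t1\nSmall\tcat\t2\n' > file_a.txt
cp file_a.txt file_b.txt                  # File B has the same rows as File A
cat file_a.txt file_b.txt > file_c.txt    # File C is File A followed by File B
wc -l < file_c.txt                        # should report 8 lines
```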

I want to sum the frequencies of all identical Column 0/Column 1 combinations into a File D, to get the following result:

Blue    table   6
Blue    chair   4
Big     cat     2
Small   cat     4

I tried to sort and count the info with the following command:

sort input_file | uniq -c > output_file

but the result is the following:

  2 Big     cat     1
  2 Blue    chair   2
  2 Blue    table   3
  2 Small   cat     2

Does anyone have a suggestion for a terminal command that can produce my desired results?

Thank you in advance for any help.

Tags: sorting count command-line-arguments

1 answer

混吃等死

Post #2 · 2019-08-19 02:43

You're close; you already have all the numbers you need. The total for each row is the repeat count that uniq -c prepends (awk field $1) times the frequency count (awk field $4). You can calculate that with awk:

sort input.txt | uniq -c | awk '{ print $2 "\t" $3 "\t" $1 * $4 }'
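Multiplying by the uniq count only works when duplicated rows are identical in every column, including the frequency. If File A had "Blue table 3" and File B had "Blue table 5", uniq -c would keep them as two separate lines. A more general alternative, sketched below, sums column 3 per (column 1, column 2) key with an awk associative array. It assumes tab-separated input as the question describes; the file names part1.txt, part2.txt and totals.txt and their sample counts are hypothetical. Because awk can read several files in one run, the concatenation step is not needed.

```shell
#!/bin/sh
# Hypothetical sample input: the same pair carries different counts
# in different files, which sort | uniq -c would not merge.
printf 'Blue\ttable\t3\nBig\tcat\t1\n' > part1.txt
printf 'Blue\ttable\t5\nSmall\tcat\t2\n' > part2.txt

# Sum column 3 for every distinct (column 1, column 2) pair.
# -F '\t' assumes the files are tab-separated.
awk -F '\t' '
    { sum[$1 FS $2] += $3 }                  # accumulate the count for this key
    END { for (k in sum) print k FS sum[k] } # print key, tab, total
' part1.txt part2.txt | LC_ALL=C sort > totals.txt

cat totals.txt
```

The for (k in sum) loop visits keys in an unspecified order, which is why the output is piped through sort; LC_ALL=C makes that ordering byte-wise rather than locale-dependent.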
