I need to sum the frequency column across several large tab-separated files. An example of one file's contents:
Blue table 3
Blue chair 2
Big cat 1
Small cat 2
After concatenating the files, I run into the following problem:
Column 2 is a frequency count: the number of times the combination of Column 0 and Column 1 was seen together.
I need to add up the Column 2 frequencies for all identical Column 0 and Column 1 combinations in the concatenated file.
For instance, if the contents of File A are as follows:
Blue table 3
Blue chair 2
Big cat 1
Small cat 2
and in File B the contents are as follows:
Blue table 3
Blue chair 2
Big cat 1
Small cat 2
the contents in the concatenated File C are as follows:
Blue table 3
Blue chair 2
Big cat 1
Small cat 2
Blue table 3
Blue chair 2
Big cat 1
Small cat 2
I want to sum the frequencies of all identical Column 0 and Column 1 combinations and write them to a File D, giving the following result:
Blue table 6
Blue chair 4
Big cat 2
Small cat 4
I tried to sort and count the info with the following command:
sort <input_file> | uniq -c > <output_file>
but the result is the following:
2 Big cat 1
2 Blue chair 2
2 Blue table 3
2 Small cat 2
Can anyone suggest a terminal command that produces my desired results?
Thank you in advance for any help.
You're close; you have all the numbers you need. The total for each row is the number of identical rows that uniq -c counted (field 1 of its output) times the frequency count (field 4 of its output). You can calculate that with awk:
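A minimal sketch of that calculation, assuming no field contains embedded spaces (awk's default splitting treats spaces and tabs alike). The function names sum_by_uniq and sum_by_key are hypothetical, chosen only for illustration. Note that the uniq approach merges only rows whose counts are also identical. If the same pair can appear with different counts in different files, the second function covers that case with a different technique: summing per key in an awk array.

```shell
# Hypothetical helper: reads the concatenated rows on stdin.
# uniq -c prepends the number of identical rows, so its output has four
# whitespace-separated fields: rows, col0, col1, frequency.
sum_by_uniq() {
  sort | uniq -c | awk '{ print $2 "\t" $3 "\t" ($1 * $4) }'
}

# Hypothetical alternative: sums column 2 per (col0, col1) key directly,
# so it also works when the same pair appears with different counts.
# The order of "for (k in total)" is unspecified, hence the final sort.
sum_by_key() {
  awk -F'\t' '{ total[$1 FS $2] += $3 }
              END { for (k in total) print k FS total[k] }' | sort
}

# Demo with the question's File A and File B concatenated.
rows='Blue\ttable\t3\nBlue\tchair\t2\nBig\tcat\t1\nSmall\tcat\t2\n'
printf "$rows$rows" | sum_by_uniq
# Prints (tab-separated):
# Big    cat    2
# Blue   chair  4
# Blue   table  6
# Small  cat    4
```

With real files, something like cat fileA fileB | sum_by_key > fileD would write the summed table, where fileA, fileB and fileD stand in for your own file names.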