Find the unique columns and their frequencies in Linux

I have a file data.txt containing a matrix (4 rows × 9 columns):

I want to count how often each unique column occurs. The expected result is:

On the Internet I could only find awk solutions for "unique lines according to specific columns". Do I need to transpose my data first to solve this problem, or is there a more direct way? Thank you.

Tags: linux bash unique

4 answers

孤傲高冷的网名

Post #2 · 2019-09-16 20:29

Although it's not needed, here is a transpose-and-count solution using the standard Unix tools.

$ sed 's/./&\n/g' file | 
  sed '/^$/d'          | 
  pr -4ts' '           | 
  tr -d ' '            | 
  sort                 | 
  uniq -c              | 
  awk '{print $2,$1}'

0000 1
0001 3
0010 1
1001 2
1010 1
1110 1
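To see why pr performs the transposition, here is a minimal sketch that stops the pipeline before sort. It assumes GNU sed (for \n in the replacement) and a file whose rows all have the same length; the function name transpose_chars and the 3 × 6 sample matrix are hypothetical, and the row count is computed with wc -l instead of being hard-coded as -4.

```shell
#!/usr/bin/env bash
# Sketch: the transpose stage of the sed | pr pipeline, with the row count
# computed instead of hard-coded. Assumes GNU sed ("\n" in the replacement)
# and equal-length rows. transpose_chars and the sample rows are hypothetical.

transpose_chars() {
    local file=$1 rows
    rows=$(( $(wc -l < "$file") ))   # number of input rows = number of pr columns
    sed 's/./&\n/g' "$file" |        # one character per line, plus a blank line per row
        sed '/^$/d' |                # drop those blank lines
        pr -"$rows"ts' ' |           # fill $rows columns top to bottom, i.e. transpose
        tr -d ' '                    # join each output row into one column string
}

sample=$(mktemp)
printf '%s\n' 100101 011011 100101 > "$sample"   # hypothetical 3 x 6 matrix
transpose_chars "$sample"                          # prints 101 010 010 101 010 111, one per line
rm -f "$sample"
```

With 18 one-character lines and three columns, pr puts six lines in each column, so output row i collects character i of every input row; sort | uniq -c then counts those strings exactly as in the answer.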


欢心

Post #3 · 2019-09-16 20:32

This awk will help:

awk '{for (i=1;i<=NF;i++){
         a[i]=a[i]""$i
       }
     }
     END{
     for (i=1;i<=9;i++) {
       res[a[i]]++
       }
     for (r in res){
         print r, res[r] 
       }
     }' FS= yourfile

Result

Explanation

 for (i=1;i<=NF;i++){
   a[i]=a[i]""$i
   }

Builds an array indexed by column number (nine entries here). Since we know it's a regular matrix, each character is appended to the string for its column, so a[i] ends up holding column i read from top to bottom.
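A small sketch of just this first step, printing each accumulated column string next to its index. It assumes an awk that splits on an empty FS (GNU awk and mawk do); the function name column_strings and the sample rows are hypothetical.

```shell
# Sketch of step one only: print the string built for each column.
# Assumes an awk that splits on an empty FS (GNU awk, mawk); names are hypothetical.
column_strings() {
    awk 'BEGIN { FS = "" }
         {
             for (i = 1; i <= NF; i++) a[i] = a[i] $i   # append character i of this line to column i
             if (NF > n) n = NF                         # widest line seen so far
         }
         END { for (i = 1; i <= n; i++) print i, a[i] }' "$@"
}

printf '%s\n' 100101 011011 100101 | column_strings   # hypothetical sample rows
```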

 for (i=1;i<=9;i++) {
   res[a[i]]++
   }

Uses each column string as a key of the associative array res and counts how often it occurs.

 for (r in res){
     print r, res[r] 
   }

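Prints each unique column with its count. Note that the iteration order of for (r in res) is unspecified, so the lines may come out in any order.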
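A variant sketch that avoids the hard-coded 9 by remembering the widest line, and prints the columns in order of first appearance instead of relying on the for-in order. The function name count_in_first_seen_order is hypothetical, and an empty FS is again assumed to split each line into characters.

```shell
# Variant of the answer above: column count taken from the data, output in
# first-seen order. Assumes empty-FS splitting; the function name is hypothetical.
count_in_first_seen_order() {
    awk 'BEGIN { FS = "" }
         {
             for (i = 1; i <= NF; i++) a[i] = a[i] $i
             if (NF > n) n = NF                           # replaces the hard-coded 9
         }
         END {
             for (i = 1; i <= n; i++)
                 if (res[a[i]]++ == 0) order[++k] = a[i]  # remember each new column string
             for (j = 1; j <= k; j++) print order[j], res[order[j]]
         }' "$@"
}

printf '%s\n' 100101 011011 100101 | count_in_first_seen_order   # hypothetical sample rows
```

Tracking first appearance keeps the output deterministic without an extra sort, which is handy when the column order itself carries meaning.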


SAY GOODBYE

Post #4 · 2019-09-16 20:49

You don't need to transpose it. Use awk with an empty field separator, so that each character becomes its own field, and append each value to an array indexed by column number. In the END block, count the frequencies and print them:

awk 'BEGIN{FS=""} {
   for (i=1; i<=NF; i++)
      a[i] = a[i] $i
}
END {
   for (i=1; i<=length(a); i++)
      freq[a[i]]++

   for(i in freq)
      print i, freq[i]
}' file

0000 1
0010 1
0001 3
1001 2
1010 1
1110 1
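Splitting on an empty FS is left unspecified by POSIX (GNU awk and mawk support it), and calling length() on an array is also an extension. A minimal portable sketch under that assumption uses substr() on $0 instead and pipes through sort for a stable order; the function name count_columns_posix and the sample rows are hypothetical.

```shell
# Portable sketch: substr() instead of an empty FS, a counted column total instead
# of length(array), and sort for a deterministic order. Name is hypothetical.
count_columns_posix() {
    awk '{
             len = length($0)
             for (i = 1; i <= len; i++) a[i] = a[i] substr($0, i, 1)   # character i joins column i
             if (len > n) n = len
         }
         END {
             for (i = 1; i <= n; i++) freq[a[i]]++
             for (k in freq) print k, freq[k]
         }' "$@" | LC_ALL=C sort
}

printf '%s\n' 100101 011011 100101 | count_columns_posix   # hypothetical sample rows
```

On the real input this would be called as count_columns_posix data.txt.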


Bombasti

Post #5 · 2019-09-16 20:50

Perl to the rescue:

perl -aF// -lne '$s[$_] .= $F[$_] for 0 .. $#F;
                 }{
                 $c{$_}++ for @s;
                 print "$_\t$c{$_}" for keys %c' < data.txt

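-n reads the input line by line
-l chomps the newline from each input line and adds one to each print
-aF// splits each line into single characters in the @F array
@s accumulates the characters of each column
}{ closes the loop that -n wraps around the code, so what follows runs once after all input has been read
At the end, the hash %c counts the frequency of each column string.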
