I have a file data.txt with a matrix structure (4 x 9):
101000110
000000010
001010010
100101101
I want to count the frequencies of the unique columns; the expected result is:
1001 2
0000 1
1010 1
0001 3
0010 1
1110 1
I could only find "unique lines according to specific columns" solutions using awk on the Internet. Do I need to transpose my data first to solve this problem, or is there a more direct way? Thank you.
This awk will help:
awk '{for (i=1;i<=NF;i++){
          a[i]=a[i]""$i            # append each character to the string for its column
      }
}
END{
    for (i=1;i<=9;i++) {           # 9 = number of columns in the matrix
        res[a[i]]++                # count the occurrences of each column string
    }
    for (r in res){
        print r, res[r]
    }
}' FS= yourfile
Result
1110 1
0000 1
0010 1
0001 3
1010 1
1001 2
Explanation
for (i=1;i<=NF;i++){
    a[i]=a[i]""$i
}
This stores the matrix in a nine-element array keyed by column number; since we know it is a regular matrix, we append each character to the string for its column.
for (i=1;i<=9;i++) {
    res[a[i]]++
}
This uses each assembled column string as a key into an associative array res and counts its occurrences.
for (r in res){
print r, res[r]
}
This just prints the final result.
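As an aside, the 9 in the END loop hardcodes the number of columns. If you'd rather not hardcode it, here is a minimal sketch of the same idea (still relying on GNU awk's empty-FS behavior) that remembers the width of the widest row:
awk '{for (i=1;i<=NF;i++) a[i]=a[i] $i     # append each character to its column
      if (NF>n) n=NF                       # remember the widest row
     }
     END{for (i=1;i<=n;i++) res[a[i]]++
         for (r in res) print r, res[r]
     }' FS= yourfile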
You don't need to transpose it. Use awk with an empty field separator to split each line into characters, and append each value to an array indexed by column number. In the END block, count the frequencies and print them:
awk 'BEGIN{FS=""} {
for (i=1; i<=NF; i++)
a[i] = a[i] $i
}
END {
for (i=1; i<=length(a); i++)
freq[a[i]]++
for(i in freq)
print i, freq[i]
}' file
0000 1
0010 1
0001 3
1001 2
1010 1
1110 1
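One caveat: for (i in freq) visits keys in an unspecified order, which is why the output above is not in the question's order. If you need deterministic output, the same program can simply be piped through sort:
awk 'BEGIN{FS=""}
     {for (i=1; i<=NF; i++) a[i] = a[i] $i}
     END {for (i=1; i<=length(a); i++) freq[a[i]]++
          for (c in freq) print c, freq[c]}' file | sort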
Perl to the rescue:
perl -aF// -lne '$s[$_] .= $F[$_] for 0 .. $#F;
}{
$c{$_}++ for @s;
print "$_\t$c{$_}" for keys %c' < data.txt
- -n reads the input line by line
- -l handles the newlines
- -aF// splits each line into individual characters in the @F array
- }{ ends the implicit -n loop, so the code after it runs once after all input has been read (like awk's END block)
- @s accumulates the characters from particular columns
- At the end, the hash table %c is used to count the frequencies.
Although not needed, here is a transpose-and-count solution using the Unix toolset.
$ sed 's/./&\n/g' file |   # put each character on its own line
  sed '/^$/d' |            # remove the empty line left after each row
  pr -4ts' ' |             # rebuild as 4 space-separated columns (4 = number of rows), i.e. transpose
  tr -d ' ' |              # strip the separators
  sort |                   # group identical columns together
  uniq -c |                # count each unique column
  awk '{print $2,$1}'      # swap to "column count" order
0000 1
0001 3
0010 1
1001 2
1010 1
1110 1
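The 4 in pr -4ts' ' hardcodes the number of rows in the matrix. As a small tweak, you can compute it with wc -l first (assuming each matrix row is one line of file):
$ rows=$(wc -l < file)
$ sed 's/./&\n/g' file | sed '/^$/d' | pr -"$rows"ts' ' | tr -d ' ' | sort | uniq -c | awk '{print $2,$1}'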