Selecting columns using specific patterns then fin

2019-08-06 13:06发布

问题:

I want to calculate the sum and ratio values from data below. (The actual data contains more than 200,000 columns and 45000 rows (lines)).

For clarity purpose I have given only simple data format.

#Frame  BMR_42@O22  BMR_49@O13  BMR_59@O13  BMR_23@O26  BMR_10@O13  BMR_61@O26  BMR_23@O25 
 1      1           1           0           1           1           1           1
 2      0           1           0           0           1           1           0
 3      1           1           1           0           0           1           1
 4      1           1           0           0           1           0           1
 5      0           0           0           0           0           0           0
 6      1           0           1           1           0           1           0
 7      1           1           1           1           0           0           0
 8      1           1           1           0           0           0           0
 9      1           1           1           1           1           1           1
10      0           0           0           0           0           0           0

The columns need to be selected with certain criteria.

The column data which I consider is columns with "@O13" only. Below I have given the selected columns from above example.

BMR_49@O13  BMR_59@O13  BMR_10@O13  
1           0           1       
1           0           1       
1           1           0       
1           0           1       
0           0           0       
0           1           0       
1           1           0       
1           1           0       
1           1           1       
0           0           0   

From the selected column, I want to calculate:

1) the sum of all the "1"s. In this example we get value 16.

2) the number of total rows containing occurrence of "1" (at least once). From above example there are 8 rows which contain at least one occurrence of "1".

lastly,

3) the ratio of total of all "1"s with total lines with occurrence of "1"s.

That is :: (total of all "1"s)/(total rows with the occurance of "1"). Example 16/8

As a start, I tried with this command to select only the columns with "@O13"

awk '{for (i=1;i<=NF;i++) if (i~/@O13/); print ""}' $file2

Although this run but doesn't show up the values.

回答1:

This should do:

awk 'NR==1{for (i=1;i<=NF;i++) if ($i~/@O13/) a[i];next} {f=0;for (i in a) if ($i) {s++;f++};if (f) r++} END {print "number of 1="s"\nrows with 1="r"\nratio="s/r}' file
number of 1=16
rows with 1=8
ratio=2

Some more readable:

awk '
NR==1{
    for (i=1;i<=NF;i++) 
        if ($i~/@O13/)
            a[i]
        next
    }
    {
    f=0
    for (i in a)
        if ($i=="1") {
            s++
            f++
        }
    if (f) r++
    } 
END {
    print   "number of 1="s \
            "\nrows with 1="r \
            "\nratio="s/r
    }
' file


标签: awk gawk