awk - how can two scripts interact with each other?

Posted 2019-08-04 16:23

I can't find a clear tutorial on this topic. Say I have an input file like this:

1 abc
1 def
1 ghi
1 lalala
1 heyhey
2 ahb
2 bbh
3 chch
3 chchch
3 oiohho
3 nonon
3 halal
3 whatever

Say I would like to find the largest number of times any value appears in column one; here that is 6, for the value "3". Then I need to feed this number (i.e. 6) to another script that goes through the file and does some computations. What are the ways to do this?

Basically, I wonder whether it's possible to write a helper function that goes through the file and finds "max", and then call that helper from the main function. I also wonder whether it's possible to use $(...) within the helper function to call 'awk' or other system commands.

2 answers
乱世女痞
#2 · 2019-08-04 16:47

We use a pipe for this. It takes the stdout of the first process and connects it to the stdin of the second.

awk ... | awk ...
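
As a minimal sketch of the pipe approach (the file name input.txt and the variable names are assumptions for illustration, not from the question): the first awk counts how often each column-one value occurs, and the second reads those counts from stdin and keeps the largest. Since the later pass over the file needs that number, the sketch captures it with $(...) and hands it over with -v, which is one common way to do it rather than the only one.

```shell
# Work in a scratch directory; input.txt is a hypothetical file name.
cd "$(mktemp -d)" || exit 1
printf '%s\n' '1 abc' '1 def' '1 ghi' '1 lalala' '1 heyhey' \
    '2 ahb' '2 bbh' \
    '3 chch' '3 chchch' '3 oiohho' '3 nonon' '3 halal' '3 whatever' > input.txt

# First awk: count each column-one value and print "value count" pairs.
# Second awk: read those pairs from the pipe and keep the largest count.
max=$(awk '{ count[$1]++ } END { for (v in count) print v, count[v] }' input.txt |
      awk '$2 > max { max = $2 } END { print max }')

# Pass the number to another awk run over the same file with -v.
result=$(awk -v max="$max" '{ print max, $2 }' input.txt)
printf '%s\n' "$result"
```

With the sample data, max ends up as 6 and every output line starts with that number.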
SAY GOODBYE
#3 · 2019-08-04 16:58
awk 'NR == FNR {nums[$1]++; next} ! flag {flag = 1; for (num in nums) {if (nums[num] > max) {max = nums[num]}}} {print max * $3}' filetomax filetoprocess

Here it is broken out on multiple lines:

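awk '
    # First file: count how often each column-one value occurs.
    NR == FNR {
        nums[$1]++
        next
    }
    # Runs once, on the first line of the second file: find the largest count.
    !flag {
        flag = 1
        for (num in nums) {
            if (nums[num] > max) {
                max = nums[num]
            }
        }
    }
    # Second file: max is now available for any processing.
    {
        print max * $3
    }
' filetomax filetoprocess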

Here, we're doing the same operation as before to find the maximum of the counts. Instead of using a main block and an END block, we're using a technique that's often used to process one file and then another. The NR == FNR condition is only true while the first file is being read: the record number (NR), which is incremented for each line across all the files collectively, equals the file record number (FNR), which is reset for each new file, only during the first file. The block associated with this condition counts the times each number appears. The next statement skips the remaining blocks and moves on to the next line of input. When the second file is reached, the condition is no longer true and this block is skipped.
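
To see the two counters side by side, here is a small sketch (first.txt and second.txt are hypothetical file names) that prints NR and FNR for every line and marks whether NR == FNR holds:

```shell
# Work in a scratch directory; both file names are made up for this demo.
cd "$(mktemp -d)" || exit 1
printf '%s\n' 'a' 'b' > first.txt
printf '%s\n' 'c' 'd' 'e' > second.txt

# Print the file name, NR, FNR, and whether the two counters agree.
out=$(awk '{ print FILENAME, NR, FNR, (NR == FNR ? "first-file" : "later-file") }' first.txt second.txt)
printf '%s\n' "$out"
```

NR keeps climbing (1 to 5) while FNR restarts at 1 for second.txt, so the lines from second.txt are tagged "later-file". One caveat: if the first file is empty, NR == FNR is also true while the second file is read, so the idiom assumes a non-empty first file.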

The next conditional (! flag) tests the variable flag. Since it hasn't been set, it's false. The exclamation point negates the condition, so at this point execution moves into this block. The block sets the flag, so the next time the condition is checked, this block is skipped. The for loop finds the highest count, that is, how many times the most frequent number appeared, as in my answer to your other question.
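
The run-once behavior of the flag can be checked in isolation. This is only a sketch with made-up input lines:

```shell
# Three input lines; the !flag block should fire only for the first one.
out=$(printf '%s\n' 'x' 'y' 'z' | awk '
    # flag starts out unset (false), so this block runs on line 1 and then never again.
    !flag { flag = 1; print "setup runs once, at line " NR }
    # This block has no condition, so it runs for every line.
    { print "processing " $0 }
')
printf '%s\n' "$out"
```

The "setup" line appears once, before "processing x", and the remaining lines go straight to the unconditional block.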

Now, the second file can be processed in any way you like and the variable max is available for use during this processing. I have simply used a print statement to illustrate that. You can still use block selector conditionals, including one or more END blocks as you normally would. I don't show a BEGIN block, but you could add one at the top of this script for any initialization you need. Note that the processing of the first file could have been done in the BEGIN block using getline. That's simply another technique for accomplishing the same thing.
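
For comparison, a rough sketch of the getline variant mentioned above. The variable name maxfile and the sample contents of both files are assumptions for illustration; only the file names come from this answer.

```shell
# Work in a scratch directory with small made-up sample files.
cd "$(mktemp -d)" || exit 1
printf '%s\n' '1 abc' '1 def' '2 ahb' '3 chch' '3 chchch' '3 oiohho' > filetomax
printf '%s\n' 'a b 2' 'c d 10' > filetoprocess

out=$(awk -v maxfile=filetomax '
    BEGIN {
        # Read the first file by hand; getline returns 1 per line, 0 at EOF, -1 on error.
        while ((getline line < maxfile) > 0) {
            split(line, f, " ")      # f[1] is column one
            nums[f[1]]++
        }
        close(maxfile)
        # Find the largest count before any main input is read.
        for (num in nums) if (nums[num] > max) max = nums[num]
    }
    # Only filetoprocess is listed as input, so no NR == FNR test is needed.
    { print max * $3 }
' filetoprocess)
printf '%s\n' "$out"
```

Here the value 3 appears three times in filetomax, so max is 3 and the two lines of filetoprocess print 6 and 30. The trade-off is that the first file has to be split by hand, since getline into a variable does not set $1, $2, and so on.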

The filenames are listed in the order they are to be processed. I've called the file in which to find the maximum count "filetomax" and the file on which to do the main processing "filetoprocess". To make both passes over the same file, as in your question, list that file twice.
