lookup field values in two fixed format file in un

2019-06-21 19:12发布

问题:

I have 2 fixed length files input#1 & input#2. I want to match the rows based on the value in position 37-50 in both files (pos 37-50 will have same value in both files).

If any matching record is found then cut the value against company code & Invoice number from input file #1 (position 99 until end of line).

The cut string (from Input #1) need to be appended at the end of the record/line.

Below is the code I tried (not working) and the input files & desired output. Please provide your advice.

Code:

awk '
NR==FNR && NF>1 {
    v=substr($0,37,14);
#print substr($0,37,14)
    next
}
NR==FNR && ( /Company Code/ OR /Invoice Number/ ) {
    sub(/Company Code/,"",$0);
    sub(/Invoice Number/,"",$0);
    a[v]=$0;
print $0
    next
}
(substr($0,37,14) in a) {
    print $0 a[substr($0,99)]
}' Input1.txt input2.txt input3.txt

End code

Input #1 beginning Start's with some white spaces

         612  1111111111201402120000       2     1  111  211 Due Date                             20140101                           
         612  1111111111201402120000       2     1  111  311 Company Code                         227                                
         612  1111111111201402120000       2     1  111  411 Item Code                            12                                 
         612  1111111111201402120000       2     1  111  511 Invoice Number                       2014010                            
         612  1111111111201402120000       2     2  111  611 Company Code                         214                                
         612  1111111111201402120000       2     2  111  711 Item Code                            20                                 
         612  1111111111201402120000       2     2  111  811 Invoice Number                       3014010                            
         612  1111111111201402120000       2     3  111  911 Due Date                             20140101                           
         612  1111111111201402120000       2     3  111  111 Invoice Number                       40140101                           
         612  1111111111201402120000       2     3  111  121 user code                            15563263636                        
         612  1111111111201402120000       2     3  111  131 Amount Due                           100000                             
         612  111111111120140212000078978982123444  111  141 Due Date                             20140101                             
         612  111111111120140212000078978982123444  111  151 Invoice Number                       50140101                             
         612  111111111120140212000078978982123444  111  161 Amount Due                          008000                             

Input #1 End

Input #2 beginning input 2

         510       77432201111010000       2     1        1ChK          100111000001    121000248           123456789            20111101.510.77432.20001C                         
         510       77432201111010000       2     1        2INv                                                                   20111101.510.77432.20001D                         
         510       77432201111010000       2     1        3INv                                                                   20111101.510.77432.20002D                         
         510       77432201111010000       2     1        4INv                                                                   20111101.510.77432.20003D                         
         510       77432201111010000       2     1        5INv                                                                   20111101.510.77432.20004D                         
         510       77432201111010000       2     2        1ChK          200111000002    121000248           123456789            20111101.510.77432.20002C                         
         510       77432201111010000       2     2        2INv                                                                   20111101.510.77432.20005D                         
         510       77432201111010000       2     2        3INv                                                                   20111101.510.77432.20006D                         
         510       77432201111010000       2     2        4INv                                                                   20111101.510.77432.20007D                         
         510       77432201111010000       2     2        5INv                                                                   20111101.510.77432.20008D                         
         510       77432201111010000       2     3        1ChK          300111000003    121000248           123456789            20111101.510.77432.20003C                         
         510       77432201111010000       2     3        2INv                                                                   20111101.510.77432.20009D                         
         510       77432201111010000       2     3        3INv                                                                   20111101.510.77432.20010D                         
         510       77432201111010000       2     3        4INv                                                                   20111101.510.77432.20011D                         
         510       77432201111010000       2     6        1ChK          600111000006    121000248           123456789            20111101.510.77432.20006C                         
         510       77432201111010000       2     6        2INv                                                                   20111101.510.77432.20021D                         
         510       77432201111010000       2     6        3INv                                                                   20111101.510.77432.20022D                         
         510       77432201111010000       2     6        4INv                                                                   20111101.510.77432.20023D                         
         510       77432201111010000       2     6        5INv                                                                   20111101.510.77432.20024D                         

Input #2 end

Desired outout Desired output

         510       77432201111010000       2     1        1ChK          100111000001    121000248           123456789            20111101.510.77432.20001C   2272014010 (company & Inv # from input 1)                     
         510       77432201111010000       2     1        2INv                                                                   20111101.510.77432.20001D   2272014010                                            
         510       77432201111010000       2     1        3INv                                                                   20111101.510.77432.20002D   2272014010                                            
         510       77432201111010000       2     1        4INv                                                                   20111101.510.77432.20003D   (company & Inv # from input 1)                      
         510       77432201111010000       2     1        5INv                                                                   20111101.510.77432.20004D   (company & Inv # from input 1)                      
         510       77432201111010000       2     2        1ChK          200111000002    121000248           123456789            20111101.510.77432.20002C   (company & Inv # from input 1)                      
         510       77432201111010000       2     2        2INv                                                                   20111101.510.77432.20005D   (company & Inv # from input 1)                      
         510       77432201111010000       2     2        3INv                                                                   20111101.510.77432.20006D   (company & Inv # from input 1)                      
         510       77432201111010000       2     2        4INv                                                                   20111101.510.77432.20007D   (company & Inv # from input 1)                      
         510       77432201111010000       2     2        5INv                                                                   20111101.510.77432.20008D   (company & Inv # from input 1)                      
         510       77432201111010000       2     3        1ChK          300111000003    121000248           123456789            20111101.510.77432.20003C   (company & Inv # from input 1)                      
         510       77432201111010000       2     6        1ChK          600111000006    121000248           123456789            20111101.510.77432.20006C   <there is no matching record in input 1, this will be blank>                      
         510       77432201111010000       2     6        2INv                                                                   20111101.510.77432.20021D   <there is no matching record in input 1, this will be blank>                      
         510       77432201111010000       2     6        3INv                                                                   20111101.510.77432.20022D   <there is no matching record in input 1, this will be blank>                      
         510       77432201111010000       2     6        4INv                                                                   20111101.510.77432.20023D   <there is no matching record in input 1, this will be blank>                      
         510       77432201111010000       2     6        5INv                                                                   20111101.510.77432.20024D   <there is no matching record in input 1, this will be blank>                      

回答1:

Try something like this (Untested):

awk '
NR==FNR && /Company Code/ {
    cc[$3,$4] = $NF;
    next;
}
NR==FNR && /Invoice Number/ {
    inv[$3,$4] = $NF;
    next;
}
NR==FNR {next}
{print $0 FS cc[$3,$4] inv[$3,$4]}' input1 input2


回答2:

There are several issues with your awk code.

Let's go through them step-by-step:

  1. NR==FNR && NF>1 {...;next}NR==FNR && ... --> the next will prevent the second action from being performed for all but the first record.

  2. NR==FNR && ( /Company Code/ OR /Invoice Number/ ) { --> OR is not a valid awk statement, the logical OR is done using || (like you use && and not AND).

  3. print $0 a[substr($0,99)] --> a[substr($0,99)] takes everything from the 99th position of the record in your 2nd input file to look up in your array, but your key is from 37-50.

We can fix them the follwing way:

  1. Get rid of the next in the first action and limit the 3rd action to records from the 2nd input file.

  2. Substitute OR with ||.

  3. Use substr($0,37,14) as key to lookup in a and substr(...,99) the result.

This results in the following code (removing your diagnostic print commands and the unused 3rd input file):

awk '
NR==FNR && NF>1 {
    v=substr($0,37,14);
}
NR==FNR && ( /Company Code/ || /Invoice Number/ ) {
    sub(/Company Code/,"",$0);
    sub(/Invoice Number/,"",$0);
    a[v]=$0;
    next
}
NR!=FNR && (substr($0,37,14) in a) {
    print $0 substr(a[substr($0,37,14)],99)
}' input1.txt input2.txt

Since your input was off I could not reproduce your desired output but I hope you can figure it out from here on.

Also, I shortened your code to the following version doing what I think you want it to do starting from the input given:

awk '
{key=substr($0,37,14)}
NR==FNR{
  if(/Company Code/||/Invoice Number/)array[key]=substr($0,98)
  next
}
(key in array){print $0,array[key]}
' input1.txt input2.txt

If you need adjustments/explanations, feel free to comment.



标签: shell unix awk