I've been looking at a lot of posts and haven't quite found what I'm looking for. I'm not sure how to go about taking the following sample data:
host1 input nic1 ip1 ip2 PROT 30000 10
host1 input nic1 ip1 ip2 PROT 40000 10
host1 input nic1 ip1 ip2 PROT 50000 10
host1 input nic1 ip1 ip2 PROT 60000 10
host1 input nic1 ip3 ip2 PROT 10 30000
host1 input nic1 ip3 ip2 PROT 10 40000
host1 input nic1 ip3 ip2 PROT 10 50000
host1 input nic1 ip3 ip2 PROT 10 60000
host1 output nic1 ip2 ip1 PROT 10 30000
host1 output nic1 ip2 ip1 PROT 10 40000
host1 output nic1 ip2 ip1 PROT 10 50000
host1 output nic1 ip2 ip1 PROT 10 60000
host1 output nic1 ip2 ip3 PROT 30000 10
host1 output nic1 ip2 ip3 PROT 40000 10
host1 output nic1 ip2 ip3 PROT 50000 10
host1 output nic1 ip2 ip3 PROT 60000 10
host1 output loc ip2 ip2 PROT 10 30000
host1 output loc ip2 ip2 PROT 10 50000
And merge it into:
host1 input nic1 ip1 ip2 PROT 30000:60000 10
host1 input nic1 ip3 ip2 PROT 10 30000:60000
host1 output nic1 ip2 ip1 PROT 10 30000:60000
host1 output nic1 ip2 ip3 PROT 30000:60000 10
host1 output loc ip2 ip2 PROT 10 30000:50000
I have a large amount of data like this and need to make ranges for multiple fields of a given line, but I think that if somebody can show me how to do it for one field, as above, I should be able to figure out the rest. If not, I'll follow up :). Thanks in advance for any help.
Update
I have refactored the code in the answer below to make it more readable. The main body should read almost like English prose.
#!/usr/bin/awk -f

# main body
NR == 1 {
    copyRecordTo(veryold)
    copyRecordTo(old)       # a group may consist of a single line
    next
}

{
    if (inSameGroup()) {
        copyRecordTo(old)
    } else {
        makeRangeForField(NF - 1)
        makeRangeForField(NF)
        nicePrint()
        # the current line starts a new group
        copyRecordTo(veryold)
        copyRecordTo(old)
    }
}

END {
    if (NR > 0) {           # nothing to print for empty input
        makeRangeForField(NF - 1)
        makeRangeForField(NF)
        nicePrint()
    }
}

# functions (extra parameters after the gap are local variables)
function copyRecordTo(line,    i) {
    for (i = 1; i <= NF; ++i) line[i] = $i
}

function nicePrint(    i, fmt) {
    for (i = 1; i <= NF; ++i) {
        # one extra tab before the last field
        fmt = (i == NF - 1) ? "%s\t\t" : "%s\t"
        printf(fmt, old[i])
    }
    printf("\n")
}

function makeRangeForField(f) {
    if (old[f] != veryold[f])
        old[f] = veryold[f] ":" old[f]
}

function inSameGroup(    i) {
    # all fields except the last two must match the start of the group
    for (i = 1; i <= NF - 2; ++i)
        if ($i != veryold[i]) return 0
    return 1
}
Original answer
The following awk script generates almost what you are looking for.
Essentially the script does the following:
- stores in veryold the first line of each set of lines that differ only in the 7th and/or 8th field
- stores in old the last line read
- uses the "boolean" b to detect when the last line of a set has been passed
- when this happens, joins the last two fields of veryold with those of old, with a : in between if they differ, and prints old
- puts one extra tab \t between the last two fields to improve readability
Two more points:
- NR == 1 is a special case: nothing is printed, and the line is only stored as the start of the first set (veryold)
- after the last line is read, END handles the special case of the last set, whose final line is still stored in old
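The joining rule from the list above can be tried in isolation. This is a minimal sketch and not part of the script in this answer: the helper name join_range and the shell wrapper around awk are assumptions made only for illustration.

```shell
# join_range FIRST LAST: hypothetical helper, not part of the answer's script.
# Prints FIRST:LAST when the two values differ, and the single value otherwise.
join_range() {
    awk -v first="$1" -v last="$2" 'BEGIN {
        # same rule the script applies to the last two fields of veryold and old
        r = (first != last) ? first ":" last : first
        print r
    }'
}

join_range 30000 60000
join_range 10 10
```

The first call prints 30000:60000 and the second prints 10, which is how the 7th and 8th fields of each merged line are built.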
#!/usr/bin/awk -f

NR == 1 {
    # the first line starts the first set; nothing to print yet
    for (i = 1; i <= NF; ++i) {
        veryold[i] = $i
        old[i] = $i         # a set may consist of a single line
    }
    next
}

{
    # b stays 1 only if every field except the last two matches veryold
    b = 1
    for (i = 1; i <= NF - 2; ++i) {
        b *= $i == veryold[i]
    }
    if (b == 1) {
        for (i = 1; i <= NF; ++i) {
            old[i] = $i
        }
    } else {
        if (old[NF - 1] != veryold[NF - 1]) {
            old[NF - 1] = veryold[NF - 1]":"old[NF - 1]
        }
        if (old[NF] != veryold[NF]) {
            old[NF] = veryold[NF]":"old[NF]
        }
        for (i = 1; i <= NF; ++i) {
            if (i == NF - 1) {
                fmt = "%s\t\t"
            } else {
                fmt = "%s\t"
            }
            printf(fmt, old[i])
        }
        printf("\n")
        # the current line starts a new set
        for (i = 1; i <= NF; ++i) {
            veryold[i] = $i
            old[i] = $i
        }
    }
}

END {
    if (NR > 0) {           # nothing to print for empty input
        if (old[NF - 1] != veryold[NF - 1]) {
            old[NF - 1] = veryold[NF - 1]":"old[NF - 1]
        }
        if (old[NF] != veryold[NF]) {
            old[NF] = veryold[NF]":"old[NF]
        }
        for (i = 1; i <= NF; ++i) {
            if (i == NF - 1) {
                fmt = "%s\t\t"
            } else {
                fmt = "%s\t"
            }
            printf(fmt, old[i])
        }
        printf("\n")
    }
}
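The script above relies on the lines of each set being adjacent and ordered, so that the first and last line carry the bounds of the range. As a minimal sketch under different assumptions (this is a swapped-in technique, not the method of this answer): if the lines of a set may arrive in any order, one could key on the first six fields and track the numeric minimum and maximum of the last two. The function name merge_ranges and the fixed 8-field layout are assumptions taken from the sample data in the question.

```shell
# merge_ranges: hypothetical wrapper, reads 8-field lines on standard input.
merge_ranges() {
    awk '
    # print a single value, or lo:hi when the bounds differ
    function range(lo, hi) { return (lo == hi) ? lo : lo ":" hi }

    NF == 8 {
        key = $1 " " $2 " " $3 " " $4 " " $5 " " $6
        if (!(key in seen)) {
            seen[key] = 1
            order[++n] = key                 # remember output order
            lo7[key] = hi7[key] = $7 + 0     # force numeric values
            lo8[key] = hi8[key] = $8 + 0
            next
        }
        if ($7 + 0 < lo7[key]) lo7[key] = $7 + 0
        if ($7 + 0 > hi7[key]) hi7[key] = $7 + 0
        if ($8 + 0 < lo8[key]) lo8[key] = $8 + 0
        if ($8 + 0 > hi8[key]) hi8[key] = $8 + 0
    }

    END {
        for (i = 1; i <= n; ++i) {
            key = order[i]
            print key, range(lo7[key], hi7[key]), range(lo8[key], hi8[key])
        }
    }'
}

# usage: lines of the first set are deliberately out of order
printf '%s\n' \
    'host1 input nic1 ip1 ip2 PROT 50000 10' \
    'host1 input nic1 ip1 ip2 PROT 30000 10' \
    'host1 input nic1 ip1 ip2 PROT 60000 10' \
    'host1 output loc ip2 ip2 PROT 10 30000' \
    'host1 output loc ip2 ip2 PROT 10 50000' |
    merge_ranges
```

For these five lines the sketch prints two lines, one ending in 30000:60000 10 and one ending in 10 30000:50000. It holds one entry per set in memory and separates output fields with single spaces, and like the script above it does not check whether the values inside a range are contiguous.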