I have an input text file in unix with this kind of data.
Event_date:20190512044638
Error_code:5858
Event_type:GPRS data
Duration:772
Missing_provider_id:46009
Event_date:20190512044638
Error_code:780678
Event_date:20190512064535
Error_code:5858
Event_type:GPRS data
Duration:2172
Missing_provider_id:722310
i want this data to be in this output format:
Event_date Error_code Event_type Duration Missing_provider_id
20190512044638 5858 GPRS data 772 46009
20190512044638 780678
20190512064535 5858 GPRS data 2172 722310
I tried a combination of awk and sed commands, but didn't work out.
How can i achieve this output?
Event_date:20190512044638
Error_code:5858
Event_type:GPRS data
Duration:772
Missing_provider_id:46009
Event_date:20190512044638
Error_code:780678
Event_date:20190512064535
Error_code:5858
Event_type:GPRS data
Duration:2172
Missing_provider_id:722310
i want this data to be in this output format:
Event_date Error_code Event_type Duration Missing_provider_id
20190512044638 5858 GPRS data 772 46009
20190512044638 780678
20190512064535 5858 GPRS data 2172 722310
This awk
may do: (tab separated fields)
PS this will fail if on field is missing, all need to come in order.
awk -F: 'NR==1 {print $1,$3,$5,$7,$9} {print $2,$4,$6,$8,$10}' RS= ORS='\n' OFS='\t' file
Event_date Error_code Event_type Duration Missing_provider_id
20190512044638 5858 GPRS data 772 46009
20190512044638 780678
20190512064535 5858 GPRS data 2172 722310
A more generic solution:
awk -F: 'NR==1 {print $1,$3,$5,$7,$9} {for(i=2;i<=NF;i+=2) printf "%s\t",$i;print ""}' RS= ORS='\n' OFS='\t' file
Event_date Error_code Event_type Duration Missing_provider_id
20190512044638 5858 GPRS data 772 46009
20190512044638 780678
20190512064535 5858 GPRS data 2172 722310
NR==1 {print $1,$3,$5,$7,$9}
can be set to some static header like NR==1 {print "F1","F2","F3","F4","F5"}
etc
Using GNU awk and 2D arrays:
awk '
BEGIN {
r=2 # data records in a start from 2
FS=":" # split at :
OFS="\t" # tab separated fields
a[0][0] # initialize a array
}
$0!="" { # for nonempty records
if(!($1 in a[0])) { # add keys to headers when needed
a[0][$1]=++f # for lookups
a[1][f]=$1 # for printing
}
a[r][a[0][$1]]=$2 # store value
next
}
{ # empty record -> new array record
r++
}
END { # after records are processed
# delete a[0][0] #
for(i=1;i<=r;i++) # iterate records
for(j=1;j<=f;j++) # iterate fields
printf "%s%s",a[i][j],(j==f?ORS:OFS) # output
}
' file | column -t -s $'\t' # column used for pretty-print
Output:
Event_date Error_code Event_type Duration Missing_provider_id
20190512044638 5858 GPRS data 772 46009
20190512044638 780678
20190512064535 5858 GPRS data 2172 722310
here is another one
awk -F: -v RS= 'BEGIN {OFS=FS}
NR==FNR {for(i=1;i<NF;i+=2)
if(!($i in h)) {h[$i]; ho[++c]=$i};
next}
FNR==1 {for(i=1;i<=c;i++) printf "%s",ho[i] (i==c?ORS:OFS)}
{delete v;
for(i=1;i<NF;i+=2) v[$i]=$(i+1);
for(i=1;i<=c;i++) printf "%s", v[ho[i]] (i==c?ORS:OFS)}' file{,} |
column -ts:
Event_date Error_code Event_type Duration Missing_provider_id
20190512044638 5858 GPRS data 772 46009
20190512044638 780678
20190512064535 5858 GPRS data 2172 722310
no 2D arrays, but needs to scan the file twice to get all header info in order not to keep any data in memory, but to process the lines as they appear.