I have a comma delimited file which I am formatting to create 2 columns using printf. I am using awk to group the contents into similar groups so I can print them into nicely formatted columns.
The formatting works but the contents of the array wrap onto new lines instead of wrapping within the column itself.
Input file example:
1,test,test1,test1
2,test,test1,test2
2,test,test1,test2
2,test,test1,test2
2,test,test1,test2
2,test,test1,test2
2,test,test1,test2
2,test,test1,test2
2,test,test1,test2
2,test,test1,test2
2,test,test1,test2
2,test,test1,test2
2,test,test1,test2
Command used:
awk -F"," 'NR>1 {a[$3]=a[$3] ? a[$3]", "$4" ("$2")" : $4" ("$2")"}
END {for (i in a) {print i":"a[i]}}' test.dat |
sort |
awk -F":" 'BEGIN { printf "%-15s %-10s\n", "COLUMN1","COLUMN2"; printf "%-15s %-10s\n", "-----------","----------"}
{ printf "%-15s %-10s\n", $1,$2}'
I am also aware of, and have tried, column -t -s"," and pr.
The output looks like this (simulated example):
COLUMN1         COLUMN2
========        =======
1               test1
2               test2, test2, test2, test2, test2, test2,test2, test2, test2,test2, test2, test2, test2, test2
How can I wrap the second column (even the first one if it is too long) so that it fits within its frame?
COLUMN1         COLUMN2
========        =======
1               test1
2               test2, test2, test2, test2, test2, test2,test2, test2,
                test2,test2, test2, test2, test2, test2
Let's pretend this is what your original script is doing given your posted sample input and the output you say you get:
$ cat tst.awk
BEGIN { FS=","; OFS="\t" }
{ vals[$1] = ($1 in vals ? vals[$1] ", " : "") $4 }
END {
    print "column1", "column2"
    print "=======", "======="
    for (key in vals) {
        print key, vals[key]
    }
}
$ awk -f tst.awk file
column1 column2
======= =======
1       test1
2       test2, test2, test2, test2, test2, test2, test2, test2, test2, test2, test2, test2
Would that be a good starting point for your question, and now you want to wrap each column? If so then I'd take advantage of an existing UNIX tool like fold or fmt to do the wrapping for you so you don't have to write your own code to handle splitting on spaces vs mid-word, etc.:
$ cat tst.awk
BEGIN { FS=","; OFS="\t" }
{ vals[$1] = ($1 in vals ? vals[$1] ", " : "") $4 }
END {
    print "column1", "column2"
    print "=======", "======="
    for (key in vals) {
        numKeyLines = wrap(key,15,keyArr)
        numValLines = wrap(vals[key],50,valArr)
        numLines = (numKeyLines > numValLines ? numKeyLines : numValLines)
        for (lineNr=1; lineNr<=numLines; lineNr++) {
            print keyArr[lineNr], valArr[lineNr]
        }
    }
}

function wrap(inStr,wid,outArr,    cmd,line,numLines) {
    split("",outArr)    # clear lines left over from a previous call
    if ( length(inStr) > wid ) {
        # single-quote inStr for the shell so ", $, ` and \ in the data pass through literally
        gsub(/\047/,"\047\\\047\047",inStr)
        cmd = "printf \047%s\n\047 \047" inStr "\047 | fold -s -w " wid+0
        while ( (cmd | getline line) > 0 ) {
            outArr[++numLines] = line
        }
        close(cmd)
    }
    else {
        outArr[++numLines] = inStr
    }
    return numLines+0
}
$ awk -f tst.awk file
column1 column2
======= =======
1       test1
2       test2, test2, test2, test2, test2, test2, test2,
        test2, test2, test2, test2, test2
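To see what fold contributes on its own, here is a minimal shell sketch outside of awk. The name fold_words is a hypothetical helper, not part of the script above, and the sed step is my addition: fold -s breaks after the last blank that fits and leaves that blank at the end of the line, so it is trimmed here.

```shell
# fold_words TEXT WIDTH  (hypothetical helper, for illustration only)
# Wrap TEXT at WIDTH columns, breaking after the last blank that fits (-s).
fold_words() {
    printf '%s\n' "$1" |
        fold -s -w "$2" |
        sed 's/[[:space:]]*$//'   # fold -s keeps the blank it broke after; trim it
}

fold_words "test2, test2, test2, test2" 20
```

With a width of 20 this should print "test2, test2," on the first line and "test2, test2" on the second. A word longer than the width has no blank to break at, so fold splits it mid-word.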
If you have a lot of fields that need to be wrapped then it won't be fast, due to spawning a subshell for each call to fold, so here's an all-awk version that splits at spaces when possible; test it for edge cases and massage to suit:
$ cat tst.awk
BEGIN { FS=","; OFS="\t" }
{ vals[$1] = ($1 in vals ? vals[$1] ", " : "") $4 }
END {
    print "column1", "column2"
    print "=======", "======="
    for (key in vals) {
        numKeyLines = wrap(key,15,keyArr)
        numValLines = wrap(vals[key],50,valArr)
        numLines = (numKeyLines > numValLines ? numKeyLines : numValLines)
        for (lineNr=1; lineNr<=numLines; lineNr++) {
            print keyArr[lineNr], valArr[lineNr]
        }
    }
}

function wrap(inStr,wid,outArr,    lineEnd,numLines) {
    split("",outArr)    # clear lines left over from a previous call
    while ( length(inStr) > wid ) {
        # break before the last space in the first wid chars, else hard-split at wid
        lineEnd = ( match(substr(inStr,1,wid),/.*[[:space:]]/) ? RLENGTH - 1 : wid )
        outArr[++numLines] = substr(inStr,1,lineEnd)
        inStr = substr(inStr,lineEnd+1)
        sub(/^[[:space:]]+/,"",inStr)
    }
    outArr[++numLines] = inStr
    return numLines
}
$ awk -f tst.awk file
column1 column2
======= =======
1       test1
2       test2, test2, test2, test2, test2, test2, test2,
        test2, test2, test2, test2, test2
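As one way to probe the edge cases, here is a small sketch that runs the all-awk wrap() on a single row. Everything around the function is an assumption made for the demo: wrap_row is a hypothetical helper name, the widths 15 and 20 are arbitrary, and the row is printed with printf "%-15s %s" instead of a tab OFS, since a tab only lines the columns up while the key stays shorter than a tab stop. The split("", outArr) reset is there so that lines from an earlier call cannot leak into a later row.

```shell
# wrap_row KEY VALUE  (hypothetical helper, for illustration only)
# Print one two-column row, wrapping KEY at 15 characters and VALUE at 20.
wrap_row() {
    awk -v key="$1" -v val="$2" '
    function wrap(inStr,wid,outArr,    lineEnd,numLines) {
        split("",outArr)    # drop lines left over from an earlier call
        while ( length(inStr) > wid ) {
            # break before the last space in the first wid chars, else hard-split
            lineEnd = ( match(substr(inStr,1,wid),/.*[[:space:]]/) ? RLENGTH - 1 : wid )
            outArr[++numLines] = substr(inStr,1,lineEnd)
            inStr = substr(inStr,lineEnd+1)
            sub(/^[[:space:]]+/,"",inStr)
        }
        outArr[++numLines] = inStr
        return numLines
    }
    BEGIN {
        numKeyLines = wrap(key,15,keyArr)
        numValLines = wrap(val,20,valArr)
        numLines = (numKeyLines > numValLines ? numKeyLines : numValLines)
        for (lineNr=1; lineNr<=numLines; lineNr++) {
            # %-15s pads the key so the value column starts at the same place on every line
            printf "%-15s %s\n", keyArr[lineNr], valArr[lineNr]
        }
    }'
}

wrap_row "averyveryverylongkey" "test2, test2, test2, test2"
```

The key has no spaces, so it is hard-split after 15 characters ("averyveryverylo", then "ngkey"), while the value breaks at a space ("test2, test2," then "test2, test2"). Note that a line is only broken at whitespace found within its first wid characters, so a word that ends exactly at the width is pushed to the next line.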
Here's a version that uses perl instead of awk:
#!/usr/bin/env perl
use warnings;
use strict;

my ($col1, $col4, @col4data);

print <<EOF;
COLUMN1  COLUMN2
=======  =======
EOF

{
    my $line = <>;
    chomp $line;
    ($col1, $col4data[0]) = (split /,/, $line)[0,3];
}

while (<>) {
    chomp;
    my ($c, $a) = (split /,/)[0,3];
    if ($c ne $col1) {
        $col4 = join ", ", @col4data;
        write;
        @col4data = ();
        $col1 = $c;
    }
    push @col4data, $a;
}
$col4 = join ", ", @col4data;
write;

format STDOUT =
@<<<<<<< ^<<<<<<<<<<<<<<<<<<<<<<
$col1,   $col4
~~       ^<<<<<<<<<<<<<<<<<<<<<<
$col4
.
Example:
$ perl columns.pl input.csv
COLUMN1  COLUMN2
=======  =======
1        test1
2        test2, test2, test2,
         test2, test2, test2,
         test2, test2, test2,
         test2, test2, test2
The magic here is doing the line wrapping using an output format's fill mode (the ^<<< fields; the ~~ line repeats until $col4 is used up). Adjust the width as needed by adding more <'s to the obvious parts in the format description.