How to wrap lines within columns in Linux

Posted 2020-04-14 03:43

Question:

I have a comma delimited file which I am formatting to create 2 columns using printf. I am using awk to group the contents into similar groups so I can print them into nicely formatted columns.

The formatting works but the contents of the array wrap onto new lines instead of wrapping within the column itself.

Input file example:

1,test,test1,test1
2,test,test1,test2
2,test,test1,test2
2,test,test1,test2
2,test,test1,test2
2,test,test1,test2
2,test,test1,test2
2,test,test1,test2
2,test,test1,test2
2,test,test1,test2
2,test,test1,test2
2,test,test1,test2
2,test,test1,test2

Command used:

awk -F"," 'NR>1 {a[$3]=a[$3] ? a[$3]", "$4" ("$2")" : $4" ("$2")"}
  END {for (i in a) {print i":"a[i]}}' test.dat |
sort |
awk -F":" 'BEGIN { printf "%-15s %-10s\n", "COLUMN1","COLUMN2"; printf "%-15s %-10s\n", "-----------","----------"}
  { printf "%-15s %-10s\n", $1,$2}'

I am also aware of, and have tried, column -t -s"," and pr.

The output looks like this (simulated example):

COLUMN1     COLUMN2
========     =======
1            test1
2            test2, test2, test2, test2, test2, test2,test2, test2, test2,test2, test2, test2, test2, test2

How can I wrap the second column (even the first one if it is too long) so that it fits within its frame?

COLUMN1     COLUMN2
========     =======
1            test1
2            test2, test2, test2, test2, test2, test2,test2, test2, 
             test2,test2, test2, test2, test2, test2

Answer 1:

Let's pretend this is what your original script is doing given your posted sample input and the output you say you get:

$ cat tst.awk
BEGIN { FS=","; OFS="\t" }
{ vals[$1] = ($1 in vals ? vals[$1] ", " : "") $4 }
END {
    print "column1", "column2"
    print "=======", "======="

    for (key in vals) {
        print key, vals[key]
    }
}

$ awk -f tst.awk file
column1 column2
======= =======
1       test1
2       test2, test2, test2, test2, test2, test2, test2, test2, test2, test2, test2, test2

Is that a fair starting point for your question, and you now want to wrap each column? If so, I'd use an existing UNIX tool such as fold or fmt to do the wrapping, so you don't have to write your own code to decide between splitting at spaces and splitting mid-word:

$ cat tst.awk
BEGIN { FS=","; OFS="\t" }
{ vals[$1] = ($1 in vals ? vals[$1] ", " : "") $4 }
END {
    print "column1", "column2"
    print "=======", "======="

    for (key in vals) {
        numKeyLines = wrap(key,15,keyArr)
        numValLines = wrap(vals[key],50,valArr)
        numLines = (numKeyLines > numValLines ? numKeyLines : numValLines)
        for (lineNr=1; lineNr<=numLines; lineNr++) {
            print keyArr[lineNr], valArr[lineNr]
        }
    }
}

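function wrap(inStr,wid,outArr,         cmd,line,numLines) {
    delete outArr    # clear lines left over from the previous call
    if ( length(inStr) > wid ) {
        # NOTE: inStr ends up inside a double-quoted shell string, so values
        # containing ", $, ` or \ would need escaping first.
        cmd = "printf \047%s\n\047 \"" inStr "\" | fold -s -w " wid+0
        while ( (cmd | getline line) > 0 ) {
            outArr[++numLines] = line
        }
        close(cmd)
    }
    else {
        outArr[++numLines] = inStr
    }
    return numLines+0
}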


$ awk -f tst.awk file
column1 column2
======= =======
1       test1
2       test2, test2, test2, test2, test2, test2, test2,
        test2, test2, test2, test2, test2
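
To see what fold contributes on its own, here is a minimal sketch run outside awk. The sample text and the width of 12 are arbitrary choices for illustration, not values from the question:

```shell
# A sample line to wrap; any text containing spaces would do.
text='alpha beta gamma delta'

# Without -s, fold cuts at exactly 12 columns, even in the middle of a word.
printf '%s\n' "$text" | fold -w 12

# With -s, fold breaks after the last blank that fits within 12 columns.
printf '%s\n' "$text" | fold -s -w 12
```

The first command splits "gamma" across two lines, while the second keeps words whole. With -s the blank at the break point stays at the end of the first line, so lines wrapped this way can carry a trailing space.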

If many fields need wrapping, this will be slow because every call to fold spawns a subshell. Here's an all-awk version that splits at spaces when possible; test it for edge cases and adjust to suit:

$ cat tst.awk
BEGIN { FS=","; OFS="\t" }
{ vals[$1] = ($1 in vals ? vals[$1] ", " : "") $4 }
END {
    print "column1", "column2"
    print "=======", "======="

    for (key in vals) {
        numKeyLines = wrap(key,15,keyArr)
        numValLines = wrap(vals[key],50,valArr)
        numLines = (numKeyLines > numValLines ? numKeyLines : numValLines)
        for (lineNr=1; lineNr<=numLines; lineNr++) {
            print keyArr[lineNr], valArr[lineNr]
        }
    }
}

function wrap(inStr,wid,outArr,         lineEnd,numLines) {
    delete outArr    # clear lines left over from the previous call
    while ( length(inStr) > wid ) {
        # break before the last space within the first wid chars, else mid-word
        lineEnd = ( match(substr(inStr,1,wid),/.*[[:space:]]/) ? RLENGTH - 1 : wid )
        outArr[++numLines] = substr(inStr,1,lineEnd)
        inStr = substr(inStr,lineEnd+1)
        sub(/^[[:space:]]+/,"",inStr)
    }
    outArr[++numLines] = inStr
    return numLines
}

$ awk -f tst.awk file
column1 column2
======= =======
1       test1
2       test2, test2, test2, test2, test2, test2, test2,
        test2, test2, test2, test2, test2


Answer 2:

Here's a version that uses perl instead of awk:

#!/usr/bin/env perl
use warnings;
use strict;

my ($col1, $col4, @col4data);

print <<EOF;
COLUMN1     COLUMN2
=======     =======
EOF

{
  my $line = <>;
  chomp $line;
  ($col1, $col4data[0]) = (split /,/, $line)[0,3];
}

while (<>) {
  chomp;
  my ($c, $val) = (split /,/)[0,3];  # avoid $a, which sort() treats specially
  if ($c ne $col1) {
    $col4 = join ", ", @col4data;
    write;
    @col4data = ();
    $col1 = $c;
  }
  push @col4data, $val;
}

$col4 = join ", ", @col4data;
write;

format STDOUT =
@<<<<<<<    ^<<<<<<<<<<<<<<<<<<<<<<
$col1,      $col4
~~          ^<<<<<<<<<<<<<<<<<<<<<<
            $col4
.

Example:

$ perl columns.pl input.csv
COLUMN1     COLUMN2
=======     =======
1           test1
2           test2, test2, test2,
            test2, test2, test2,
            test2, test2, test2,
            test2, test2, test2

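The wrapping is done by the output format's fill mode: each ^<<< field takes as much of $col4 as fits and removes it from the variable, and the line starting with ~~ is repeated until $col4 is empty. Adjust the width as needed by adding or removing <'s in the two ^<<< fields of the format description.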



Tags: linux bash awk