An efficient way to transpose a file in Bash

Posted 2018-12-31 07:57

I have a huge tab-separated file formatted like this:

X column1 column2 column3
row1 0 1 2
row2 3 4 5
row3 6 7 8
row4 9 10 11

I would like to transpose it efficiently using only Bash commands (I could write a Perl script of ten or so lines to do it, but I expect it to run slower than native Bash tools). The output should look like this:

X row1 row2 row3 row4
column1 0 3 6 9
column2 1 4 7 10
column3 2 5 8 11

I thought of a solution like this:

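cols=$(head -n 1 input | wc -w)     # number of columns, taken from the header line
for (( i=1; i <= cols; i++ ))
do
    # column i becomes one output line: newlines -> tabs, then the trailing tab -> newline
    cut -f "$i" input | tr $'\n' $'\t' | sed -e 's/\t$/\n/' >> output
done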

But it's slow and doesn't seem to be the most efficient solution, since it rereads the whole input once per column. I've seen a solution for vi in this post, but it's still too slow. Any thoughts/suggestions/brilliant ideas? :-)

25 answers
浪荡孟婆
#2 · 2018-12-31 08:31

I was looking for a solution to transpose any kind of matrix (n×n or m×n) with any kind of data (numbers or text) and came up with the following:

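# $ARCHIVO is the comma-separated input file; set the two counts to its size
Row2Trans=number1   # number of rows
Col2Trans=number2   # number of columns

rm -f Column_*      # the loop appends, so start from a clean slate

for ((i=1; i <= Row2Trans; i++)); do
    for ((j=1; j <= Col2Trans; j++)); do
        # append field j of line i to Column_$i (one value per line)
        awk -v var1="$i" -v var2="$j" 'BEGIN { FS = "," } NR == var1 { print $var2 }' "$ARCHIVO" >> "Column_$i"
    done
done

# join the Column_* files side by side, in natural numeric order (ls -v);
# note that this appends the result to the input file itself
paste -d',' $(ls -v Column_*) >> "$ARCHIVO"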
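The nested loop above starts one awk process per cell, and each of them rereads the whole file. A minimal sketch of the same per-row-file idea in a single awk pass, assuming comma-separated input as in that answer; the function name transpose_csv and its temporary directory are hypothetical, and a numeric loop builds the file list so it does not rely on GNU `ls -v` for ordering:

```shell
# transpose_csv FILE: hypothetical helper that transposes a comma-separated FILE.
# Line N of FILE is written to Column_N (one field per line), then paste
# joins the Column_* files side by side.
transpose_csv() {
    local dir k=1
    local -a files=()
    dir=$(mktemp -d) || return 1
    # one awk pass instead of one awk call per cell
    awk -F',' -v dir="$dir" '{
        f = dir "/Column_" NR
        printf "" > f                 # create the file even for an empty line
        for (j = 1; j <= NF; j++) print $j > f
        close(f)                      # keep the number of open files bounded
    }' "$1"
    # collect Column_1, Column_2, ... in numeric order (Column_10 after Column_9)
    while [[ -e "$dir/Column_$k" ]]; do
        files+=("$dir/Column_$k")
        k=$((k + 1))
    done
    if (( ${#files[@]} > 0 )); then
        paste -d',' "${files[@]}"
    fi
    rm -rf "$dir"
}
```

paste still opens every Column_ file at once, so an input with very many lines may hit the open-file limit (`ulimit -n`).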
高级女魔头
#3 · 2018-12-31 08:34

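If you have sc (the spreadsheet calculator) installed, you can do the following; psc converts the input into sc's format, and its -r option interchanges rows and columns: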

psc -r < inputfile | sc -W% - > outputfile
柔情千种
#4 · 2018-12-31 08:38

I normally use this little awk snippet for this requirement:

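  awk '{
          for (i = 1; i <= NF; i++) a[i, NR] = $i   # store field i of line NR
          max = (max < NF ? NF : max)               # track the widest line
       }
       END {
          for (i = 1; i <= max; i++)                # one output line per input column
              for (j = 1; j <= NR; j++)
                  printf "%s%s", a[i, j], (j == NR ? RS : FS)
       }' file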

This just loads all the data into a two-dimensional array indexed as a[column,line] and then prints it back one input column per output line, which transposes the input. Note that the whole file is held in memory.

It also keeps track of the maximum number of columns in the input file, because that becomes the number of rows to print.
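A usage sketch for the tab-separated file from the question, wrapped in a hypothetical function transpose_tsv. With the default FS the snippet splits on runs of whitespace and prints a space between fields; setting FS to a tab in a BEGIN block makes it split on tabs and, since printf emits FS, join with tabs as well:

```shell
# transpose_tsv FILE: the awk snippet above with FS set to a tab
# (function name is illustrative only)
transpose_tsv() {
    awk 'BEGIN { FS = "\t" }
         { for (i = 1; i <= NF; i++) a[i, NR] = $i
           max = (max < NF ? NF : max) }
         END { for (i = 1; i <= max; i++)
                   for (j = 1; j <= NR; j++)
                       printf "%s%s", a[i, j], (j == NR ? RS : FS) }' "$1"
}
```

Running `transpose_tsv input > output` on the sample input produces the expected output shown in the question, with tabs between fields.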

君临天下
#5 · 2018-12-31 08:40

The transpose project on SourceForge is a coreutils-like C program for exactly that:

gcc transpose.c -o transpose
./transpose -t input > output #works with stdin, too.
刘海飞了
#6 · 2018-12-31 08:40

A hackish Perl solution could look like this. It's nice because it doesn't load the whole file into memory: it writes one intermediate temp file per input line and then uses the all-wonderful paste to join them:

#!/usr/bin/perl
use warnings;
use strict;

# Usage: <this script> INPUT OUTPUT
my ($infile, $outfile) = @ARGV;
my $counter = 0;

open my $in, '<', $infile or die "Unable to open input file: $!";
while (my $line = <$in>) {
    chomp $line;
    my @array = split /\t/, $line;
    # one temp file per input line, holding its fields one per line
    open my $out, '>', "temp$." or die "Unable to open output file: $!";
    print $out join("\n", @array), "\n";
    close $out;
    $counter = $.;
}
close $in;

# paste files together, in input-line order (temp1 temp2 ... tempN)
my @files = map { "temp$_" } 1 .. $counter;
system("paste @files > $outfile") == 0 or die "paste failed: $?";
unlink @files;    # remove the intermediate files
笑指拈花
#7 · 2018-12-31 08:40

I used fgm's solution (thanks fgm!), but needed to eliminate the tab character at the end of each row, so I modified the script as follows:

#!/bin/bash
declare -a array=( )                      # we build a 1-D array

read -r -a line < "$1"                    # read the header line

COLS=${#line[@]}                          # save number of columns

index=0
while read -r -a line; do
    for (( COUNTER=0; COUNTER<${#line[@]}; COUNTER++ )); do
        array[$index]=${line[$COUNTER]}
        ((index++))
    done
done < "$1"

for (( ROW = 0; ROW < COLS; ROW++ )); do
  for (( COUNTER = ROW; COUNTER < ${#array[@]}; COUNTER += COLS )); do
    printf "%s" "${array[$COUNTER]}"
    # print a tab after every value except those from the last input row
    if [ "$COUNTER" -lt $(( ${#array[@]} - COLS )) ]
    then
        printf "\t"
    fi
  done
  printf "\n"
done
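The script above relies on row-major flat indexing: with COLS columns, input cell (row r, column c) sits at index r*COLS + c, so output row c is every COLS-th element starting at c. A small sketch of just that indexing, using a hypothetical function print_columns; joining each output row through "${out[*]}" with IFS set to a tab is one alternative way to avoid the trailing tab:

```shell
# print_columns COLS ITEM...: treat the ITEMs as a row-major flat array with
# COLS columns and print it transposed, tab-joined with no trailing tab.
# (Function name is illustrative only.)
print_columns() {
    local cols=$1
    shift
    local -a flat=("$@") out
    local c k
    local IFS=$'\t'     # "${out[*]}" joins elements with the first IFS character
    for ((c = 0; c < cols; c++)); do
        out=()
        # output row c = input cells c, c+cols, c+2*cols, ...
        for ((k = c; k < ${#flat[@]}; k += cols)); do
            out+=("${flat[k]}")
        done
        printf '%s\n' "${out[*]}"
    done
}
```

For example, `print_columns 3 a b c d e f` treats the items as the two rows "a b c" and "d e f" and prints the three lines a⇥d, b⇥e and c⇥f (⇥ marking a tab).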