An efficient way to transpose a file in Bash

2018-12-31 07:57发布

I have a huge tab-separated file formatted like this

X column1 column2 column3
row1 0 1 2
row2 3 4 5
row3 6 7 8
row4 9 10 11

I would like to transpose it in an efficient way using only bash commands (I could write a ten or so lines Perl script to do that, but it should be slower to execute than the native bash functions). So the output should look like

X row1 row2 row3 row4
column1 0 3 6 9
column2 1 4 7 10
column3 2 5 8 11

I thought of a solution like this

cols=`head -n 1 input | wc -w`
for (( i=1; i <= $cols; i++))
do cut -f $i input | tr $'\n' $'\t' | sed -e "s/\t$/\n/g" >> output
done

But it's slow and doesn't seem the most efficient solution. I've seen a solution for vi in this post, but it's still over-slow. Any thoughts/suggestions/brilliant ideas? :-)

25条回答
琉璃瓶的回忆
2楼-- · 2018-12-31 08:45
#!/bin/bash

aline="$(head -n 1 file.txt)"
set -- $aline
colNum=$#

#set -x
while read line; do
  set -- $line
  for i in $(seq $colNum); do
    eval col$i="\"\$col$i \$$i\""
  done
done < file.txt

for i in $(seq $colNum); do
  eval echo \${col$i}
done

another version with set eval

查看更多
若你有天会懂
3楼-- · 2018-12-31 08:46

Not very elegant, but this "single-line" command solves the problem quickly:

cols=4; for((i=1;i<=$cols;i++)); do \
            awk '{print $'$i'}' input | tr '\n' ' '; echo; \
        done

Here cols is the number of columns, where you can replace 4 by head -n 1 input | wc -w.

查看更多
妖精总统
4楼-- · 2018-12-31 08:46

An awk solution that store the whole array in memory

    awk '$0!~/^$/{    i++;
                  split($0,arr,FS);
                  for (j in arr) {
                      out[i,j]=arr[j];
                      if (maxr<j){ maxr=j}     # max number of output rows.
                  }
            }
    END {
        maxc=i                 # max number of output columns.
        for     (j=1; j<=maxr; j++) {
            for (i=1; i<=maxc; i++) {
                printf( "%s:", out[i,j])
            }
            printf( "%s\n","" )
        }
    }' infile

But we may "walk" the file as many times as output rows are needed:

#!/bin/bash
maxf="$(awk '{if (mf<NF); mf=NF}; END{print mf}' infile)"
rowcount=maxf
for (( i=1; i<=rowcount; i++ )); do
    awk -v i="$i" -F " " '{printf("%s\t ", $i)}' infile
    echo
done

Which (for a low count of output rows is faster than the previous code).

查看更多
步步皆殇っ
5楼-- · 2018-12-31 08:47

Here is a Bash one-liner that is based on simply converting each line to a column and paste-ing them together:

echo '' > tmp1;  \
cat m.txt | while read l ; \
            do    paste tmp1 <(echo $l | tr -s ' ' \\n) > tmp2; \
                  cp tmp2 tmp1; \
            done; \
cat tmp1

m.txt:

0 1 2
4 5 6
7 8 9
10 11 12
  1. creates tmp1 file so it's not empty.

  2. reads each line and transforms it into a column using tr

  3. pastes the new column to the tmp1 file

  4. copies result back into tmp1.

PS: I really wanted to use io-descriptors but couldn't get them to work.

查看更多
君临天下
6楼-- · 2018-12-31 08:48

I was just looking for similar bash tranpose but with support for padding. Here is the script I wrote based on fgm's solution, that seem to work. If it can be of help...

#!/bin/bash 
declare -a array=( )                      # we build a 1-D-array
declare -a ncols=( )                      # we build a 1-D-array containing number of elements of each row

SEPARATOR="\t";
PADDING="";
MAXROWS=0;
index=0
indexCol=0
while read -a line; do
    ncols[$indexCol]=${#line[@]};
((indexCol++))
if [ ${#line[@]} -gt ${MAXROWS} ]
    then
         MAXROWS=${#line[@]}
    fi    
    for (( COUNTER=0; COUNTER<${#line[@]}; COUNTER++ )); do
        array[$index]=${line[$COUNTER]}
        ((index++))

    done
done < "$1"

for (( ROW = 0; ROW < MAXROWS; ROW++ )); do
  COUNTER=$ROW;
  for (( indexCol=0; indexCol < ${#ncols[@]}; indexCol++ )); do
if [ $ROW -ge ${ncols[indexCol]} ]
    then
      printf $PADDING
    else
  printf "%s" ${array[$COUNTER]}
fi
if [ $((indexCol+1)) -lt ${#ncols[@]} ]
then
  printf $SEPARATOR
    fi
    COUNTER=$(( COUNTER + ncols[indexCol] ))
  done
  printf "\n" 
done
查看更多
流年柔荑漫光年
7楼-- · 2018-12-31 08:49

There is a purpose built utility for this,

GNU datamash utility

apt install datamash  

datamash transpose < yourfile

Taken from this site, https://www.gnu.org/software/datamash/ and http://www.thelinuxrain.com/articles/transposing-rows-and-columns-3-methods

查看更多
登录 后发表回答