An efficient way to transpose a file in Bash

Posted 2018-12-31 07:57

I have a huge tab-separated file formatted like this

X column1 column2 column3
row1 0 1 2
row2 3 4 5
row3 6 7 8
row4 9 10 11

I would like to transpose it in an efficient way using only bash commands (I could write a ten-or-so-line Perl script to do that, but it would probably be slower to execute than the native bash functions). So the output should look like

X row1 row2 row3 row4
column1 0 3 6 9
column2 1 4 7 10
column3 2 5 8 11

I thought of a solution like this

cols=`head -n 1 input | wc -w`
for (( i=1; i <= $cols; i++))
do cut -f $i input | tr $'\n' $'\t' | sed -e "s/\t$/\n/g" >> output
done

But it's slow and doesn't seem like the most efficient solution. I've seen a solution for vi in this post, but it's still very slow. Any thoughts/suggestions/brilliant ideas? :-)

25 Answers
怪性笑人.
#2 · 2018-12-31 08:26

Another option is to use rs:

rs -c' ' -C' ' -T

-c changes the input column separator, -C changes the output column separator, and -T transposes rows and columns. Do not use -t instead of -T, because it uses an automatically calculated number of rows and columns that is not usually correct. rs, which is named after the reshape function in APL, comes with BSDs and OS X, but it should be available from package managers on other platforms.

A second option is to use Ruby:

ruby -e'puts readlines.map(&:split).transpose.map{|x|x*" "}'

A third option is to use jq:

jq -R .|jq -sr 'map(./" ")|transpose|map(join(" "))[]'

jq -R . prints each input line as a JSON string literal, -s (--slurp) creates an array for the input lines after parsing each line as JSON, and -r (--raw-output) outputs the contents of strings instead of JSON string literals. The / operator is overloaded to split strings.

呛了眼睛熬了心
#3 · 2018-12-31 08:27

Some *nix standard-utility one-liners, no temp files needed. NB: the OP wanted an efficient fix (i.e. faster), and the top answers are usually faster than this answer. These one-liners are for those who like *nix software tools, for whatever reason. In rare cases (e.g. scarce I/O and memory), these snippets can actually be faster than some of the top answers.

Call the input file foo.

  1. If we know foo has four columns:

    for f in 1 2 3 4 ; do cut -d ' ' -f $f foo | xargs echo ; done
    
  2. If we don't know how many columns foo has:

    n=$(head -n 1 foo | wc -w)
    for f in $(seq 1 $n) ; do cut -d ' ' -f $f foo | xargs echo ; done
    

    xargs has a size limit and would therefore produce incomplete output on a long file. The size limit is system dependent, e.g.:

    { timeout '.01' xargs --show-limits ; } 2>&1 | grep Max
    

    Maximum length of command we could actually use: 2088944

  3. tr & echo:

    for f in 1 2 3 4; do cut -d ' ' -f $f foo | tr '\n' ' ' ; echo; done
    

    ...or if the number of columns is unknown:

    n=$(head -n 1 foo | wc -w)
    for f in $(seq 1 $n); do 
        cut -d ' ' -f $f foo | tr '\n' ' ' ; echo
    done
    
  4. Using set which, like xargs, has similar command-line-size limitations:

    for f in 1 2 3 4 ; do set -- $(cut -d ' ' -f $f foo) ; echo "$@" ; done
    
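As an end-to-end check, method 2 can be exercised on the sample data from the question (file names here are illustrative):

```shell
# Create a sample whitespace-separated file.
printf 'X column1 column2\nrow1 0 1\nrow2 3 4\n' > foo

# Transpose without knowing the column count in advance:
# count the words in the header, then emit one column per output line.
n=$(head -n 1 foo | wc -w)
for f in $(seq 1 $n) ; do cut -d ' ' -f $f foo | xargs echo ; done > transposed

cat transposed
# → X row1 row2
#   column1 0 3
#   column2 1 4
```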
心情的温度
#4 · 2018-12-31 08:28

A Python solution:

python -c "import sys; print('\n'.join(' '.join(c) for c in zip(*(l.split() for l in sys.stdin.readlines() if l.strip()))))" < input > output

The above is based on the following:

import sys

for c in zip(*(l.split() for l in sys.stdin.readlines() if l.strip())):
    print(' '.join(c))

This code does assume that every line has the same number of columns (no padding is performed).
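As a quick check (assuming a python3 interpreter is available), the one-liner can be run on the sample input; the file names are illustrative:

```shell
# Sample whitespace-separated input.
printf 'X c1 c2\nr1 0 1\n' > input

# Run the one-liner: split each non-blank line, zip to transpose, re-join.
python3 -c "import sys; print('\n'.join(' '.join(c) for c in zip(*(l.split() for l in sys.stdin.readlines() if l.strip()))))" < input > output

cat output
# → X r1
#   c1 0
#   c2 1
```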

与风俱净
#5 · 2018-12-31 08:29

Here's a Haskell solution. When compiled with -O2, it runs slightly faster than ghostdog's awk and slightly slower than Stephan's thinly wrapped C Python on my machine for repeated "Hello world" input lines. Unfortunately GHC's support for passing command-line code is non-existent as far as I can tell, so you will have to write it to a file yourself. It will truncate the rows to the length of the shortest row.

transpose :: [[a]] -> [[a]]
transpose = foldr (zipWith (:)) (repeat [])

main :: IO ()
main = interact $ unlines . map unwords . transpose . map words . lines
忆尘夕之涩
#6 · 2018-12-31 08:30

The only improvement I can see over your own example is using awk, which will reduce the number of processes that are run and the amount of data that is piped between them:

/bin/rm output 2> /dev/null

cols=`head -n 1 input | wc -w` 
for (( i=1; i <= $cols; i++))
do
  awk '{printf ("%s%s", tab, $'$i'); tab="\t"} END {print ""}' input
done >> output
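For reference, the loop can be exercised on the sample data like this (a sketch: the file names are illustrative, and `seq` replaces the bash-only `(( ))` loop for portability):

```shell
# Sample tab-separated input, as in the question.
printf 'X\tcolumn1\tcolumn2\nrow1\t0\t1\nrow2\t3\t4\n' > input

# One awk pass per column: print field $i of every line, tab-separated.
cols=$(head -n 1 input | wc -w)
for i in $(seq 1 $cols)
do
  awk '{printf ("%s%s", tab, $'$i'); tab="\t"} END {print ""}' input
done > output

cat output
```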
皆成旧梦
#7 · 2018-12-31 08:31

Pure Bash, no additional processes. A nice exercise:

declare -a array=( )                      # we build a 1-D-array

read -a line < "$1"                       # read the headline

COLS=${#line[@]}                          # save number of columns

index=0
while read -a line ; do
    for (( COUNTER=0; COUNTER<${#line[@]}; COUNTER++ )); do
        array[$index]=${line[$COUNTER]}
        ((index++))
    done
done < "$1"

for (( ROW = 0; ROW < COLS; ROW++ )); do
  for (( COUNTER = ROW; COUNTER < ${#array[@]}; COUNTER += COLS )); do
    printf "%s\t" ${array[$COUNTER]}
  done
  printf "\n" 
done
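To try it out, the script above can be saved to a file and run with the input file as its first argument (the file names here are illustrative; note each output field is followed by a tab, including the last one):

```shell
# Save the pure-Bash transposer to a file.
cat > transpose.sh <<'EOF'
declare -a array=( )                      # we build a 1-D array
read -a line < "$1"                       # read the headline
COLS=${#line[@]}                          # save number of columns
index=0
while read -a line ; do
    for (( COUNTER=0; COUNTER<${#line[@]}; COUNTER++ )); do
        array[$index]=${line[$COUNTER]}
        ((index++))
    done
done < "$1"
for (( ROW = 0; ROW < COLS; ROW++ )); do
  for (( COUNTER = ROW; COUNTER < ${#array[@]}; COUNTER += COLS )); do
    printf "%s\t" ${array[$COUNTER]}
  done
  printf "\n"
done
EOF

# Run it on a small sample.
printf 'X c1 c2\nr1 0 1\n' > sample
bash transpose.sh sample
```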