I have a huge tab-separated file formatted like this:
X column1 column2 column3
row1 0 1 2
row2 3 4 5
row3 6 7 8
row4 9 10 11
I would like to transpose it in an efficient way using only bash commands (I could write a ten-or-so-line Perl script to do that, but it would probably be slower to execute than the native bash functions). So the output should look like:
X row1 row2 row3 row4
column1 0 3 6 9
column2 1 4 7 10
column3 2 5 8 11
I thought of a solution like this:
cols=$(head -n 1 input | wc -w)
for (( i = 1; i <= cols; i++ ))
do cut -f "$i" input | tr $'\n' $'\t' | sed -e "s/\t$/\n/" >> output
done
But it's slow and doesn't seem like the most efficient solution. I've seen a solution for vi in this post, but it's still too slow. Any thoughts/suggestions/brilliant ideas? :-)
Another option is to use rs.
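An invocation along these lines should work (a sketch; the quoted space arguments assume space-separated data, and on BSD rs, -c and -C given no argument should default to tab):
rs -c' ' -C' ' -T < input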
-c changes the input column separator, -C changes the output column separator, and -T transposes rows and columns. Do not use -t instead of -T, because it uses an automatically calculated number of rows and columns that is not usually correct. rs, which is named after the reshape function in APL, comes with BSDs and OS X, but it should be available from package managers on other platforms.

A second option is to use Ruby.
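A plausible one-liner splits each line on whitespace and uses Array#transpose, which requires every row to have the same number of fields:
ruby -e 'puts readlines.map(&:split).transpose.map { |r| r.join(" ") }' < input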
A third option is to use jq.
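A pipeline along these lines should do it (a sketch that splits on single spaces; substitute "\t" for tab-separated data):
jq -R . input | jq -sr 'map(./" ") | transpose | map(join(" "))[]'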
jq -R . prints each input line as a JSON string literal, -s (--slurp) creates an array for the input lines after parsing each line as JSON, and -r (--raw-output) outputs the contents of strings instead of JSON string literals. The / operator is overloaded to split strings.

Some *nix standard util one-liners, no temp files needed. NB: the OP wanted an efficient fix (i.e. faster), and the top answers are usually faster than this answer. These one-liners are for those who like *nix software tools, for whatever reasons. In rare cases (e.g. scarce IO & memory), these snippets can actually be faster than some of the top answers.
Call the input file foo.
If we know foo has four columns:
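For example, something like this (a sketch; cut's default field delimiter is tab, and xargs echo re-joins each column's values with single spaces):
for f in 1 2 3 4 ; do cut -f "$f" foo | xargs echo ; done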
If we don't know how many columns foo has:
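Same idea, counting the header's fields first:
n=$(head -n 1 foo | wc -w)
for f in $(seq 1 "$n") ; do cut -f "$f" foo | xargs echo ; done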
xargs has a size limit and therefore would do incomplete work with a long file. The size limit is system dependent, e.g.:
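With GNU xargs the limit can be inspected like this (--show-limits is GNU-specific; the /dev/null redirect stops xargs from waiting for input):
xargs --show-limits < /dev/null 2>&1 | grep -i max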
tr & echo:
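E.g., for four known columns (a sketch; note each output line keeps a trailing separator):
for f in 1 2 3 4 ; do cut -f "$f" foo | tr '\n' '\t' ; echo ; done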
...or if the # of columns is unknown:
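Again counting the header's fields first:
n=$(head -n 1 foo | wc -w)
for f in $(seq 1 "$n") ; do cut -f "$f" foo | tr '\n' '\t' ; echo ; done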
Using set, which, like xargs, has similar command-line-size-based limitations.
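For instance (a sketch with the four columns hard-coded; the unquoted command substitution is what word-splits each whole column into the positional parameters):
for f in 1 2 3 4 ; do
    set -- $(cut -f "$f" foo)   # the entire column becomes $1, $2, ...
    echo "$@"                   # echo re-joins them with single spaces
done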
A Python solution:
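For example (Python 3; a sketch that splits on whitespace and skips blank lines):
python3 -c "import sys; print('\n'.join(' '.join(c) for c in zip(*(l.split() for l in sys.stdin if l.strip()))))" < input > output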
The above is based on the following:
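A sketch of that expanded form:
import sys

# zip(*rows) pairs up the i-th field of every row, i.e. it transposes
rows = (line.split() for line in sys.stdin if line.strip())
for col in zip(*rows):
    print(' '.join(col))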
This code does assume that every line has the same number of columns (no padding is performed).
Here's a Haskell solution. When compiled with -O2, it runs slightly faster than ghostdog's awk and slightly slower than Stephan's thinly wrapped cpython on my machine for repeated "Hello world" input lines. Unfortunately GHC's support for passing command-line code is non-existent as far as I can tell, so you will have to write it to a file yourself. It will truncate the rows to the length of the shortest row.
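A sketch of such a program; the zipWith-based transpose is what truncates every column to the shortest row:
-- transpose.hs: compile with `ghc -O2 transpose.hs`, run as `./transpose < input`
main :: IO ()
main = interact $ unlines . map unwords . transpose' . map words . lines
  where
    -- zipWith stops at the shorter list, so columns are cut to the shortest row
    transpose' = foldr (zipWith (:)) (repeat [])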
The only improvement I can see to your own example is using awk, which will reduce the number of processes that are run and the amount of data that is piped between them.
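A sketch that keeps your column loop but does one awk pass per column instead of cut | tr | sed:
cols=$(head -n 1 input | wc -w)
for (( i = 1; i <= cols; i++ )); do
    # $c with a variable c selects field number c
    awk -F'\t' -v c="$i" '{ printf("%s%s", (NR == 1 ? "" : "\t"), $c) } END { print "" }' input
done > output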
Pure BASH, no additional process. A nice exercise:
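A sketch of such a script: read the header to learn the column count, load every field into one flat array, then print it column-major:
#!/bin/bash
# usage: ./transpose.sh input

read -r -a header < "$1"          # first line, just to learn the column count
cols=${#header[@]}

declare -a cells=()               # every field of the file, stored row-major
while read -r -a fields ; do
    cells+=( "${fields[@]}" )
done < "$1"

for (( r = 0; r < cols; r++ )); do
    out=""
    for (( i = r; i < ${#cells[@]}; i += cols )); do
        out+="${cells[i]}"$'\t'
    done
    printf '%s\n' "${out%$'\t'}"  # strip the trailing tab
done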