I have a flat file (.txt) with 606,347 columns and I want to extract 50,000 RANDOM columns, with the exception of the first column, which is the sample identification. How can I do that using Linux commands? My file looks like:
ID SNP1 SNP2 SNP3
1 0 0 2
2 1 0 2
3 2 0 1
4 1 1 2
5 2 1 0
It is TAB delimited.
Thank you so much.
Cheers,
Paula.
@karakfa's answer is great, but the NF value can't be obtained in the BEGIN{} block of the awk script. Refer to: How to get number of fields in AWK prior to processing
I edited the code as:
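The exact edit isn't reproduced here; the following is only a minimal sketch of the idea, assuming the chosen column indices live in an array a and k is the number of columns to keep: the index selection moves from BEGIN{} (where NF is still 0) to the first record, and the first column is always kept.

```awk
awk -F'\t' -v OFS='\t' -v k=50000 '
BEGIN { srand() }                           # seed only; NF is not usable here
NR == 1 {                                   # NF is defined once a record has been read
    a[1] = 1                                # always keep column 1 (IDs / gene names)
    for (i = 2; i <= k; i++)
        a[i] = int(rand() * (NF - 1)) + 2   # random index in 2..NF (with replacement)
}
{
    for (i = 1; i <= k; i++)
        printf "%s%s", $(a[i]), (i == k ? ORS : OFS)
}' file.txt
```

Note that this samples with replacement, so a column can occasionally be picked twice.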
Because I am processing single-cell gene expression profiles, from the second row onward the first column holds gene names rather than sample IDs. My output is:
Try this:
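The snippet itself isn't preserved above; below is a sketch of one plain shuf/cut approach (assuming GNU coreutils and the 606,347 columns from the question): draw 50,000 distinct column numbers and hand them to cut together with column 1.

```bash
# pick 50,000 distinct column numbers from 2..606347, sort them,
# and build a comma-separated field list for cut (column 1 is kept explicitly)
cols=$(shuf -i 2-606347 -n 50000 | sort -n | paste -sd, -)
cut -f 1,"$cols" file.txt > subset.txt
```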
Update:
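The updated snippet isn't preserved either; one possible refinement (an assumption, not necessarily the original update) is to keep the long field list out of the shell and let awk do the slicing:

```bash
# feed the chosen column numbers to awk on stdin ("-"), then slice file.txt;
# this avoids passing a very long -f argument to cut
shuf -i 2-606347 -n 50000 | sort -n |
awk -F'\t' 'NR == FNR { keep[++n] = $1; next }   # first input: chosen column numbers
            { printf "%s", $1                    # always print the ID column
              for (i = 1; i <= n; i++) printf "\t%s", $(keep[i])
              print "" }' - file.txt > subset.txt
```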
awk to the rescue!

general usage
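The original command isn't reproduced here; the sketch below only assumes the same general shape — pick k random field indices into an array a on the first record, then print those fields for every line (e.g. k=3 against the sample above):

```awk
awk -v k=3 -v OFS='\t' '
BEGIN { srand() }
NR == 1 {                               # choose the columns once
    for (i = 1; i <= k; i++)            # the loop referred to below
        a[i] = int(rand() * NF) + 1     # random index in 1..NF (with replacement)
}
{
    for (i = 1; i <= k; i++)
        printf "%s%s", $(a[i]), (i == k ? ORS : OFS)
}' file.txt
```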
In your special case you can print $1 and start the loop from 2, i.e. change

for(i=1;i<=k;i++)

to

a[1]=1; for(i=2;i<=k;i++)