I have a data.frame that looks like this:
which has 1000+ columns with similar names.
And I have a vector of those column names that looks like this:
The vector is sorted by the cluster_id (which goes up to 11).
I want to sort the columns in the data frame such that the columns are in the order of the names in the vector.
A simple example of what I want:
Data:
A B C
1 2 3
4 5 6
Vector: c("B","C","A")
Sorted:
B C A
2 3 1
5 6 4
Is there a fast way to do this?
A5C1D2H2I1M1N2O1R2T1's solution didn't work for my data (I have a problem similar to Yilun Zhang's), so I found another option using match(), which finds the positions of the elements of its first argument in its second.
I hope this offers another solution if anyone is having problems!
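A minimal sketch of the match() approach, using the toy A/B/C data from the question. The names df, ord, idx, and sorted are illustrative and not taken from the original answer:

```r
# Toy data from the question; all object names here are illustrative.
df <- data.frame(A = c(1, 4), B = c(2, 5), C = c(3, 6))
ord <- c("B", "C", "A")

# For each name in `ord`, match() returns its position in names(df).
idx <- match(ord, names(df))

# match() gives NA for names that are not columns of df, so check first.
stopifnot(!anyNA(idx))

# Index the columns by position to reorder them.
sorted <- df[, idx]
names(sorted)  # "B" "C" "A"
```

Because match() works on positions, the same idx can be reused to reorder anything else that is aligned with the columns, such as a vector of column labels.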
UPDATE, with reproducible data added by OP:
Results in:
As OP desires.
How about:
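A minimal sketch of this indexing, using the toy A/B/C data from the question. The contents of df and df.clust below are stand-ins; only the mutation_id column name comes from the answer:

```r
# Stand-in data: the question's real tables are much larger.
df <- data.frame(A = c(1, 4), B = c(2, 5), C = c(3, 6))
df.clust <- data.frame(mutation_id = c("B", "C", "A"),
                       stringsAsFactors = FALSE)

# A data.frame is a list of columns, so indexing it with a character
# vector selects the columns in the order the names are given.
# as.character() guards against mutation_id being stored as a factor.
df <- df[as.character(df.clust$mutation_id)]
names(df)  # "B" "C" "A"
```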
Where df is the data.frame you want to sort the columns of and df.clust is the data frame that contains the vector with the column order (mutation_id).
This basically treats df as a list and uses standard vector indexing techniques to re-order it.
Brodie's answer does exactly what you're asking for. However, you imply that your data are large, so I will provide an alternative using "data.table", which has a function called setcolorder that will change the column order by reference.
Here's a reproducible example.
Start with some simple data:
Provide proof that Brodie's answer works:
Show a more memory-efficient way to do the same thing.
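A rough sketch of the setcolorder approach on the question's toy data, assuming the data.table package is installed. The names DT and ord are illustrative:

```r
library(data.table)  # assumes data.table is installed

# Toy data from the question, built directly as a data.table.
DT <- data.table(A = c(1, 4), B = c(2, 5), C = c(3, 6))
ord <- c("B", "C", "A")

# setcolorder() reorders the columns of DT by reference:
# there is no assignment, and the column data are not copied.
setcolorder(DT, ord)
names(DT)  # "B" "C" "A"
```

An existing data.frame can be converted by reference with setDT() before calling setcolorder(), which avoids copying the 1000+ columns.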