I have a list with multiple entries; an example entry looks like this:
> head(gene_sets[[1]])
patient Diagnosis Eigen_gene ENSG00000080824 ENSG00000166165 ENSG00000211459 ENSG00000198763 ENSG00000198938 ENSG00000198886
1 689_120604 AD -0.5606425 50137 38263 309298 528233 523420 730537
2 412_120503 AD 0.9454632 44536 23333 404316 730342 765963 1168123
3 706_120605 AD 0.6061834 16647 22021 409498 614314 762878 1171747
4 486_120515 AD 0.8164779 21871 9836 518046 697051 613621 1217262
5 469_120514 AD 0.5354927 33460 11651 468223 653745 608259 1115973
6 369_120502 AD -0.8363372 32168 44760 271978 436132 513194 784537
For these entries, the first three columns (patient, Diagnosis, Eigen_gene) are always the same, while the total number of columns varies.
What I would like to do is convert this entire list into a single data frame. The information I need to retain is set_index, which is the index of the entry in the list, and all the column names after Eigen_gene up to the last column.
I can think of solutions using loops; however, I would prefer a dplyr/reshape solution.
To clarify, if we had a fake input that looked like:
> list(data.frame(patient= c(1,2,3), Diagnosis= c("AD","Control", "AD"), Eigen_gene= c(1.1, 2.3, 4.3), geneA= c(1,1,1), geneC= c(2,1,3), geneB= c(2,39,458)))
patient Diagnosis Eigen_gene geneA geneC geneB
1 1 AD 1.1 1 2 2
2 2 Control 2.3 1 1 39
3 3 AD 4.3 1 3 458
The desired output would look like this (I have only shown an example of the first list entry for input, the output shows how other entries in the list would also be formatted):
> data.frame(set_index= c(1,1,1,2,2,2,3,3), gene= c("geneA", "geneC", "geneB", "geneF", "geneE", "geneH", "geneT", "geneZ"))
set_index gene
1 1 geneA
2 1 geneC
3 1 geneB
4 2 geneF
5 2 geneE
6 2 geneH
7 3 geneT
8 3 geneZ
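To make the intended transformation concrete, here is a minimal sketch of one way it might be done. It assumes the gene columns always start at the fourth column. The `gene_sets_demo` list and its second entry are made up purely for illustration and are not my real data.

```r
library(dplyr)

# Hypothetical two-entry list mimicking the structure described above
gene_sets_demo <- list(
  data.frame(patient = c(1, 2, 3), Diagnosis = c("AD", "Control", "AD"),
             Eigen_gene = c(1.1, 2.3, 4.3),
             geneA = c(1, 1, 1), geneC = c(2, 1, 3), geneB = c(2, 39, 458)),
  data.frame(patient = c(1, 2), Diagnosis = c("AD", "AD"),
             Eigen_gene = c(0.5, -0.2),
             geneF = c(3, 4), geneE = c(5, 6))
)

# For each list position, keep the column names after the first three
# (patient, Diagnosis, Eigen_gene) and record the position as set_index
per_entry <- lapply(seq_along(gene_sets_demo), function(i) {
  tibble(set_index = i, gene = colnames(gene_sets_demo[[i]])[-(1:3)])
})

# Stack the per-entry tables into one data frame
result <- bind_rows(per_entry)
```

This still uses `lapply()` to walk over the list, so it is not entirely loop-free, but the stacking is done by `bind_rows()` rather than by growing a data frame by hand.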