I am trying to read in a text file like this:
exp1 sample1 2 5
exp2 sample1 2 3 5 7
exp1 sample2 1 2 6
to a dataframe with a list column like this:
tibble(exp = c("exp1", "exp2", "exp3"),
sample = c("sample1","sample1","sample2"),
listdata = list(list(2,5), list(2,3,5,7), list(1,2,6)))
# A tibble: 3 x 3
exp sample listdata
<chr> <chr> <list>
1 exp1 sample1 <list [2]>
2 exp2 sample1 <list [4]>
3 exp3 sample2 <list [3]>
The purpose is to use the metadata in the first two columns to select and operate on the lists.
I can read in the lines as lists, but don't know how to separate the metadata:
listdata <- read_lines("list_c_data.txt") %>% strsplit(., " ") %>% tibble()
Any suggestions? I may need to read in the file line by line since the number of observations could be >100000 and the length of the list in each row could be >1000
We read the file using read.table/read.csv
with fill = TRUE
, then gather
(from tidyr
) the 3rd to last column of the dataset to reshape it to 'long' format, grouped by 'V1' and 'V2', we summarise
the 'Val' as a list
and then rename
the columns if necessary.
library(dplyr)
library(tidyr)
df1 <- read.table("yourfile.txt", header=FALSE, fill = TRUE)
gather(df1, Var, Val, V3:ncol(df1), na.rm = TRUE) %>%
group_by(V1, V2) %>%
summarise(Val = list(Val)) %>%
rename(exp=V1, sample = V2, listdata = Val)
Or we can scan
to read the rows, strsplit
by space, convert the elements in 'lst' (filtering out the 1st and 2nd) to numeric
while we rbind
the 1st and 2nd elements to a data.frame
and create the 'lst2' as the third column.
l1 <- trimws(scan("yourfile.txt", what ="", sep="\n", quiet=TRUE))
lst <- strsplit(l1, " ")
lst2 <- lapply(lst, function(x) as.numeric(x[-(1:2)]))
d1 <- setNames(do.call(rbind.data.frame, lapply(lst,
function(x) x[1:2])), c("exp", "sample"))
d1$listdata <- lst2