I have some textual data which presents itself in the following format after being read into R:
```r
lst <- list('A', c("", 'aa'), 'bb', 'cc', 'B', c("", 'aa'), 'bb', 'cc', 'dd')
```
Printing `lst` gives:
```
[[1]]
[1] "A"
[[2]]
[1] "" "aa"
[[3]]
[1] "bb"
[[4]]
[1] "cc"
[[5]]
[1] "B"
[[6]]
[1] "" "aa"
[[7]]
[1] "bb"
[[8]]
[1] "cc"
[[9]]
[1] "dd"
```
Is there an easy way to restructure this list using the empty string `""` as a marker, so that the element immediately before each `""` becomes the name of a new list element, with the values that follow collected under it? That is, I want:
```r
lst2 <- list(A=c('aa', 'bb', 'cc'), B=c('aa', 'bb', 'cc', 'dd'))
```
which prints as:
```
$A
[1] "aa" "bb" "cc"
$B
[1] "aa" "bb" "cc" "dd"
```
We could `unlist` the list to a vector ('v1'), then create a grouping index based on the positions of the empty strings (`!nzchar(v1)`): remove the first element, append `FALSE` at the end, and take the cumulative sum, so that each group starts at the element just before an empty string. Then we use that index in `tapply` to remove the first two elements of each group (the heading and the empty string), and use a second group-by operation to take the first element of each group as the names of 'lst2'.
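```r
v1 <- unlist(lst)                        # flatten to a character vector
i1 <- cumsum(c(!nzchar(v1)[-1], FALSE))  # group id: a new group starts at each heading
lst2 <- tapply(v1, i1, FUN = function(x) x[-(1:2)])  # drop the heading and the ""
names(lst2) <- tapply(v1, i1, FUN = head, 1)         # use each heading as the name
lst2
#$A
#[1] "aa" "bb" "cc"
#$B
#[1] "aa" "bb" "cc" "dd"
```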
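A step-by-step sketch of the same idea, printing the grouping index for the example `lst`. It swaps `tapply()` for `split()` plus `lapply()` (a substitution of mine, not part of the answer above), which returns a plain named list instead of the one-dimensional list array that `tapply()` produces when `FUN` returns vectors of varying length. The names `is_marker`, `starts` and `groups` are just illustrative.

```r
lst <- list('A', c("", 'aa'), 'bb', 'cc', 'B', c("", 'aa'), 'bb', 'cc', 'dd')

v1 <- unlist(lst)
# v1 is: "A" "" "aa" "bb" "cc" "B" "" "aa" "bb" "cc" "dd"

is_marker <- !nzchar(v1)            # TRUE at each empty string
starts <- c(is_marker[-1], FALSE)   # shift left: TRUE now sits on each heading
i1 <- cumsum(starts)                # running count of headings = group id
print(i1)
# [1] 1 1 1 1 1 2 2 2 2 2 2

groups <- split(v1, i1)                              # one character vector per group
lst2 <- lapply(groups, function(x) x[-(1:2)])        # drop the heading and the ""
names(lst2) <- vapply(groups, function(x) x[1], character(1))  # heading as name
str(lst2)
```

Like the original answer, this assumes the list starts with a heading.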
Another potential option:
```r
spl_vals <- c("")
idxs <- sapply(lst, function(x) as.numeric(any(spl_vals %in% x))) #Find location of split values (1 = contains a split value)
c_idxs <- cumsum(idxs) #Use cumsum to group sets of values
c_idxs[which(idxs==1) - 1] <- 0 #Put the element before each split into group 0. These will be the names of the elements
spl <- split(lst, c_idxs) #Split your list into (1) names of elements and (2) individual elements
newlist <- lapply(spl, function(x) unlist(x)[!unlist(x) %in% spl_vals]) #Remove any split values
nms <- names(newlist) #Extract names of list (this is just for shortening the next line of code)
setNames(newlist[nms[!nms %in% "0"]], newlist[["0"]]) #Set names of elements
```
```
$A
[1] "aa" "bb" "cc"
$B
[1] "aa" "bb" "cc" "dd"
```
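To make the indexing in the second approach concrete, this sketch prints `idxs` and `c_idxs` at each step for the example list. It uses `sapply()` so that `idxs` is a plain numeric vector; the values in the comments come from tracing the code by hand.

```r
lst <- list('A', c("", 'aa'), 'bb', 'cc', 'B', c("", 'aa'), 'bb', 'cc', 'dd')
spl_vals <- c("")

# 1 where an element contains a split value, 0 otherwise
idxs <- sapply(lst, function(x) as.numeric(any(spl_vals %in% x)))
print(idxs)
# [1] 0 1 0 0 0 1 0 0 0

c_idxs <- cumsum(idxs)
print(c_idxs)
# [1] 0 1 1 1 1 2 2 2 2

# The heading just before each split value is moved into group 0
c_idxs[which(idxs == 1) - 1] <- 0
print(c_idxs)
# [1] 0 1 1 1 0 2 2 2 2

spl <- split(lst, c_idxs)
print(names(spl))
# [1] "0" "1" "2"
```

Group `"0"` then holds the headings (`"A"`, `"B"`) and groups `"1"` and `"2"` hold the values that go under them.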
Note that this doesn't `unlist` the whole list up front, so if you have multiple split values and want elements of different types (for example character and numeric), each group keeps its own type. For example, with:
```r
lst <- list('A', c("", 'aa'), 'bb', 'cc', 'B', c(0, 1), 2, 3, 4)
spl_vals <- c("", 0)
```
You get:
```
$A
[1] "aa" "bb" "cc"
$B
[1] 1 2 3 4
```
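The type-preservation point can be checked directly. Flattening the mixed list in one go coerces every value to character, while unlisting only the elements that belong to heading `B` (positions 6 to 9, picked by hand here purely for illustration) keeps them numeric.

```r
lst <- list('A', c("", 'aa'), 'bb', 'cc', 'B', c(0, 1), 2, 3, 4)

# Flattening everything at once coerces all values to one common type
v1 <- unlist(lst)
print(class(v1))
# [1] "character"
print(v1[8:11])   # the former numbers are now strings
# [1] "1" "2" "3" "4"

# Unlisting only the "B" elements keeps them numeric
b_group <- unlist(lst[6:9])
print(class(b_group))
# [1] "numeric"
b_vals <- b_group[!b_group %in% c("", 0)]   # drop the split value, as in the answer
print(b_vals)
# [1] 1 2 3 4
```

One side effect of matching by value with `%in%`: because `spl_vals` contains `0`, any element containing a genuine `0` would also be treated as a split marker, and that `0` would be dropped.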