How to convert a list consisting of vector of diff

Posted 2019-01-01 16:07

Question:

I have a (fairly long) list of vectors. The vectors consist of Russian words that I got by using the strsplit() function on sentences.

The following is what head() returns:

[[1]]
[1] "модно"     "создавать" "резюме"    "в"         "виде"

[[2]]
[1] "ты"        "начианешь" "работать"  "с"         "этими"

[[3]]
[1] "модно"            "называть"         "блогер-рилейшенз" "―"                "начинается"       "задолго"

[[4]]
[1] "видел" "по"    "сыну," "что"   "он"

[[5]]
[1] "четырнадцать," "я"             "поселился"     "на"            "улице"

[[6]]
[1] "широко"     "продолжали" "род."

Note the vectors are of different length.

What I want is to be able to read the first words from each sentence, the second word, the third, etc.

The desired result would be something like this:

    P1              P2           P3                 P4    P5           P6
[1] "модно"         "создавать"  "резюме"           "в"   "виде"       NA
[2] "ты"            "начианешь"  "работать"         "с"   "этими"      NA
[3] "модно"         "называть"   "блогер-рилейшенз" "―"   "начинается" "задолго"
[4] "видел"         "по"         "сыну,"            "что" "он"         NA
[5] "четырнадцать," "я"          "поселился"        "на"  "улице"      NA
[6] "широко"        "продолжали" "род."             NA    NA           NA

I have tried to just use data.frame(), but that didn't work because the rows are of different lengths. I also tried rbind.fill() from the plyr package, but that function can only process matrices.

I found some other questions here (that's where I got the plyr help from), but those were all about combining, for instance, two data frames of different sizes.

Thanks for your help.

Answer 1:

Try this:

word.list <- list(letters[1:4], letters[1:5], letters[1:2], letters[1:6])
n.obs <- sapply(word.list, length)
seq.max <- seq_len(max(n.obs))
mat <- t(sapply(word.list, "[", i = seq.max))

The trick is that indexing past the end of a vector returns NA, so

c(1:2)[1:4]

returns the vector followed by two NAs.
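As a rough illustration of why the padding works, here is the same idea on a character vector and then on the whole list. The variable x is only an illustrative name; word.list and seq.max follow the snippet above, and lengths() is assumed to be available (base R 3.2.0 or later).

```r
# Out-of-range positions in a subset yield NA rather than an error
x <- c("a", "b")
x[1:4]
# [1] "a" "b" NA  NA

# Applied to every element of a ragged list, this pads each vector to the
# same length, so sapply() can simplify the result to a matrix
word.list <- list(letters[1:4], letters[1:5], letters[1:2], letters[1:6])
seq.max <- seq_len(max(lengths(word.list)))

# sapply() returns one column per list element; t() turns them into rows
mat <- t(sapply(word.list, "[", i = seq.max))
dim(mat)
# [1] 4 6
mat[3, ]
# [1] "a" "b" NA  NA  NA  NA
```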



Answer 2:

A one-liner with plyr:

plyr::ldply(word.list, rbind)


Answer 3:

You can do something like this:

## Example data
l <- list(c("a","b","c"), c("a2","b2"), c("a3","b3","c3","d3"))
## Compute maximum length
max.length <- max(sapply(l, length))
## Add NA values to list elements
l <- lapply(l, function(v) { c(v, rep(NA, max.length-length(v)))})
## Rbind
do.call(rbind, l)

Which gives:

     [,1] [,2] [,3] [,4]
[1,] "a"  "b"  "c"  NA  
[2,] "a2" "b2" NA   NA  
[3,] "a3" "b3" "c3" "d3"
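To get from this matrix to the layout asked for in the question (columns P1, P2, ... that can be read position by position), one possible follow-up is sketched below. The names padded, m and df are illustrative only, and the P prefix is taken from the desired output in the question.

```r
# Same example data and padding as above
l <- list(c("a","b","c"), c("a2","b2"), c("a3","b3","c3","d3"))
max.length <- max(lengths(l))
padded <- lapply(l, function(v) c(v, rep(NA, max.length - length(v))))
m <- do.call(rbind, padded)

# Name the columns P1..Pn and convert to a data frame of strings
colnames(m) <- paste0("P", seq_len(ncol(m)))
df <- as.data.frame(m, stringsAsFactors = FALSE)

df$P1  # first element of every vector
# [1] "a"  "a2" "a3"
df$P3  # third element, NA where a vector was too short
# [1] "c"  NA   "c3"
```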


Answer 4:

Another option is stri_list2matrix() from the stringi package:

library(stringi)
stri_list2matrix(l, byrow=TRUE)
#     [,1] [,2] [,3] [,4]
#[1,] "a"  "b"  "c"  NA  
#[2,] "a2" "b2" NA   NA  
#[3,] "a3" "b3" "c3" "d3"

NOTE: Data from @juba's post.

Or, as @Valentin mentioned in the comments (note that this returns one column per list element, so wrap it in t() to get one row per element):

sapply(l, "length<-", max(lengths(l)))
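A small base-R check of the orientation of that sapply() result, using the same example data; by_col and by_row are illustrative names, not part of the original answer.

```r
l <- list(c("a","b","c"), c("a2","b2"), c("a3","b3","c3","d3"))

# `length<-` extends each vector to the target length, padding with NA
by_col <- sapply(l, "length<-", max(lengths(l)))
dim(by_col)
# [1] 4 3   (one column per list element)

# Transpose to get one row per list element
by_row <- t(by_col)
by_row[2, ]
# [1] "a2" "b2" NA   NA
```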


Answer 5:

You could also use rbindlist() from the data.table package.

Convert each vector to a one-row data.table (or data.frame) by transposing it inside lapply() (I'm not sure how much this costs in speed). Then bind the rows with rbindlist(), filling missing cells with NA:

library(data.table)
l = list(c("a","b","c"), c("a2","b2"), c("a3","b3","c3","d3"))
dt = rbindlist(lapply(l, function(x) data.table(t(x))),
     fill = TRUE)


Answer 6:

Another option is to define a function like this (it mimics rbind.fill, but binds columns), or to use cbind.fill directly from the rowr package:

cbind.fill <- function(...){
  nm <- list(...) 
  nm <- lapply(nm, as.matrix)
  n <- max(sapply(nm, nrow)) 
  do.call(cbind, lapply(nm, function (x) 
    rbind(x, matrix(, n-nrow(x), ncol(x))))) 
}
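The answer does not show how to call the function on a list, so here is one way it might be used. The function body is repeated (with an explicit NA) only so that the sketch runs on its own; the example list l and the name res are assumptions borrowed from the other answers.

```r
# Copy of the cbind.fill() helper defined above, with comments added
cbind.fill <- function(...) {
  nm <- lapply(list(...), as.matrix)     # each vector becomes a 1-column matrix
  n <- max(sapply(nm, nrow))             # length of the longest input
  do.call(cbind, lapply(nm, function(x)  # pad each column with NA rows
    rbind(x, matrix(NA, n - nrow(x), ncol(x)))))
}

l <- list(c("a","b","c"), c("a2","b2"), c("a3","b3","c3","d3"))

# Pass the list elements as separate arguments, then transpose so that
# each original vector ends up as a row
res <- t(do.call(cbind.fill, l))
dim(res)
# [1] 3 4
res[2, ]
# [1] "a2" "b2" NA   NA
```

Because cbind.fill() fills columns rather than rows, the final t() is what produces the one-row-per-sentence layout requested in the question.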

Regards