Question:

I have a dataframe in the following format:

id | name               | logs                                  
---+--------------------+-----------------------------------------
84 |          "zibaroo" |                             "C47931038" 
12 | "fabien kelyarsky" | c("C47331040", "B19412225", "B18511449")
96 |     "mitra lutsko" |              c("F19712226", "A18311450")
34 |       "PaulSandoz" |                             "A47431044" 
65 |       "BeamVision" |                             "D47531045"

As you can see, each cell of the "logs" column holds a vector of strings.

Is there an efficient way to convert the data frame to the long format (one observation per row) without the intermediary step of separating "logs" into several columns?

This is important because the dataset is very large and the number of logs per person seems to be arbitrary.

In other words, I need the following:

id | name               | log                                 
---+--------------------+------------
84 |          "zibaroo" | "C47931038" 
12 | "fabien kelyarsky" | "C47331040"
12 | "fabien kelyarsky" | "B19412225"
12 | "fabien kelyarsky" | "B18511449"
96 |     "mitra lutsko" | "F19712226"
96 |     "mitra lutsko" | "A18311450"
34 |       "PaulSandoz" | "A47431044" 
65 |       "BeamVision" | "D47531045"

Here is the dput of a section of the real dataframe:

structure(list(id = 148:157, name = c("avihil1", "Niarfe", "doug henderson", 
"nick tan", "madisp", "woodbusy", "kevinhcross", "cylol", "andrewarrow", 
"gstavrev"), logs = list("Z47331572", "Z47031573", c("F47531574", 
"B195945", "D186871", "S192939", "S182865", "G19539045"), c("A47231575", 
"A190933", "C181859"), "F47431576", c("B47231577", "D193936", 
"Q184862"), "Y47331579", c("A47531580", "Z195944", "B185870"), 
"N47731581", "E47231582")), .Names = c("id", "name", "logs"
), row.names = 149:158, class = "data.frame")

Answer 1:

This is a perfect case for tidyr:

library(tidyr)
library(dplyr)
# unnest() gives each element of the list column its own row
dat %>% unnest(logs)

Answer 2:

listCol_l from splitstackshape could be a good option here, since the "logs" column in the data.frame is a list:

library(splitstackshape)
listCol_l(df, 'logs')

 #    id           name   logs_ul
 #1: 148        avihil1 Z47331572
 #2: 149         Niarfe Z47031573
 #3: 150 doug henderson F47531574
 #4: 150 doug henderson   B195945
 #5: 150 doug henderson   D186871
 #6: 150 doug henderson   S192939
 #7: 150 doug henderson   S182865
 #8: 150 doug henderson G19539045
 #9: 151       nick tan A47231575
#10: 151       nick tan   A190933
#11: 151       nick tan   C181859
#12: 152         madisp F47431576
#13: 153       woodbusy B47231577
#14: 153       woodbusy   D193936
#15: 153       woodbusy   Q184862
#16: 154    kevinhcross Y47331579
#17: 155          cylol A47531580
#18: 155          cylol   Z195944
#19: 155          cylol   B185870
#20: 156    andrewarrow N47731581
#21: 157       gstavrev E47231582

Answer 3:

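Just to show another option, this time with data.table:

library(data.table)
# setDT() converts df to a data.table by reference; unlist() then runs once per (id, name) group
setDT(df)[, .(logs = unlist(logs)), by = .(id, name)]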
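
For readers who do not have data.table installed, the same idea (repeat the id and name once per log entry, then flatten the list column) can be sketched in base R. This is a substitute technique, not the code from the answer above. The data frame is rebuilt from the small example table in the question, and the names `n` and `long` are illustrative only.

```r
# Rebuild the small example from the question. The list column is
# assigned after data.frame() has created the atomic columns.
df <- data.frame(
  id   = c(84L, 12L, 96L, 34L, 65L),
  name = c("zibaroo", "fabien kelyarsky", "mitra lutsko",
           "PaulSandoz", "BeamVision"),
  stringsAsFactors = FALSE
)
df$logs <- list(
  "C47931038",
  c("C47331040", "B19412225", "B18511449"),
  c("F19712226", "A18311450"),
  "A47431044",
  "D47531045"
)

# Number of log entries stored in each row's cell.
n <- lengths(df$logs)

# Repeat every id/name once per log entry, then flatten the list column.
long <- data.frame(
  id   = rep(df$id, times = n),
  name = rep(df$name, times = n),
  log  = unlist(df$logs),
  stringsAsFactors = FALSE
)
```

Because `rep()` and `unlist()` both walk the rows in their original order, the three columns of `long` stay aligned without any intermediate wide format.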

Answer 4:

Another data.table option:

library(data.table)
dt <- data.table(df)
# Grouping by name alone assumes each name appears only once
dt[, .(id, logs = logs[[1]]), by = name]

Unlisting columns by groups
