Merging multiple rows into a single row

Posted 2019-08-06 05:14

I'm having trouble reshaping a data frame in R. It looks something like this:

ID  TIME    DAY        URL_NAME      VALUE  TIME_SPEND
1    12:15  Monday      HOME         4        30
1    13:15  Tuesday     CUSTOMERS    5        21  
1    15:00  Thursday    PLANTS       8        8    
1    16:21  Friday      MANAGEMENT   1        6
....

I want to combine all rows that share the same "ID" into a single row, so that the result looks something like this:

ID  TIME    DAY         URL_NAME     VALUE    TIME_SPEND  TIME1  DAY1        URL_NAME1      VALUE1  TIME_SPEND1  TIME2    DAY2        URL_NAME2      VALUE2  TIME_SPEND2  TIME3    DAY3        URL_NAME3      VALUE3  TIME_SPEND3
1    12:15  Monday      HOME         4        30          13:15  Tuesday     CUSTOMERS      5       21           15:00    Thursday    PLANTS         8       8            16:21    Friday      MANAGEMENT     1       6

My second problem is that there are about 1.500.00 unique IDs, and I would like to do this for the whole data frame.

I have not found a solution that fits my problem. I would appreciate any solutions or links that could help.

1 answer
狗以群分
#2 · 2019-08-06 05:35

I'd recommend using dcast from the "data.table" package, which would allow you to reshape multiple measure variables at once.

Example:

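library(data.table)
# Cast every column except ID (assumed to be the first column) into one set of columns per row within each ID
as.data.table(mydf)[, dcast(.SD, ID ~ rowid(ID), value.var = names(mydf)[-1])]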
#    ID TIME_1 TIME_2 TIME_3   DAY_1   DAY_2    DAY_3 URL_NAME_1 URL_NAME_2 URL_NAME_3 VALUE_1 VALUE_2
# 1:  1  12:15  13:15  15:00  Monday Tuesday Thursday       HOME  CUSTOMERS     PLANTS       4       5
# 2:  2  14:15  10:19     NA Tuesday  Monday       NA  CUSTOMERS  CUSTOMERS         NA       2       9
#    VALUE_3 TIME_SPEND_1 TIME_SPEND_2 TIME_SPEND_3
# 1:       8           30           19           40
# 2:      NA           21            8           NA
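
The `_1`, `_2`, `_3` suffixes in the output come from `rowid(ID)` on the right-hand side of the formula. As a minimal sketch of what that helper does on its own (the `ids` vector below is an illustrative stand-in for the ID column, not part of the original answer), it numbers each occurrence of a value in order of appearance:

```r
library(data.table)

# Illustrative ID vector: three rows for ID 1, two rows for ID 2.
ids <- c(1, 1, 1, 2, 2)

# rowid() returns a running counter that restarts for each distinct value.
occurrence <- rowid(ids)
print(occurrence)
# [1] 1 2 3 1 2
```

Because ID 2 has only two rows, it never gets a counter value of 3, which is why its `*_3` columns are filled with NA.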

Here's the sample data used:

mydf <- data.frame(
  ID = c(1, 1, 1, 2, 2),
  TIME = c("12:15", "13:15", "15:00", "14:15", "10:19"),
  DAY = c("Monday", "Tuesday", "Thursday", "Tuesday", "Monday"),
  URL_NAME = c("HOME", "CUSTOMERS", "PLANTS", "CUSTOMERS", "CUSTOMERS"),
  VALUE = c(4, 5, 8, 2, 9),
  TIME_SPEND = c(30, 19, 40, 21, 8)
)
mydf
#   ID  TIME      DAY  URL_NAME VALUE TIME_SPEND
# 1  1 12:15   Monday      HOME     4         30
# 2  1 13:15  Tuesday CUSTOMERS     5         19
# 3  1 15:00 Thursday    PLANTS     8         40
# 4  2 14:15  Tuesday CUSTOMERS     2         21
# 5  2 10:19   Monday CUSTOMERS     9          8
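
Since the question mentions a very large number of IDs, it may help to estimate how wide the result will be before reshaping. The sketch below is only an illustration under the assumption that the data has the same layout as the sample data above; the names `dt`, `rows_per_id`, `max_rows`, and `expected_cols` are made up for this example. The ID with the most rows determines how many copies of each column are created.

```r
library(data.table)

# Same sample data as above, rebuilt here so the snippet runs on its own.
dt <- data.table(
  ID = c(1, 1, 1, 2, 2),
  TIME = c("12:15", "13:15", "15:00", "14:15", "10:19"),
  DAY = c("Monday", "Tuesday", "Thursday", "Tuesday", "Monday"),
  URL_NAME = c("HOME", "CUSTOMERS", "PLANTS", "CUSTOMERS", "CUSTOMERS"),
  VALUE = c(4, 5, 8, 2, 9),
  TIME_SPEND = c(30, 19, 40, 21, 8)
)

# Count rows per ID; the largest count is the number of column groups needed.
rows_per_id <- dt[, .N, by = ID]
max_rows <- max(rows_per_id$N)

# Expected width: the ID column plus max_rows copies of every other column.
expected_cols <- 1 + max_rows * (ncol(dt) - 1)

# Reshape, selecting the measure columns by name rather than by position.
wide <- dcast(dt, ID ~ rowid(ID), value.var = setdiff(names(dt), "ID"))

# Confirm the reshaped table has the width estimated above.
stopifnot(ncol(wide) == expected_cols)
```

If a single ID in the real data has far more rows than the rest, the result will contain many mostly-NA columns, so checking `max_rows` first is a cheap sanity test.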