I am trying to reshape some user data in R. I have a data.frame of session IDs, where each session has a userID and a date. I would like to use the "userID" variable as my key, taking the "New Visitor" observation as the starting row, so that there is a single row for each new visitor. Each subsequent session ID would then be passed as a separate variable. For instance, if a user ID has 3 session IDs in total, there would be 6 session variables (3 session IDs and 3 dates) alongside the user ID.
For instance, if this is the data frame for a user:
date <- c('2015-01-01','2015-01-02','2015-01-02','2015-01-10')
userID <- c('100105276','100105276','100105276','100105276')
sessionID <- c('1452632119','1452634303','1452637067','1453600979')
userType <- c('New Visitor','Returning Visitor','Returning Visitor','Returning Visitor')
df <- data.frame(date, userID, sessionID, userType, stringsAsFactors = FALSE)
Instead, I would like to return this:
userID sessionID1 date1 sessionID2 date2 sessionID3 date3
100105276 1452632119 2015-01-01 1452634303 2015-01-02 1452637067 2015-01-02
If any userIDs do not have subsequent sessionIDs, an NA value should be passed for the missing variables. I've read up on using tidyr or reshape2 to do this, but I haven't been able to get them to do exactly what I am looking for.
Given your data is ordered by userID and sessionID, and each row is a unique session, you could do:

In this output userType is also included as a variable, but you can always drop it afterwards.
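
For reference, here is a minimal base R sketch of this kind of reshape, not necessarily the exact code intended above. It assumes the data is a real data.frame with one row per session, numbers each user's sessions with `ave()`, and then spreads them with `reshape()`. The helper column `visit` and the result names `wide` and `wide_no_type` are made up for illustration.

```r
# Example data from the question, built as a data.frame (not a matrix)
df <- data.frame(
  date      = c("2015-01-01", "2015-01-02", "2015-01-02", "2015-01-10"),
  userID    = c("100105276", "100105276", "100105276", "100105276"),
  sessionID = c("1452632119", "1452634303", "1452637067", "1453600979"),
  userType  = c("New Visitor", "Returning Visitor",
                "Returning Visitor", "Returning Visitor"),
  stringsAsFactors = FALSE
)

# Make the ordering assumption explicit: sort by user, then by session
df <- df[order(df$userID, df$sessionID), ]

# Number each user's sessions 1, 2, 3, ... ("visit" is a hypothetical name)
df$visit <- ave(seq_along(df$userID), df$userID, FUN = seq_along)

# One row per userID; date, sessionID and userType are spread by visit number.
# Users with fewer sessions than the maximum get NA in the unused columns.
wide <- reshape(df, idvar = "userID", timevar = "visit", direction = "wide")

# Optionally drop the userType.* columns afterwards
wide_no_type <- wide[, !grepl("^userType", names(wide)), drop = FALSE]
```

The resulting columns are named `date.1`, `sessionID.1`, `userType.1`, `date.2`, and so on, so they may need renaming if you want exactly `sessionID1`, `date1`, etc.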