I have a data frame with three columns: "uuid" (class factor), "created_at" (class POSIXct), and "trainer_item_id" (class factor). I created a fourth column named "Sessions". It numbers the time sessions of each uuid in chronological order, where a session is a run of events in which the time difference between any consecutive pair of events is at most one hour (3600 seconds).
I created the Sessions column with a for loop. The problem is that I have more than a million observations, and the loop takes 8 hours to create Sessions. Is there a simpler and faster way to create it than my code below? Thanks in advance for your help!
Here is a sample of the original dataset --> https://gist.github.com/einsiol/5b4e633ce69d3a8e43252f383231e4b8
Here is my code -->
library(dplyr)
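# Converting the data frame trial to a tibble in order to use the function group_by
trial <- as_tibble(trial)
trial <- group_by(trial, uuid)
# Ordering by user and then by timestamp (created_at), so that each user's
# events are contiguous; arrange() ignores the grouping by default
trial <- arrange(trial, uuid, created_at)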
# Timestamps and a preallocated vector of time differences tdiff
time <- trial$created_at
tdiff <- numeric(nrow(trial) - 1)
# One session number per row (length(trial) would give the number of columns)
trial$Sessions <- numeric(nrow(trial))
count <- 1
for (i in seq_len(nrow(trial) - 1)) {
  # Seconds between event i and the next event
  tdiff[i] <- as.numeric(difftime(time[i + 1], time[i], units = "secs"))
  trial$Sessions[i] <- count
  if (trial$uuid[i + 1] != trial$uuid[i]) {
    # Different user ID: restart the session counter
    count <- 1
  } else if (tdiff[i] >= 3600) {
    # Same user ID, but the gap is too long: start a new session
    count <- count + 1
  }
  # Assign the (possibly updated) counter so the last row is labelled too
  trial$Sessions[i + 1] <- count
}
UPDATE: I have found the answer to my question and a fast alternative to this code that you can find below!
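For reference, here is a minimal sketch of one vectorized way to do this with dplyr. It is an illustration under assumptions, not necessarily the exact code referred to in the update. The toy data is made up (it is not taken from the gist), the helper column gap is a name introduced only for this example, and a new session is assumed to start when the gap is strictly greater than 3600 seconds, following the description above.

```r
library(dplyr)

# Toy data (hypothetical values, not taken from the linked gist)
trial <- data.frame(
  uuid = factor(c("a", "a", "a", "b", "b")),
  created_at = as.POSIXct(
    c("2016-01-01 10:00:00", "2016-01-01 10:30:00", "2016-01-01 12:00:00",
      "2016-01-01 09:00:00", "2016-01-01 09:10:00"),
    tz = "UTC"
  )
)

sessions <- trial %>%
  arrange(uuid, created_at) %>%
  group_by(uuid) %>%
  mutate(
    # Seconds since the previous event of the same user (NA for the first event)
    gap = as.numeric(difftime(created_at, lag(created_at), units = "secs")),
    # A session starts at a user's first event or after a gap longer than one hour;
    # use gap >= 3600 instead to match the strict "< 3600" test in the loop
    Sessions = cumsum(is.na(gap) | gap > 3600)
  ) %>%
  ungroup()

print(sessions)
```

The idea is that cumsum() over a logical "new session starts here" flag replaces the running counter from the loop, and group_by(uuid) restarts the count for every user, so no row-by-row iteration is needed.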