I have a large data frame (6 million rows) in which each unit (id) has one row for its entry time, followed by a row for its exit time. I need to combine each entry/exit pair into a single row.
The original data looks something like the following (please bear in mind that an id may enter and exit twice, as id=1 does here):
df <- read.table(header=T, text='id time
1 "15/12/2014 06:30"
1 "15/12/2014 06:31"
1 "15/12/2014 06:34"
1 "15/12/2014 06:35"
2 "15/12/2014 06:36"
2 "15/12/2014 06:37"
3 "15/12/2014 06:38"
3 "15/12/2014 06:39"')
Output that I need:
id entry exit
1 15/12/2014 06:30 15/12/2014 06:31
1 15/12/2014 06:34 15/12/2014 06:35
2 15/12/2014 06:36 15/12/2014 06:37
3 15/12/2014 06:38 15/12/2014 06:39
So far I have tried a for loop that takes the id and entry time from the first row of each pair and the exit time from the second row, and puts them together:
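n <- nrow(df) / 2   # each entry/exit pair becomes one output row
outputdf <- data.frame(id = integer(n), entry = character(n), exit = character(n), stringsAsFactors = FALSE)
for (i in seq_len(n)) {
  outputdf[i, 1] <- df[2 * i - 1, 1]                 # id, taken from the entry row
  outputdf[i, 2] <- as.character(df[2 * i - 1, 2])   # entry time (odd row)
  outputdf[i, 3] <- as.character(df[2 * i, 2])       # exit time (following even row)
}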
The problem is that it is very inefficient (it works for 10k-row subsets but not for my 6 million row data frame). I need something that takes less than a minute. Do you know of any alternative faster than this loop to match the rows?
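For reference, the pairing the loop performs can also be written with vectorized indexing. This is only a minimal sketch and has not been timed on the full 6 million rows. It assumes the rows strictly alternate entry, exit (so the row count is even and each pair shares one id); the names entry_rows and exit_rows are made up for illustration.

```r
# Sample data from the question; stringsAsFactors = FALSE keeps `time` as character
df <- read.table(header = TRUE, stringsAsFactors = FALSE, text = 'id time
1 "15/12/2014 06:30"
1 "15/12/2014 06:31"
1 "15/12/2014 06:34"
1 "15/12/2014 06:35"
2 "15/12/2014 06:36"
2 "15/12/2014 06:37"
3 "15/12/2014 06:38"
3 "15/12/2014 06:39"')

# Assumption: rows strictly alternate entry, exit, so the row count is even
stopifnot(nrow(df) %% 2 == 0)

entry_rows <- seq(1, nrow(df), by = 2)  # odd rows hold the entry times
exit_rows  <- entry_rows + 1            # the next row holds the matching exit

# Sanity check on the assumption: both rows of a pair belong to the same id
stopifnot(all(df$id[entry_rows] == df$id[exit_rows]))

# Build the whole result in one step instead of assigning row by row
outputdf <- data.frame(
  id    = df$id[entry_rows],
  entry = df$time[entry_rows],
  exit  = df$time[exit_rows],
  stringsAsFactors = FALSE
)
```

The loop assigns into the data frame one cell at a time, while this version subsets each column once. If the real data contains an unmatched entry or exit, the two stopifnot() checks will stop with an error rather than silently misaligning the pairs.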