My data contain time variable and chosen brand variable as below. time indicates the shopping time and chosenbrand indicates the purchased brand at the time.
With this data, I would like to create rank variable as shown third column, fourth column, and so on.
The rank of brands (e.g., brand1 - brand3) should be based on past 36 hours. So, to calculate the rank for the second row, which has shoptime as "2013-09-01 08:54:00 UTC"
the rank should be based on all chosenbrand
values within 36 hours before the time. (brand1
in second row should not be in the 36 hours)
Therefore, rank_brand1, rank_brand2, rank_brand3, rank_bran4,,, are my desired variables.
If I want to create rank_brand5, rank_brand6 as well...
Is there any simple way?
In addition, if I want to do it by individual (if each customer has several purchased history), how to do that?
Data is as below,
shoptime chosenbrand rank_brand1 rank_brand2 rank_brand3, ...
2013-09-01 08:35:00 UTC brand1 NA NA NA
2013-09-01 08:54:00 UTC brand1 1 NA NA
2013-09-01 09:07:00 UTC brand2 1 2 NA
2013-09-01 09:08:00 UTC brand3 1 2 3
2013-09-01 09:11:00 UTC brand5 1 2 3
2013-09-01 09:14:00 UTC brand2 1 2 3
2013-09-01 09:26:00 UTC brand6 1 1 3
2013-09-01 09:26:00 UTC brand2 1 1 3
2013-09-01 09:29:00 UTC brand2 2 1 3
2013-09-01 09:32:00 UTC brand4 2 1 3
Here is code for data
dat <- data.frame(shoptime = c("2013-09-01 08:35:00 UTC", "2013-09-01 08:54:00 UTC", "2013-09-01 09:07:00 UTC" ,"2013-09-01 09:08:00 UTC", "2013-09-01 09:11:00 UTC", "2013-09-01 09:14:00 UTC",
"2013-09-01 09:26:00 UTC", "2013-09-01 09:26:00 UTC" ,"2013-09-01 09:29:00 UTC", "2013-09-01 09:32:00 UTC"),
chosenbrand = c("brand1", "brand1", "brand2", "brand3", "brand5", "brand2", "brand6", "brand2" , "brand2" , "brand4" ),
rank_brand1 = NA,
rank_brand2 = NA,
rank_brand3 = NA,
stringsAsFactors = FALSE)
This is a tricky one. The solution below uses non-equi joins to aggregate by 36 hours periods,
dcast()
to reshape from long to wide format, and a second join with the originaldat
. There can be an arbitrary number of brands.Explanation
returns the aggregated results per 36 hours periods:
Then, this intermediate result is reshaped from long to wide format:
The final right join with the original
dat
data frame completes the missing rows and columns (see code and result above).Data
The one possibility is to write a function (with
loop
) to the job. Considering the data provided in OP:#Replace the 0 values with NA. I dont think this right approach as one can consider those count to be 0 anyway
The resulted data.frame
new.x
will look like: