I have data on all the NCAA basketball games that have occurred since 2003. I am trying to implement a for loop that will calculate the average of a number of stats for each time at a point in time. Here is my for loop:
library(data.table)
roll_season_team_stats <- NULL
for (i in 0:max(stats_DT$DayNum)) {
stats <- stats_DT[DayNum < i]
roll_stats <- dcast(stats_DT, TeamID+Season~.,fun=mean,na.rm=T,value.var = c('FGM', 'FGA', 'FGM3', 'FGA3', 'FTM', 'FTA', 'OR', 'DR', 'TO'))
roll_stats$DayNum <- i + 1
roll_season_team_stats <- rbind(roll_season_team_stats, roll_stats)
}
Here is the output from dput:
structure(list(Season = c(2003L, 2003L, 2003L, 2003L, 2003L,
2003L, 2003L, 2003L, 2003L, 2003L), DayNum = c(10L, 10L, 11L,
11L, 11L, 11L, 12L, 12L, 12L, 12L), TeamID = c(1104L, 1272L,
1266L, 1296L, 1400L, 1458L, 1161L, 1186L, 1194L, 1458L), FGM = c(27L,
26L, 24L, 18L, 30L, 26L, 23L, 28L, 28L, 32L), FGA = c(58L, 62L,
58L, 38L, 61L, 57L, 55L, 62L, 58L, 67L), FGM3 = c(3L, 8L, 8L,
3L, 6L, 6L, 2L, 4L, 5L, 5L), FGA3 = c(14L, 20L, 18L, 9L, 14L,
12L, 8L, 14L, 11L, 17L), FTM = c(11L, 10L, 17L, 17L, 11L, 23L,
32L, 15L, 10L, 15L), FTA = c(18L, 19L, 29L, 31L, 13L, 27L, 39L,
21L, 18L, 19L), OR = c(14L, 15L, 17L, 6L, 17L, 12L, 13L, 13L,
9L, 14L), DR = c(24L, 28L, 26L, 19L, 22L, 24L, 18L, 35L, 22L,
22L), TO = c(23L, 13L, 10L, 12L, 14L, 9L, 17L, 19L, 17L, 6L)), row.names = c(NA,
-10L), class = c("data.table", "data.frame"), .internal.selfref = <pointer: 0x102004ae0>)
The loop runs successfully but it is not producing the correct output. Rather than showing the team averages over time, it is giving me the same number (what I assume is the overall mean of each stat) for each day. Any ideas what is wrong with my loop? Thanks!
Avoid growing objects in a loop which leads to excessive copying in memory. Instead, build a list of data frames to be row binded once outside the loop.
In fact, you may be able to do this in base R with
aggregate
on data frames:Online Demo
If I understand correctly, the OP wants to compute the cumulative mean of some variables for each team and season "showing the team averages over time".
Although the OP uses the term "roll", e.g.,
roll_stats
orroll_season_team_stats
, his code suggests that he is not after a rolling mean but wants to compute cumulative means from the firstDayNum
on, e.g.:However, cumulative means can be calculated directly without creating the result piecewise in a
for
loop or bylapply()
and combining the pieces afterwards.Unfortunately, the sample dataset provided by the OP does contain rows for many different teams but no history, i.e., no data for the same team for a number of consecutive days. Therefore, I have modified the sample dataset for demonstration:
Now, as there are 2 to 3 rows for each team, the cumulative means can be calculated by:
Alternatively, the cumulative means can be appended: