Forgive me if I'm asking too basic a question here (I'm not too experienced in R), but I'm currently trying to plot some natural cubic splines in R and I'm running up against a wall.
I have a data set which has ~3500 rows and about 30 columns. This is a data set of single-season baseball statistics for about 270 different baseball players over their entire careers. So basically, I have about 270 time series (one for each player).
I'm interested in player performance over time, as measured by a statistic called wOBA, so I want to fit a natural cubic spline to each player's series and then overlay all the splines on one graph. And yes, it must be a natural cubic spline. As far as I know, the approach below is the only way to do this in ggplot.
My current code for doing this is:
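#load packages: ggplot2 for plotting, splines for ns()
#(data is assumed to be a data.table, hence the list(...) column selection below)
library(ggplot2)
library(splines)
library(data.table)
#initialize plot
plot <- ggplot(data, aes(x=age, y=wOBA, color=playerID, group=playerID)) + theme(legend.position="none")
#loop through players to add one spline layer per player
for (i in unique(data$playerID)) {
  #compare against the loop variable i, not the literal string "i"
  plot <- plot + stat_smooth(method = lm, formula = y~ns(x,3), data=data[which(data$playerID==i),list(playerID,age,wOBA)], se=FALSE)
}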
I have checked that I can run the code snippet inside the loop manually for a couple of different players, and the plot turns out exactly as I want it. But when I try to run the full loop, it takes forever. I checked the memory usage while the loop was running, and R definitely ran out of memory (I am on a 4 GB machine).
I'm a little confused as to why this is. I would not have expected fitting just 270 splines to use up the more than 2 GB of memory that was free when I started.
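For reference, the manual single-player check looks roughly like the sketch below. It uses a small synthetic data set in place of my real one, so the `toy` data frame, its values, and the player IDs `p1` to `p3` are made up purely for illustration. It only assumes that ggplot2 and splines are installed.

```r
library(ggplot2)
library(splines)

# Synthetic stand-in for the real data: 3 hypothetical players, ages 20-34
set.seed(1)
toy <- data.frame(
  playerID = rep(c("p1", "p2", "p3"), each = 15),
  age      = rep(20:34, times = 3)
)
# Made-up wOBA curve that peaks in the late twenties, plus noise
toy$wOBA <- 0.320 - 0.0003 * (toy$age - 27)^2 + rnorm(nrow(toy), sd = 0.01)

# Subset to one (hypothetical) player, as the loop body does for each ID
one_player <- toy[toy$playerID == "p1", ]

# Add a single natural cubic spline layer for that player
p1_plot <- ggplot(toy, aes(x = age, y = wOBA, color = playerID, group = playerID)) +
  theme(legend.position = "none") +
  stat_smooth(method = "lm", formula = y ~ ns(x, 3), data = one_player, se = FALSE)
```

Printing `p1_plot` draws the one fitted curve. My loop repeats the `stat_smooth` step once per player ID.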
I'm somewhat new to R, so I'm sure I'm missing something. Can anyone give any pointers? Sorry if this is a completely bone-headed question!