Plotting many natural cubic splines in ggplot (R)

Posted 2019-08-08 16:49

Forgive me if I'm asking too basic a question here (I'm not too experienced in R), but I'm currently trying to plot some natural cubic splines in R and I'm running up against a wall.

I have a data set which has ~3500 rows and about 30 columns. This is a data set of single-season baseball statistics for about 270 different baseball players over their entire careers. So basically, I have about 270 time series (one for each player).

I'm interested in player performance over time as measured by a statistic called wOBA, so I want to fit a natural cubic spline to each player's series and then overlay all the splines on one graph. And yes, it must be a natural cubic spline. As far as I know, the approach in my code (stat_smooth with an lm fit on ns()) is the only way to do this in ggplot.

My current code for doing this is:

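  library(ggplot2)
  library(splines)   # provides ns()
  # data is assumed to be a data.table, hence the list(...) column selection below

  # initialize plot
  plot <- ggplot(data, aes(x = age, y = wOBA, color = playerID, group = playerID)) +
    theme(legend.position = "none")

  # loop through players, adding one spline layer per player
  for (i in unique(data$playerID)) {
    plot <- plot + stat_smooth(method = lm, formula = y ~ ns(x, 3),
                               data = data[which(data$playerID == i), list(playerID, age, wOBA)],
                               se = FALSE)
  }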
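
For comparison, here is a minimal sketch of a loop-free variant. Since the base plot already sets group=playerID, a single stat_smooth layer should fit one model per group. The data frame `toy`, its five made-up players, and the simulated wOBA values are all assumptions for illustration, not my real data, and I have not checked whether this avoids the memory problem on the full data set.

```r
library(ggplot2)
library(splines)  # provides ns()

# Hypothetical stand-in for the real data: 5 made-up players, ages 20 to 35
set.seed(1)
toy <- data.frame(
  playerID = rep(paste0("p", 1:5), each = 16),
  age      = rep(20:35, times = 5)
)
# Simulated wOBA values (illustrative only, not real statistics)
toy$wOBA <- 0.320 + 0.002 * (toy$age - 20) - 0.0002 * (toy$age - 20)^2 +
  rnorm(nrow(toy), sd = 0.01)

# One layer only: the group aesthetic makes stat_smooth fit
# y ~ ns(x, 3) separately for each playerID
p <- ggplot(toy, aes(x = age, y = wOBA, color = playerID, group = playerID)) +
  stat_smooth(method = "lm", formula = y ~ ns(x, 3), se = FALSE) +
  theme(legend.position = "none")

print(p)
```

The plot object here carries a single layer rather than one layer per player, which is the main difference from the loop above.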

I have checked that I can run the code snippet inside the loop manually for a couple of different players, and the plot turns out exactly as I want it. But when I run the full loop, it takes forever. I watched memory usage while the loop was running, and R definitely ran out of memory (I am on a 4 GB machine).

I'm a little confused as to why this happens. I would not have expected fitting just 270 splines to use up the more than 2 GB of memory that was free when I started the loop.

I'm somewhat new to R, so I'm sure I'm missing something. Can anyone give any pointers? Sorry if this is a completely bone-headed question!

Tags: r ggplot2