While working with R I encountered a strange problem. I am processing data in the following manner: reading data from a database into a dataframe, filling missing values, grouping and nesting the data by a combined primary key, creating a time series and forecasting it for every group, ungrouping and cleaning the data, and writing it back into the DB.
Something like this: https://cran.rstudio.com/web/packages/sweep/vignettes/SW01_Forecasting_Time_Series_Groups.html
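In essence my pipeline follows the pattern from that vignette. A minimal sketch of what I'm doing (the table and its columns `region`, `product`, `date`, `qty` are made-up stand-ins for my real data; the real data comes from the DB):

```r
library(dplyr)
library(tidyr)
library(purrr)
library(timetk)
library(sweep)
library(forecast)

# Made-up example data: monthly quantities per (region, product) key
monthly_tbl <- tibble(
  region  = rep(c("north", "south"), each = 24),
  product = "widget",
  date    = rep(seq(as.Date("2020-01-01"), by = "month", length.out = 24), 2),
  qty     = rpois(48, lambda = 100)
)

forecasts <- monthly_tbl %>%
  group_by(region, product) %>%
  nest() %>%                                          # one row per key, data in a list column
  mutate(
    ts   = map(data, tk_ts, select = -date,
               start = c(2020, 1), freq = 12),        # coerce each group to a ts object
    fit  = map(ts, ets),                              # fit a model per group
    fc   = map(fit, forecast, h = 12),                # forecast 12 periods ahead
    tidy = map(fc, sw_sweep, timetk_idx = TRUE)       # back to a tidy tibble
  ) %>%
  select(region, product, tidy) %>%
  unnest(tidy)
```

With the real data the only difference is that `monthly_tbl` is read from the database and the result is written back.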
For small data sets this works like a charm, but with larger ones (over about 100,000 entries) I get the "R Session Aborted" screen from RStudio, and the native R GUI just stops execution and implodes. There is no information in any log file I've looked into. I suspect it is some kind of (leaking) memory issue.
As a workaround I'm processing the data in chunks with a for loop. But no matter how small the chunk size is, I still get the "R Session Aborted" screen, which looks a lot like leaking memory. The whole data set consists of about 5 million rows.
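The chunked workaround looks roughly like this (`con`, the table names, and the `forecast_groups()` wrapper around the pipeline above are placeholders for my actual code):

```r
library(dplyr)
library(DBI)

# Placeholder: wraps the group/nest/forecast pipeline shown earlier
# forecast_groups <- function(chunk) { ... }

process_in_chunks <- function(con, monthly_tbl, chunk_size = 500) {
  # All distinct primary keys, processed chunk_size keys at a time
  keys <- distinct(monthly_tbl, region, product)

  for (i in seq(1, nrow(keys), by = chunk_size)) {
    chunk_keys <- keys[i:min(i + chunk_size - 1, nrow(keys)), ]

    # Pull only the rows belonging to this batch of keys
    chunk  <- semi_join(monthly_tbl, chunk_keys, by = c("region", "product"))
    result <- forecast_groups(chunk)

    # Write each chunk back immediately instead of accumulating results
    dbWriteTable(con, "forecasts", result, append = TRUE)

    # Drop the intermediates and force garbage collection before the
    # next iteration; the session still aborts eventually regardless
    rm(chunk, result)
    gc()
  }
}
```

Even with explicit `rm()` and `gc()` after every chunk, memory usage appears to keep growing until the session dies.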
I've looked a lot into packages like ff, the big* family, and matter (basically everything from https://cran.r-project.org/web/views/HighPerformanceComputing.html), but these do not seem to work well with tibbles and the tidyverse way of data processing.
So, how can I improve my script to work with massive amounts of data? And how can I gather clues about why the R session is aborted?