I'm using ga$getData from skardhamar's rga package to query Google Analytics (GA), and I want all of the data, unsampled. The data is based on more than 500k sessions per day.
The README at https://github.com/skardhamar/rga says, in the section 'extracting more observations than 10,000', that this is possible with batch = TRUE. The section 'Get the data unsampled' says that walking over the days returns unsampled data. I'm trying to combine the two, but I cannot get it to work. For example:
ga$getData(xxx,
  start.date = "2015-03-30",
  end.date = "2015-03-31",
  metrics = "ga:totalEvents",
  dimensions = "ga:date,ga:customVarValue4,ga:eventCategory,ga:eventAction,ga:eventLabel",
  sort = "",
  filters = "",
  segment = "",
  batch = TRUE, walk = TRUE
)
This does return unsampled data, but not all of it. I get a data frame with only 20k rows (10k per day). The result is capped at 10k rows per day, which is not what I expected given batch = TRUE. For 30 and 31 March together, I get a data frame of 20k rows after seeing this output:
Run (1/2): for date 2015-03-30
Pulling 10000 observations in batches of 10000
Run (1/1): observations [1;10000]. Batch size: 10000
Received: 10000 observations
Received: 10000 observations
Run (2/2): for date 2015-03-31
Pulling 10000 observations in batches of 10000
Run (1/1): observations [1;10000]. Batch size: 10000
Received: 10000 observations
Received: 10000 observations
When I leave out walk = TRUE, I do get all observations (771k rows, around 385k per day on average), but the data is sampled:
ga$getData(xxx,
  start.date = "2015-03-30",
  end.date = "2015-03-31",
  metrics = "ga:totalEvents",
  dimensions = "ga:date,ga:customVarValue4,ga:eventCategory,ga:eventAction,ga:eventLabel",
  sort = "",
  filters = "",
  segment = "",
  batch = TRUE
)
Notice: Data set contains sampled data
Pulling 771501 observations in batches of 10000
Run (1/78): observations [1;10000]. Batch size: 10000
Notice: Data set contains sampled data
...
Is my data just too big to get all observations unsampled?
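For reference, the manual equivalent of combining the two settings would be to loop over the days myself and run one batched query per day. A minimal sketch of that idea follows, using base R only. Note that walk_days and fake_fetch are hypothetical names of my own, not rga functions, and the stub only stands in for the real query so the sketch runs without GA credentials.

```r
# Hypothetical helper (not part of rga): run one query per day and stack the results.
walk_days <- function(start.date, end.date, fetch_day) {
  days <- seq(as.Date(start.date), as.Date(end.date), by = "day")
  # fetch_day receives a single "YYYY-MM-DD" string and must return a data frame
  per_day <- lapply(format(days), fetch_day)
  do.call(rbind, per_day)
}

# Stub standing in for the real GA query (assumption: the real one returns a data frame).
fake_fetch <- function(day) data.frame(date = day, totalEvents = 1:3)

result <- walk_days("2015-03-30", "2015-03-31", fake_fetch)
```

In real use, fake_fetch would be replaced by a function that calls ga$getData(xxx, start.date = day, end.date = day, ..., batch = TRUE) with the same metrics and dimensions as above. I have not verified that a single-day batched query returns all rows unsampled at this volume.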