I'm using skardhamar's rga ga$getData to query GA and get all data in an unsampled manner. The data is based on more than 500k sessions per day.
At https://github.com/skardhamar/rga, paragraph 'extracting more observations than 10,000' mentions this is possible by using batch = TRUE. Also, paragraph 'Get the data unsampled' mentions that by walking over the days, you can get unsampled data. I'm trying to combine these two, but I can not get it to work. E.g.
ga$getData(xxx,
start.date = "2015-03-30",
end.date = "2015-03-31",
metrics = "ga:totalEvents",
dimensions = "ga:date,ga:customVarValue4,ga:eventCategory,ga:eventAction,ga:eventLabel",
sort = "",
filters = "",
segment = "",
,batch = TRUE, walk = TRUE
)
.. indeed gets unsampled data, but not all data. I get a dataframe with only 20k rows (10k per day). This is limiting to chunks of 10k per day, contrary to what I expect because of using the batch = TRUE setting. So for the 30th of march, I get a dataframe of 20k rows after seeing this output:
Run (1/2): for date 2015-03-30
Pulling 10000 observations in batches of 10000
Run (1/1): observations [1;10000]. Batch size: 10000
Received: 10000 observations
Received: 10000 observations
Run (2/2): for date 2015-03-31
Pulling 10000 observations in batches of 10000
Run (1/1): observations [1;10000]. Batch size: 10000
Received: 10000 observations
Received: 10000 observations
When I leave out the walk = TRUE setting, I do get all observations (771k rows, around 335k per day), but only in a sampled manner:
ga$getData(xxx,
start.date = "2015-03-30",
end.date = "2015-03-31",
metrics = "ga:totalEvents",
dimensions = "ga:date,ga:customVarValue4,ga:eventCategory,ga:eventAction,ga:eventLabel",
sort = "",
filters = "",
segment = "",
,batch = TRUE
)
Notice: Data set contains sampled data
Pulling 771501 observations in batches of 10000
Run (1/78): observations [1;10000]. Batch size: 10000
Notice: Data set contains sampled data
...
Is my data just too big to get all observations unsampled?
You could try querying by device with filters = "ga:deviceCategory==desktop" (and filters = "ga:deviceCategory!=desktop" respectively) and then merging the resulting dataframes.
I'm assuming that your users uses different devices to access your site. The underlying logic is that when you filter data, Google Analytics servers filter it before you get it, so you can "divide" your query and get unsampled data. I think is the same methododology of the "walk" function.
Desktop only
Mobile and Tablet