I'm trying to display the source property name within the Google Analytics roll up property I have linked to bigquery. Issue is, is that when I try the below some of the metrics become hugely inflated. I'm guessing this is to do with repeated fields but not sure what to do about it. I've tried a number of workarounds like using "max" but this doesn't display every property name.
All metrics except users and visits seem to be inflated.
SELECT
date,
MAX(CASE
WHEN EXISTS( SELECT 1 FROM UNNEST(hits) hits WHERE REGEXP_CONTAINS(hits.sourcePropertyInfo.sourcePropertyTrackingId, r'82272640')) THEN 'MUG'
WHEN EXISTS (
SELECT
1
FROM
UNNEST(hits) hits
WHERE
hits.sourcePropertyInfo.sourcePropertyTrackingId = 'Social') THEN 'Social'ELSE 'Website' END) AS Property,
geoNetwork.country AS Country,
COUNT(DISTINCT CONCAT(cast(visitId AS STRING),fullVisitorId)) as visits,
sum(totals.visits) as visits2,
COUNT(DISTINCT(fullVisitorId)) AS Users,
h.sourcePropertyInfo.sourcePropertyDisplayName as display,
SUM((
SELECT
SUM(latencyTracking.pageLoadTime)
FROM
UNNEST(hits)
WHERE
page.pagePath = '/' ))/SUM((
SELECT
SUM(latencyTracking.pageLoadSample)
FROM
UNNEST(hits)
WHERE
page.pagePath = '/')) AS pageloadspeed,
SUM(totals.newVisits) AS new_,
SUM(totals.screenviews) AS PAGEVIEWS,
SUM(totals.bounces) AS BOUNCES,
sum(CASE
WHEN device.isMobile = TRUE THEN (totals.visits)
ELSE 0 END) mobilevisits,
SUM(CASE
WHEN trafficSource.medium = 'organic' THEN (totals.visits)
ELSE 0 END) organicvisits,
SUM(CASE
WHEN EXISTS( SELECT 1 FROM UNNEST(hits) hits WHERE REGEXP_CONTAINS(hits.eventInfo.eventAction,'register$|registersuccess|new registration|account signup|registro')) THEN 1
ELSE 0 END) AS NewRegistrations,
SUM(CASE
WHEN EXISTS( SELECT 1 FROM UNNEST(hits) hits WHERE REGEXP_CONTAINS(hits.eventInfo.eventAction, 'add to cart|add to bag|click to buy|ass to basket|comprar|addtobasket::')) THEN 1
ELSE 0 END) AS ClickToBuy,
SUM(totals.transactions) AS Transactions
FROM
`project.dataset.ga_sessions_*`, UNNEST(hits) as h
WHERE
1 = 1
AND PARSE_TIMESTAMP('%Y%m%d', REGEXP_EXTRACT(_table_suffix, r'.*_(.*)')) BETWEEN TIMESTAMP('2017-05-01')
AND TIMESTAMP('2017-05-01')
GROUP BY
date,
Country,
display
ORDER BY
visits DESC;
EDIT:
I've tried simply removing UNNEST(HITS) hits as h from the FROM clause, this gave me the following error:
Error: Cannot access field sourcePropertyInfo on a value with type ARRAY> at [16:14]
I've also tried using it in a subquery as follows:
(select h.sourcePropertyInfo.sourcePropertyDisplayName from unnest(hits) h) as displayname,
and get the error:
Scalar subquery produced more than one element
As you need to compute several values on the hits level maybe unnesting the field hits is the best approach. The downside is that you lose the totals field aggregation for the session level but still you can work it around.
As an example:
SELECT
date,
CASE
WHEN REGEXP_CONTAINS(h.sourcePropertyInfo.sourcePropertyTrackingId, r'82272640') THEN 'MUG'
WHEN h.sourcePropertyInfo.sourcePropertyTrackingId = 'Social' THEN 'Social'ELSE 'Website'
END AS Property,
geoNetwork.country AS Country,
COUNT(DISTINCT CONCAT(CAST(visitId AS STRING),fullVisitorId)) AS visits,
COUNT(DISTINCT(fullVisitorId)) AS Users,
h.sourcePropertyInfo.sourcePropertyDisplayName AS display,
SUM(CASE
WHEN REGEXP_CONTAINS(h.page.pagepath, r'/') THEN h.latencyTracking.pageLoadTime END) / SUM(CASE
WHEN REGEXP_CONTAINS(h.page.pagepath, r'/') THEN h.latencyTracking.pageLoadSample END) AS pageloadspeed,
COUNT(DISTINCT
CASE
WHEN totals.newVisits = 1 THEN CONCAT(CAST(visitId AS STRING),fullVisitorId) END) new_visits,
COUNT(CASE
WHEN h.type = 'PAGE' THEN h.page.pagepath END) pageviews,
SUM(CASE
WHEN (h.isentrance = TRUE AND h.isexit = TRUE) THEN 1 END) bounces,
COUNT(DISTINCT (CASE
WHEN device.isMobile = TRUE THEN CONCAT(CAST(visitId AS STRING),fullVisitorId) END)) mobilevisits,
COUNT(DISTINCT (CASE
WHEN trafficSource.medium = 'organic' THEN CONCAT(CAST(visitId AS STRING),fullVisitorId) END)) organicvisits,
SUM(CASE
WHEN REGEXP_CONTAINS(h.eventInfo.eventAction,'register$|registersuccess|new registration|account signup|registro') THEN 1 END) AS NewRegistrations,
SUM(CASE
WHEN REGEXP_CONTAINS(h.eventInfo.eventAction, 'add to cart|add to bag|click to buy|ass to basket|comprar|addtobasket::') THEN 1 END) AS ClickToBuy,
COUNT(h.transaction.transactionid) transactions
FROM
`project_id.dataset_id.ga_sessions_*`,
UNNEST(hits) AS h
WHERE
1 = 1
AND PARSE_TIMESTAMP('%Y%m%d', REGEXP_EXTRACT(_table_suffix, r'.*_(.*)')) BETWEEN TIMESTAMP('2017-05-01') AND TIMESTAMP('2017-05-01')
GROUP BY
date,
Country,
display,
Property
I ran it against our dataset and it seems to be working. Some changes I did:
- Removed the
MAX
operation for the Property and added it to the group by.
- pageviews was considered as the count of hits where hit.type = 'PAGE'. Not sure if this is the same for screenviews though.
- bounce is computed when there's an entrance and exit event.
- Total transactions is a count on transaction ids (hopefully this field is being filled in your dataset as well).
You're flattening your table in your most outer FROM statement (ie here:
FROM
project.dataset.ga_sessions_*
, UNNEST(hits) as h)
All your session level dimensions eg device.* or totals.* values like totals.transactions are already rolled up to a session level, so when you flatten the table by unnesting the hits, these totals values get written as many times as there are hits. Example:
Let's say there are 30 hits in one session and 2 transactions, since you flatten/unnest your hits, you will be left with 30 rows containing totals.transactions = 2, so when you sum over them, the result would be that there were 60 transactions in this session.
Your users and visits don't get affected by this since you count distinct them, so any dupes get eliminated.
It looks to me as if you're query will work if you just remove the , UNNEST(hits) as h if you remove or adapt this line
h.sourcePropertyInfo.sourcePropertyDisplayName as display
since you already unnest the hits within the select statement where required apart from in this particular line.
after UNNESTing an array, use MAX() instead SUMMing them up to report on session level matrices:
use:
- MAX(totals.screenviews) AS PAGEVIEWS,
- MAX(totals.bounces) AS BOUNCES,
- MAX(totals.transactions) AS Transactions
- ...
- ...
instead of:
- SUM(totals.screenviews) AS PAGEVIEWS,
- SUM(totals.bounces) AS BOUNCES,
- SUM(totals.transactions) AS Transactions
this should partially solve your issue. Let me know how it goes?
I think the reason why the calculated bounce rate is much higher in the query from William Fuks is the following
WHEN (h.isentrance = TRUE AND h.isexit = TRUE) THEN 1 END) bounces
It seems that isEntrance and isExit only happens on PAGE hits and so does not account for events. Hence the over count of bounces is due to single page views that might have had one or more interaction events happen on the page.